operations
This module contains optimized deep-learning operations used in the Ultralytics YOLO framework.
Non-max suppression
Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prediction | torch.Tensor | A tensor of shape (batch_size, num_boxes, num_classes + 4 + num_masks) containing the predicted boxes, classes, and masks. The tensor should be in the format output by a model, such as YOLO. | required |
conf_thres | float | The confidence threshold below which boxes will be filtered out. Valid values are between 0.0 and 1.0. | 0.25 |
iou_thres | float | The IoU threshold above which a box that overlaps a higher-scoring box is suppressed during NMS. Valid values are between 0.0 and 1.0. | 0.45 |
classes | List[int] | A list of class indices to consider. If None, all classes will be considered. | None |
agnostic | bool | If True, NMS is class-agnostic: all classes are treated as one during suppression. | False |
multi_label | bool | If True, each box may have multiple labels. | False |
labels | List[List[Union[int, float, torch.Tensor]]] | A list of lists, where each inner list contains the a priori labels for a given image. The list should be in the format output by a dataloader, with each label being a tuple of (class_index, x1, y1, x2, y2). | () |
max_det | int | The maximum number of boxes to keep after NMS. | 300 |
nm | int | The number of masks output by the model. | 0 |
Returns:
Type | Description |
---|---|
List[torch.Tensor] | A list of length batch_size, where each element is a tensor of shape (num_boxes, 6 + num_masks) containing the kept boxes, with columns (x1, y1, x2, y2, confidence, class, mask1, mask2, ...). |
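The thresholds above can be illustrated with a minimal greedy NMS sketch in pure Python. This is not the library's implementation: it handles one image, ignores masks and multi-label mode, and the names `iou` and `greedy_nms` are hypothetical helpers introduced only for this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(dets, conf_thres=0.25, iou_thres=0.45, max_det=300):
    """dets: list of (x1, y1, x2, y2, confidence, class) rows for one image."""
    # Drop low-confidence candidates, then visit the rest from highest score down.
    cands = sorted((d for d in dets if d[4] >= conf_thres),
                   key=lambda d: d[4], reverse=True)
    kept = []
    for d in cands:
        if len(kept) >= max_det:
            break
        # Keep a box unless it overlaps an already kept box of the same class
        # by more than iou_thres.
        if all(k[5] != d[5] or iou(k[:4], d[:4]) <= iou_thres for k in kept):
            kept.append(d)
    return kept


dets = [
    (0, 0, 10, 10, 0.9, 0),
    (1, 1, 11, 11, 0.8, 0),    # overlaps the first box, same class
    (20, 20, 30, 30, 0.7, 0),
    (1, 1, 11, 11, 0.6, 1),    # same place, different class
    (0, 0, 5, 5, 0.1, 0),      # below conf_thres
]
kept = greedy_nms(dets)
```

In this sketch the second box is suppressed by the first, while the class-1 box survives because suppression is applied per class (the `agnostic=False` behaviour).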
Scale boxes
Rescale boxes (xyxy) from img1_shape to img0_shape
Parameters:
Name | Type | Description | Default |
---|---|---|---|
img1_shape | | The shape of the image that the bounding boxes are for. | required |
boxes | | The bounding boxes of the objects in the image. | required |
img0_shape | | The shape of the original image. | required |
ratio_pad | | A tuple of (ratio, pad). | None |
Returns:
Type | Description |
---|---|
The rescaled boxes. |
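As a rough illustration of the rescaling, here is a pure-Python sketch. It assumes letterbox geometry (a uniform gain with padding split evenly on both sides) and assumes `ratio_pad` is laid out as `(ratio, (pad_w, pad_h))`; neither detail is stated on this page, and `scale_boxes_sketch` is a hypothetical name.

```python
def scale_boxes_sketch(img1_shape, boxes, img0_shape, ratio_pad=None):
    """Map xyxy boxes from the resized image (img1) back to the original (img0).

    Shapes are (height, width); boxes is a list of [x1, y1, x2, y2].
    """
    if ratio_pad is None:
        # Assumed letterbox geometry: one gain for both axes, centred padding.
        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])
        pad_w = (img1_shape[1] - img0_shape[1] * gain) / 2
        pad_h = (img1_shape[0] - img0_shape[0] * gain) / 2
    else:
        gain, (pad_w, pad_h) = ratio_pad  # assumed layout, see lead-in
    h0, w0 = img0_shape[:2]
    out = []
    for x1, y1, x2, y2 in boxes:
        # Remove the padding, undo the resize, then clip to the original image.
        xs = [min(max((x - pad_w) / gain, 0), w0) for x in (x1, x2)]
        ys = [min(max((y - pad_h) / gain, 0), h0) for y in (y1, y2)]
        out.append([xs[0], ys[0], xs[1], ys[1]])
    return out


# A 960x1280 original letterboxed into 640x640: gain 0.5, 80 px of vertical padding.
scaled = scale_boxes_sketch((640, 640), [[0, 80, 640, 560]], (960, 1280))
```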
Scale image
Takes a mask and resizes it to the original image size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
im1_shape | | Model input shape, [h, w]. | required |
masks | | Masks of shape [h, w, num]. | required |
im0_shape | | The original image shape. | required |
ratio_pad | | The ratio of the padding to the original image. | None |
Returns:
Type | Description |
---|---|
The resized masks. |
Clip boxes
Takes a list of bounding boxes and an image shape (height, width) and clips the boxes to that shape.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
boxes | | The bounding boxes to clip. | required |
shape | | The shape of the image. | required |
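Clipping amounts to clamping each coordinate to the image bounds. A minimal sketch, assuming xyxy boxes and returning a new list rather than modifying the input (the function name is hypothetical):

```python
def clip_boxes_sketch(boxes, shape):
    """Clamp [x1, y1, x2, y2] boxes to an image of shape (height, width)."""
    h, w = shape[:2]
    return [
        [min(max(x1, 0), w), min(max(y1, 0), h), min(max(x2, 0), w), min(max(y2, 0), h)]
        for x1, y1, x2, y2 in boxes
    ]


clipped = clip_boxes_sketch([[-5, 10, 700, 500]], (480, 640))
```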
Box Format Conversion
xyxy2xywh
Converts bounding boxes from the format [x1, y1, x2, y2] to [x, y, w, h], where xy1=top-left, xy2=bottom-right, and xy=center.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | | The input tensor. | required |
Returns:
Type | Description |
---|---|
the center of the box, the width and the height of the box. |
xywh2xyxy
It converts the bounding box from x,y,w,h to x1,y1,x2,y2 where xy1=top-left, xy2=bottom-right
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | | The input tensor. | required |
Returns:
Type | Description |
---|---|
the top left and bottom right coordinates of the bounding box. |
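The two conversions above are inverses of each other. A minimal sketch on plain Python lists (the library functions operate on tensors; the `_sketch` names are hypothetical):

```python
def xyxy2xywh_sketch(box):
    """[x1, y1, x2, y2] -> [x_center, y_center, width, height]."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]


def xywh2xyxy_sketch(box):
    """[x_center, y_center, width, height] -> [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x - w / 2, y - h / 2, x + w / 2, y + h / 2]


xywh = xyxy2xywh_sketch([10, 20, 50, 80])
xyxy = xywh2xyxy_sketch(xywh)
```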
xywhn2xyxy
It converts the normalized coordinates to the actual coordinates [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | | The bounding box coordinates. | required |
w | | Width of the image. Defaults to 640. | 640 |
h | | Height of the image. Defaults to 640. | 640 |
padw | | Padding width. Defaults to 0. | 0 |
padh | | Padding height. Defaults to 0. | 0 |
Returns:
Type | Description |
---|---|
the xyxy coordinates of the bounding box. |
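A sketch of the arithmetic for one box, assuming the padding is added after scaling the normalized corners by the image size (the function name is hypothetical):

```python
def xywhn2xyxy_sketch(box, w=640, h=640, padw=0, padh=0):
    """Normalized [x, y, w, h] -> pixel [x1, y1, x2, y2]."""
    x, y, bw, bh = box
    return [
        w * (x - bw / 2) + padw,  # top-left x
        h * (y - bh / 2) + padh,  # top-left y
        w * (x + bw / 2) + padw,  # bottom-right x
        h * (y + bh / 2) + padh,  # bottom-right y
    ]


pixels = xywhn2xyxy_sketch([0.5, 0.5, 0.25, 0.5], w=640, h=480)
```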
xyxy2xywhn
Converts bounding boxes from pixel [x1, y1, x2, y2] to normalized [x, y, w, h], with coordinates divided by the width and height of the image.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | | The bounding box coordinates. | required |
w | | Width of the image. Defaults to 640. | 640 |
h | | Height of the image. Defaults to 640. | 640 |
clip | | If True, the boxes will be clipped to the image boundaries. Defaults to False. | False |
eps | | The minimum value of the box's width and height. | 0.0 |
Returns:
Type | Description |
---|---|
the xywhn format of the bounding boxes. |
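The normalization can be sketched for a single box as follows. The `eps` parameter is left out because its exact role is not spelled out here, and the function name is hypothetical.

```python
def xyxy2xywhn_sketch(box, w=640, h=640, clip=False):
    """Pixel [x1, y1, x2, y2] -> normalized [x_center, y_center, width, height]."""
    x1, y1, x2, y2 = box
    if clip:
        # Clamp the corners to the image before normalizing.
        x1, x2 = min(max(x1, 0), w), min(max(x2, 0), w)
        y1, y2 = min(max(y1, 0), h), min(max(y2, 0), h)
    return [(x1 + x2) / 2 / w, (y1 + y2) / 2 / h, (x2 - x1) / w, (y2 - y1) / h]


normalized = xyxy2xywhn_sketch([240, 120, 400, 360], w=640, h=480)
```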
xyn2xy
It converts normalized segments into pixel segments of shape (n,2)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | | The normalized segment coordinates, shape (n, 2). | required |
w | | Width of the image. Defaults to 640. | 640 |
h | | Height of the image. Defaults to 640. | 640 |
padw | | Padding width. Defaults to 0. | 0 |
padh | | Padding height. Defaults to 0. | 0 |
Returns:
Type | Description |
---|---|
The segment coordinates in pixels, shape (n, 2). |
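For segments the conversion is a per-point scale and shift. A minimal sketch on a list of (x, y) pairs, with a hypothetical function name:

```python
def xyn2xy_sketch(points, w=640, h=640, padw=0, padh=0):
    """Normalized (x, y) points -> pixel (x, y) points."""
    return [[w * x + padw, h * y + padh] for x, y in points]


pixel_points = xyn2xy_sketch([[0.5, 0.5], [0.25, 1.0]], w=640, h=480, padw=10, padh=20)
```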
xywh2ltwh
It converts the bounding box from [x, y, w, h] to [x1, y1, w, h] where xy1=top-left
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | | The input tensor of boxes in [x, y, w, h] format. | required |
Returns:
Type | Description |
---|---|
The boxes in [x1, y1, w, h] format, where xy1=top-left. |
xyxy2ltwh
Convert nx4 boxes from [x1, y1, x2, y2] to [x1, y1, w, h] where xy1=top-left, xy2=bottom-right
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | | The input tensor. | required |
Returns:
Type | Description |
---|---|
The boxes in [x1, y1, w, h] format. |
ltwh2xywh
Convert nx4 boxes from [x1, y1, w, h] to [x, y, w, h] where xy1=top-left, xy=center
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | | The input tensor. | required |
ltwh2xyxy
It converts the bounding box from [x1, y1, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | | The input tensor of boxes in [x1, y1, w, h] format. | required |
Returns:
Type | Description |
---|---|
the xyxy coordinates of the bounding boxes. |
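The four ltwh conversions above differ only in which corner or center they anchor on. A compact sketch on plain lists (function names are hypothetical; the library versions take tensors):

```python
def xywh2ltwh_sketch(box):
    """Center [x, y, w, h] -> top-left [x1, y1, w, h]."""
    x, y, w, h = box
    return [x - w / 2, y - h / 2, w, h]


def xyxy2ltwh_sketch(box):
    """Corners [x1, y1, x2, y2] -> top-left [x1, y1, w, h]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def ltwh2xywh_sketch(box):
    """Top-left [x1, y1, w, h] -> center [x, y, w, h]."""
    x1, y1, w, h = box
    return [x1 + w / 2, y1 + h / 2, w, h]


def ltwh2xyxy_sketch(box):
    """Top-left [x1, y1, w, h] -> corners [x1, y1, x2, y2]."""
    x1, y1, w, h = box
    return [x1, y1, x1 + w, y1 + h]


ltwh = xyxy2ltwh_sketch([10, 20, 50, 80])
```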
segment2box
Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
segment | | The segment label. | required |
width | | The width of the image. Defaults to 640. | 640 |
height | | The height of the image. Defaults to 640. | 640 |
Returns:
Type | Description |
---|---|
the minimum and maximum x and y values of the segment. |
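One way to read the inside-image constraint is: discard points that fall outside the image, then take the extremes of what remains. The sketch below follows that reading and returns a zero box when no point is inside; both choices are assumptions, and the function name is hypothetical.

```python
def segment2box_sketch(segment, width=640, height=640):
    """List of (x, y) points -> [x_min, y_min, x_max, y_max] over in-image points."""
    inside = [(x, y) for x, y in segment if 0 <= x <= width and 0 <= y <= height]
    if not inside:
        return [0, 0, 0, 0]  # assumed fallback when nothing lies inside the image
    xs = [p[0] for p in inside]
    ys = [p[1] for p in inside]
    return [min(xs), min(ys), max(xs), max(ys)]


box = segment2box_sketch([(10, 20), (50, 5), (30, 700), (-5, 40)])
```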
Mask Operations
resample_segments
It takes a list of segments (n,2) and returns a list of segments (n,2) where each segment has been up-sampled to n points
Parameters:
Name | Type | Description | Default |
---|---|---|---|
segments | | A list of (n,2) arrays, where n is the number of points in the segment. | required |
n | | Number of points to resample the segment to. Defaults to 1000. | 1000 |
Returns:
Type | Description |
---|---|
the resampled segments. |
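Resampling can be sketched as linear interpolation over the point index. The version below handles one segment, requires n >= 2, and closes the contour by appending the first point before interpolating; the closing step and the function name are assumptions made for this example.

```python
def resample_segment_sketch(points, n=1000):
    """Resample a list of (x, y) points to exactly n points (n >= 2)."""
    pts = list(points) + [points[0]]  # close the contour (assumption)
    last = len(pts) - 1
    out = []
    for i in range(n):
        t = i * last / (n - 1)        # fractional index in [0, last]
        lo = min(int(t), last - 1)    # index of the left neighbour
        f = t - lo                    # weight of the right neighbour
        (xa, ya), (xb, yb) = pts[lo], pts[lo + 1]
        out.append((xa + f * (xb - xa), ya + f * (yb - ya)))
    return out


resampled = resample_segment_sketch([(0, 0), (10, 0), (10, 10)], n=7)
```

Note that the spacing is uniform in index, not in arc length, so long edges receive the same number of samples as short ones.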
crop_mask
It takes a mask and a bounding box, and returns a mask that is cropped to the bounding box
Parameters:
Name | Type | Description | Default |
---|---|---|---|
masks | | [h, w, n] tensor of masks. | required |
boxes | | [n, 4] tensor of bbox coords in relative point form. | required |
Returns:
Type | Description |
---|---|
The masks cropped to the bounding boxes. |
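Cropping here means zeroing every mask pixel that lies outside the box. A sketch for a single mask stored as a nested list, with the box given in pixel coordinates and treated as half-open (x1 <= column < x2, y1 <= row < y2); these conventions and the function name are assumptions.

```python
def crop_mask_sketch(mask, box):
    """Zero all pixels of a 2D mask (list of rows) outside box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [
        [v if (x1 <= c < x2 and y1 <= r < y2) else 0 for c, v in enumerate(row)]
        for r, row in enumerate(mask)
    ]


mask = [[1, 1, 1, 1] for _ in range(3)]
cropped = crop_mask_sketch(mask, (1, 0, 3, 2))
```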
process_mask_upsample
It takes the output of the mask head, and applies the mask to the bounding boxes. This produces masks of higher quality but is slower.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protos | | [mask_dim, mask_h, mask_w] | required |
masks_in | | [n, mask_dim], n is number of masks after nms | required |
bboxes | | [n, 4], n is number of masks after nms | required |
shape | | The size of the input image. | required |
Returns:
Type | Description |
---|---|
mask |
process_mask
It takes the output of the mask head, and applies the mask to the bounding boxes. This is faster but produces downsampled quality of mask
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protos | | [mask_dim, mask_h, mask_w] | required |
masks_in | | [n, mask_dim], n is number of masks after nms | required |
bboxes | | [n, 4], n is number of masks after nms | required |
shape | | The size of the input image. | required |
Returns:
Type | Description |
---|---|
mask |
process_mask_native
It takes the output of the mask head, and crops it after upsampling to the bounding boxes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
protos | | [mask_dim, mask_h, mask_w] | required |
masks_in | | [n, mask_dim], n is number of masks after nms | required |
bboxes | | [n, 4], n is number of masks after nms | required |
shape | | input_image_size, (h, w) | required |
Returns:
Name | Type | Description |
---|---|---|
masks | [h, w, n] |
scale_segments
Rescale segment coordinates (xy) from img1_shape to img0_shape.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
img1_shape | | The shape of the image that the segments are from. | required |
segments | | The segments to be scaled. | required |
img0_shape | | The shape of the image that the segmentation is being applied to. | required |
ratio_pad | | The ratio of the image size to the padded image size. | None |
normalize | | If True, the coordinates will be normalized to the range [0, 1]. Defaults to False. | False |
Returns:
Type | Description |
---|---|
The rescaled segments. |
masks2segments
It takes a list of masks(n,h,w) and returns a list of segments(n,xy)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
masks | | The output of the model, which is a tensor of shape (batch_size, 160, 160). | required |
strategy | | 'concat' or 'largest'. Defaults to 'largest'. | 'largest' |
Returns:
Name | Type | Description |
---|---|---|
segments | List | List of segment masks. |
clip_segments
It takes a list of line segments (x1,y1,x2,y2) and clips them to the image shape (height, width)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
segments | | A list of segments; each segment is a list of points, and each point is a list of x, y coordinates. | required |
shape | | The shape of the image. | |