operations

This module contains optimized deep-learning operations used in the Ultralytics YOLO framework.

Non-max suppression

Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box.

Parameters:

Name Type Description Default
prediction torch.Tensor

A tensor of shape (batch_size, num_boxes, num_classes + 4 + num_masks) containing the predicted boxes, classes, and masks. The tensor should be in the format output by a model, such as YOLO.

required
conf_thres float

The confidence threshold below which boxes will be filtered out. Valid values are between 0.0 and 1.0.

0.25
iou_thres float

The IoU threshold above which a box overlapping a higher-confidence box will be suppressed during NMS. Valid values are between 0.0 and 1.0.

0.45
classes List[int]

A list of class indices to consider. If None, all classes will be considered.

None
agnostic bool

If True, NMS is class-agnostic: boxes of all classes are treated as a single class when suppressing overlaps.

False
multi_label bool

If True, each box may have multiple labels.

False
labels List[List[Union[int, float, torch.Tensor]]]

A list of lists, where each inner list contains the apriori labels for a given image. The list should be in the format output by a dataloader, with each label being a tuple of (class_index, x1, y1, x2, y2).

()
max_det int

The maximum number of boxes to keep after NMS.

300
nm int

The number of masks output by the model.

0

Returns:

Type Description

List[torch.Tensor]: A list of length batch_size, where each element is a tensor of shape (num_boxes, 6 + num_masks) containing the kept boxes, with columns (x1, y1, x2, y2, confidence, class, mask1, mask2, ...).
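
The roles of conf_thres, iou_thres, and max_det can be illustrated with a greedy NMS sketch in plain Python. This is a minimal illustration, not the library's implementation: `iou_sketch` and `nms_sketch` are hypothetical names, boxes are assumed to be `(x1, y1, x2, y2, confidence)` tuples of a single class, and masks, multi-label handling, and batching are left out.

```python
def iou_sketch(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, conf_thres=0.25, iou_thres=0.45, max_det=300):
    """Greedy single-class NMS over (x1, y1, x2, y2, confidence) tuples."""
    # Drop low-confidence candidates, then visit the rest best-first.
    candidates = sorted((b for b in boxes if b[4] >= conf_thres),
                        key=lambda b: b[4], reverse=True)
    kept = []
    for box in candidates:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou_sketch(box, k) <= iou_thres for k in kept):
            kept.append(box)
            if len(kept) == max_det:
                break
    return kept


detections = [
    (0, 0, 10, 10, 0.9),
    (1, 1, 11, 11, 0.8),    # heavy overlap with the first box
    (20, 20, 30, 30, 0.7),
    (40, 40, 50, 50, 0.1),  # below conf_thres
]
kept = nms_sketch(detections)
```

With the default thresholds, the second detection is suppressed by the first and the fourth is filtered out by confidence, so two boxes remain.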


Scale boxes

Rescale boxes (xyxy) from img1_shape to img0_shape

Parameters:

Name Type Description Default
img1_shape

The shape of the image that the bounding boxes are for.

required
boxes

the bounding boxes of the objects in the image

required
img0_shape

the shape of the original image

required
ratio_pad

a tuple of (ratio, pad)

None

Returns:

Type Description

The rescaled boxes.
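
The rescaling can be sketched as follows, assuming the model input was produced by letterboxing (one uniform scale factor plus centered padding). That preprocessing is an assumption, `scale_boxes_sketch` is a hypothetical name, and the ratio_pad override and any final clipping are omitted.

```python
def scale_boxes_sketch(img1_shape, boxes, img0_shape):
    """Map xyxy boxes from the model-input (h, w) back to the original (h, w)."""
    # Uniform gain used when the original image was resized to fit the input.
    gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])
    # Padding added on each side to fill the remaining space (assumed centered).
    pad_x = (img1_shape[1] - img0_shape[1] * gain) / 2
    pad_y = (img1_shape[0] - img0_shape[0] * gain) / 2
    scaled = []
    for x1, y1, x2, y2 in boxes:
        scaled.append(((x1 - pad_x) / gain, (y1 - pad_y) / gain,
                       (x2 - pad_x) / gain, (y2 - pad_y) / gain))
    return scaled


# A 960x1280 (h, w) image letterboxed into a 640x640 input.
boxes = scale_boxes_sketch((640, 640), [(100, 180, 200, 280)], (960, 1280))
```

Here the gain is 0.5 and the vertical padding is 80 pixels, so the box maps back to (200, 200, 400, 400) in the original image.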


Scale image

It takes a mask and resizes it to the original image size

Parameters:

Name Type Description Default
im1_shape

model input shape, [h, w]

required
masks

[h, w, num]

required
im0_shape

the original image shape

required
ratio_pad

the ratio of the padding to the original image.

None

Returns:

Type Description

The resized masks.


Clip boxes

It takes a list of bounding boxes and a shape (height, width) and clips the bounding boxes to the shape

Parameters:

Name Type Description Default
boxes

the bounding boxes to clip

required
shape

the shape of the image

required
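
Clipping amounts to clamping each coordinate to the image bounds. A minimal plain-Python sketch, with `clip_boxes_sketch` as a hypothetical name and boxes assumed to be xyxy tuples:

```python
def clip_boxes_sketch(boxes, shape):
    """Clamp xyxy boxes to an image of the given (height, width)."""
    height, width = shape
    clipped = []
    for x1, y1, x2, y2 in boxes:
        clipped.append((min(max(x1, 0), width), min(max(y1, 0), height),
                        min(max(x2, 0), width), min(max(y2, 0), height)))
    return clipped


clipped = clip_boxes_sketch([(-5, 10, 700, 500)], (480, 640))
```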

Box Format Conversion

xyxy2xywh

It takes a list of bounding boxes and converts them from the format [x1, y1, x2, y2] to [x, y, w, h] where xy1=top-left, xy2=bottom-right, xy=center

Parameters:

Name Type Description Default
x

the input tensor

required

Returns:

Type Description

the center of the box, the width and the height of the box.
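
The conversion is simple arithmetic on the corners. A plain-Python sketch for a single box (`xyxy_to_xywh_sketch` is a hypothetical name; the library function operates on tensors or arrays of boxes):

```python
def xyxy_to_xywh_sketch(box):
    """Convert (x1, y1, x2, y2) to (x_center, y_center, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


center_box = xyxy_to_xywh_sketch((10, 20, 30, 60))
```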


xywh2xyxy

It converts the bounding box from [x, y, w, h] to [x1, y1, x2, y2] where xy=center, xy1=top-left, xy2=bottom-right

Parameters:

Name Type Description Default
x

the input tensor

required

Returns:

Type Description

the top left and bottom right coordinates of the bounding box.
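
A plain-Python sketch of the same arithmetic for a single box, with `xywh_to_xyxy_sketch` as a hypothetical name:

```python
def xywh_to_xyxy_sketch(box):
    """Convert (x_center, y_center, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


corner_box = xywh_to_xyxy_sketch((20, 40, 20, 40))
```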


xywhn2xyxy

It converts normalized [x, y, w, h] boxes to pixel coordinates [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right

Parameters:

Name Type Description Default
x

the bounding box coordinates

required
w

width of the image. Defaults to 640

640
h

height of the image. Defaults to 640

640
padw

padding width. Defaults to 0

0
padh

padding height. Defaults to 0

0

Returns:

Type Description

the xyxy coordinates of the bounding box.
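
A sketch of how the parameters combine, assuming the input is a normalized center-format box and the padding is added after scaling to pixels. `xywhn_to_xyxy_sketch` is a hypothetical name for illustration.

```python
def xywhn_to_xyxy_sketch(box, w=640, h=640, padw=0, padh=0):
    """Convert a normalized (x, y, w, h) box to pixel (x1, y1, x2, y2)."""
    xc, yc, bw, bh = box
    return (w * (xc - bw / 2) + padw, h * (yc - bh / 2) + padh,
            w * (xc + bw / 2) + padw, h * (yc + bh / 2) + padh)


pixel_box = xywhn_to_xyxy_sketch((0.5, 0.5, 0.25, 0.5))
shifted_box = xywhn_to_xyxy_sketch((0.5, 0.5, 0.25, 0.5), padw=10, padh=20)
```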


xyxy2xywhn

It takes a list of bounding boxes in [x1, y1, x2, y2] pixel format and returns them in [x, y, w, h] format, normalized by the width and height of the image

Parameters:

Name Type Description Default
x

the bounding box coordinates

required
w

width of the image. Defaults to 640

640
h

height of the image. Defaults to 640

640
clip

If True, the boxes will be clipped to the image boundaries. Defaults to False

False
eps

the minimum value of the box's width and height.

0.0

Returns:

Type Description

the xywhn format of the bounding boxes.
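
The normalization can be sketched for a single box as below. `xyxy_to_xywhn_sketch` is a hypothetical name, and the eps parameter is deliberately left out because its exact use is not spelled out here.

```python
def xyxy_to_xywhn_sketch(box, w=640, h=640, clip=False):
    """Convert a pixel (x1, y1, x2, y2) box to normalized (x, y, w, h)."""
    x1, y1, x2, y2 = box
    if clip:
        # Clamp the corners to the image before normalizing.
        x1, x2 = min(max(x1, 0), w), min(max(x2, 0), w)
        y1, y2 = min(max(y1, 0), h), min(max(y2, 0), h)
    return ((x1 + x2) / 2 / w, (y1 + y2) / 2 / h, (x2 - x1) / w, (y2 - y1) / h)


norm_box = xyxy_to_xywhn_sketch((240, 160, 400, 480))
clipped_box = xyxy_to_xywhn_sketch((-160, 0, 160, 640), clip=True)
```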


xyn2xy

It converts normalized segments into pixel segments of shape (n,2)

Parameters:

Name Type Description Default
x

the normalized coordinates of the segment points, shape (n,2)

required
w

width of the image. Defaults to 640

640
h

height of the image. Defaults to 640

640
padw

padding width. Defaults to 0

0
padh

padding height. Defaults to 0

0

Returns:

Type Description

the x and y coordinates of the segment points in pixels.
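
For segment points, the conversion scales each normalized coordinate and adds the padding. A minimal sketch with the hypothetical name `xyn_to_xy_sketch`:

```python
def xyn_to_xy_sketch(points, w=640, h=640, padw=0, padh=0):
    """Convert normalized (x, y) points to pixel coordinates."""
    return [(w * x + padw, h * y + padh) for x, y in points]


pixels = xyn_to_xy_sketch([(0.0, 0.0), (0.5, 0.25), (1.0, 1.0)], w=640, h=480)
```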


xywh2ltwh

It converts the bounding box from [x, y, w, h] to [x1, y1, w, h] where xy1=top-left

Parameters:

Name Type Description Default
x

the input tensor of boxes in [x, y, w, h] format, where xy=center

required

Returns:

Type Description

the boxes in [x1, y1, w, h] format, where xy1=top-left.
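
Only the first two values change: the center is shifted by half the width and height. A single-box sketch, with `xywh_to_ltwh_sketch` as a hypothetical name:

```python
def xywh_to_ltwh_sketch(box):
    """Convert (x_center, y_center, w, h) to (x_left, y_top, w, h)."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, w, h)


ltwh_box = xywh_to_ltwh_sketch((20, 40, 20, 40))
```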


xyxy2ltwh

Convert nx4 boxes from [x1, y1, x2, y2] to [x1, y1, w, h] where xy1=top-left, xy2=bottom-right

Parameters:

Name Type Description Default
x

the input tensor

required

Returns:

Type Description

the boxes in [x1, y1, w, h] format.


ltwh2xywh

Convert nx4 boxes from [x1, y1, w, h] to [x, y, w, h] where xy1=top-left, xy=center

Parameters:

Name Type Description Default
x

the input tensor

required

ltwh2xyxy

It converts the bounding box from [x1, y1, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right

Parameters:

Name Type Description Default
x

the input tensor of boxes in [x1, y1, w, h] format

required

Returns:

Type Description

the xyxy coordinates of the bounding boxes.
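
The bottom-right corner is the top-left corner plus the width and height. A single-box sketch, with `ltwh_to_xyxy_sketch` as a hypothetical name:

```python
def ltwh_to_xyxy_sketch(box):
    """Convert (x_left, y_top, w, h) to (x1, y1, x2, y2)."""
    x1, y1, w, h = box
    return (x1, y1, x1 + w, y1 + h)


xyxy_box = ltwh_to_xyxy_sketch((10, 20, 20, 40))
```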


segment2box

Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy)

Parameters:

Name Type Description Default
segment

the segment label

required
width

the width of the image. Defaults to 640

640
height

the height of the image. Defaults to 640

640

Returns:

Type Description

the minimum and maximum x and y values of the segment.
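
The inside-image constraint can be sketched as: discard points outside the image, then take the extremes of what remains. `segment_to_box_sketch` is a hypothetical name, and returning an all-zero box when no point lies inside is an assumption of this sketch.

```python
def segment_to_box_sketch(segment, width=640, height=640):
    """Return (x1, y1, x2, y2) around the segment points inside the image."""
    inside = [(x, y) for x, y in segment
              if 0 <= x <= width and 0 <= y <= height]
    if not inside:
        return (0, 0, 0, 0)  # assumption: empty box when nothing is inside
    xs = [x for x, _ in inside]
    ys = [y for _, y in inside]
    return (min(xs), min(ys), max(xs), max(ys))


box = segment_to_box_sketch([(10, 20), (50, 80), (700, 30), (30, 5)])
```

The point (700, 30) falls outside the 640-pixel width, so it does not affect the resulting box.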


Mask Operations

resample_segments

It takes a list of segments (m,2) and returns a list of segments (n,2) where each segment has been resampled to n points

Parameters:

Name Type Description Default
segments

a list of (m,2) arrays, where m is the number of points in the segment.

required
n

number of points to resample the segment to. Defaults to 1000

1000

Returns:

Type Description

the resampled segments.
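
One way to resample an outline to a fixed number of points is linear interpolation along the vertex index, sketched here in plain Python. Closing the outline by repeating the first point and interpolating by index rather than by arc length are assumptions of this sketch, and `resample_segment_sketch` is a hypothetical name.

```python
def resample_segment_sketch(points, n=1000):
    """Resample a polygon outline to n points (n >= 2) by linear interpolation."""
    closed = list(points) + [points[0]]  # close the outline (assumed convention)
    last = len(closed) - 1
    resampled = []
    for i in range(n):
        t = i * last / (n - 1)       # fractional vertex index in [0, last]
        lo = min(int(t), last - 1)   # left vertex of the enclosing edge
        frac = t - lo
        (x0, y0), (x1, y1) = closed[lo], closed[lo + 1]
        resampled.append((x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
    return resampled


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
outline = resample_segment_sketch(square, n=9)
```

For the square, nine samples land on the four corners, the four edge midpoints, and the starting corner again.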


crop_mask

It takes a mask and a bounding box, and returns a mask that is cropped to the bounding box

Parameters:

Name Type Description Default
masks

[h, w, n] tensor of masks

required
boxes

[n, 4] tensor of bbox coords in relative point form

required

Returns:

Type Description

The masks cropped to the bounding boxes.
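
Cropping here means zeroing every mask pixel that lies outside the box. A sketch for one 2D mask stored as a list of rows and one xyxy box in pixel coordinates; `crop_mask_sketch` is a hypothetical name, and the half-open bounds (x2 and y2 excluded) are an assumption.

```python
def crop_mask_sketch(mask, box):
    """Zero every pixel of a 2D mask (list of rows) outside the xyxy box."""
    x1, y1, x2, y2 = box
    return [[value if (x1 <= col < x2 and y1 <= row < y2) else 0
             for col, value in enumerate(mask_row)]
            for row, mask_row in enumerate(mask)]


mask = [[1] * 4 for _ in range(4)]
cropped = crop_mask_sketch(mask, (1, 1, 3, 3))
```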


process_mask_upsample

It takes the output of the mask head, and applies the mask to the bounding boxes. This produces masks of higher quality but is slower.

Parameters:

Name Type Description Default
protos

[mask_dim, mask_h, mask_w]

required
masks_in

[n, mask_dim], n is number of masks after nms

required
bboxes

[n, 4], n is number of masks after nms

required
shape

the size of the input image

required

Returns:

Type Description

mask


process_mask

It takes the output of the mask head, and applies the mask to the bounding boxes. This is faster but produces lower-resolution (downsampled) masks

Parameters:

Name Type Description Default
protos

[mask_dim, mask_h, mask_w]

required
masks_in

[n, mask_dim], n is number of masks after nms

required
bboxes

[n, 4], n is number of masks after nms

required
shape

the size of the input image

required

Returns:

Type Description

mask
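
The step that turns protos and masks_in into a mask can be sketched as a weighted sum of prototype masks followed by a sigmoid and a threshold. This assumes the common prototype-mask formulation and a 0.5 threshold; `combine_protos_sketch` is a hypothetical name, and cropping to bboxes and resizing to shape are omitted.

```python
import math


def combine_protos_sketch(protos, coeffs, threshold=0.5):
    """Blend prototype masks with one detection's coefficients, then binarize."""
    mask_h, mask_w = len(protos[0]), len(protos[0][0])
    binary = []
    for r in range(mask_h):
        row = []
        for c in range(mask_w):
            # Weighted sum over the mask_dim prototypes at this pixel.
            logit = sum(k * proto[r][c] for k, proto in zip(coeffs, protos))
            prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
            row.append(1 if prob > threshold else 0)
        binary.append(row)
    return binary


protos = [
    [[1.0, -1.0], [1.0, -1.0]],   # prototype 0: left column positive
    [[-1.0, -1.0], [1.0, 1.0]],   # prototype 1: bottom row positive
]
mask = combine_protos_sketch(protos, coeffs=[2.0, 1.0])
```

With a larger weight on the first prototype, the resulting mask follows its left-column pattern.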


process_mask_native

It takes the output of the mask head, and crops it after upsampling to the bounding boxes.

Parameters:

Name Type Description Default
protos

[mask_dim, mask_h, mask_w]

required
masks_in

[n, mask_dim], n is number of masks after nms

required
bboxes

[n, 4], n is number of masks after nms

required
shape

input_image_size, (h, w)

required

Returns:

Name Type Description
masks

[h, w, n]


scale_segments

Rescale segment coords (xy) from img1_shape to img0_shape

Parameters:

Name Type Description Default
img1_shape

The shape of the image that the segments are from.

required
segments

the segments to be scaled

required
img0_shape

the shape of the image that the segmentation is being applied to

required
ratio_pad

the ratio of the image size to the padded image size.

None
normalize

If True, the coordinates will be normalized to the range [0, 1]. Defaults to False

False

Returns:

Type Description

the rescaled segment coordinates.


masks2segments

It takes a list of masks(n,h,w) and returns a list of segments(n,xy)

Parameters:

Name Type Description Default
masks

the output of the model, which is a tensor of shape (batch_size, 160, 160)

required
strategy

'concat' or 'largest'. Defaults to largest

'largest'

Returns:

Name Type Description
segments List

list of segment masks


clip_segments

It takes a list of line segments (x1,y1,x2,y2) and clips them to the image shape (height, width)

Parameters:

Name Type Description Default
segments

a list of segments, each segment is a list of points, each point is a list of x,y coordinates

required
shape

the shape of the image

required
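
Clipping segments works like clipping boxes, applied point by point. A minimal sketch assuming each segment is a list of (x, y) points and the image shape is (height, width); `clip_segments_sketch` is a hypothetical name.

```python
def clip_segments_sketch(segments, shape):
    """Clamp every (x, y) point of every segment to the image bounds."""
    height, width = shape
    return [[(min(max(x, 0), width), min(max(y, 0), height)) for x, y in segment]
            for segment in segments]


clipped_segments = clip_segments_sketch(
    [[(-3, 5), (650, 490), (100, 100)]], (480, 640))
```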