operations

This module contains optimized deep-learning operations used in the Ultralytics YOLO framework.

Non-max suppression

Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box.

Parameters:

Name	Type	Description	Default
`prediction`	`torch.Tensor`	A tensor of shape (batch_size, num_boxes, num_classes + 4 + num_masks) containing the predicted boxes, classes, and masks. The tensor should be in the format output by a model, such as YOLO.	required
`conf_thres`	`float`	The confidence threshold below which boxes will be filtered out. Valid values are between 0.0 and 1.0.	`0.25`
`iou_thres`	`float`	The IoU threshold above which overlapping boxes will be suppressed during NMS. Valid values are between 0.0 and 1.0.	`0.45`
`classes`	`List[int]`	A list of class indices to consider. If None, all classes will be considered.	`None`
`agnostic`	`bool`	If True, NMS is class-agnostic: boxes of all classes are treated as a single class during suppression.	`False`
`multi_label`	`bool`	If True, each box may have multiple labels.	`False`
`labels`	`List[List[Union[int, float, torch.Tensor]]]`	A list of lists, where each inner list contains the apriori labels for a given image. The list should be in the format output by a dataloader, with each label being a tuple of (class_index, x1, y1, x2, y2).	`()`
`max_det`	`int`	The maximum number of boxes to keep after NMS.	`300`
`nm`	`int`	The number of masks output by the model.	`0`

Returns:

Type	Description
`List[torch.Tensor]`	A list of length batch_size, where each element is a tensor of shape (num_boxes, 6 + num_masks) containing the kept boxes, with columns (x1, y1, x2, y2, confidence, class, mask1, mask2, ...).
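
The greedy suppression step at the core of NMS can be sketched for one image and one class as follows. This is a minimal NumPy illustration, not the Ultralytics implementation: the name `nms_sketch` and its signature are hypothetical, and batching, `classes`, `agnostic`, `multi_label`, and masks are left out.

```python
import numpy as np

def nms_sketch(boxes, scores, conf_thres=0.25, iou_thres=0.45, max_det=300):
    """Greedy NMS on xyxy boxes for one image and one class (hypothetical helper)."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Drop low-confidence boxes, keeping the original indices of the survivors.
    idx = np.flatnonzero(scores >= conf_thres)
    # Visit candidates from highest to lowest confidence.
    idx = idx[np.argsort(-scores[idx], kind="stable")]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while idx.size and len(keep) < max_det:
        i, rest = idx[0], idx[1:]
        keep.append(int(i))
        # Intersection of the current best box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        # Keep only boxes whose overlap with the best box is at most iou_thres.
        idx = rest[iou <= iou_thres]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60], [0, 0, 5, 5]]
scores = [0.9, 0.8, 0.7, 0.1]
kept = nms_sketch(boxes, scores)
```

Here the second box overlaps the first with an IoU of about 0.68 and is suppressed, and the fourth box is dropped by the confidence threshold, so `kept` holds the indices of the first and third boxes.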

Scale boxes

Rescale bounding boxes (xyxy format) from img1_shape to img0_shape.

Parameters:

Name	Description	Default
`img1_shape`	The shape of the image that the bounding boxes are for.	required
`boxes`	The bounding boxes of the objects in the image.	required
`img0_shape`	The shape of the original image.	required
`ratio_pad`	A tuple of (ratio, pad).	`None`

Returns:

Type	Description
	The rescaled bounding boxes.
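
One way to picture the rescaling is to undo a letterbox transform: subtract the padding, then divide by the resize gain. The sketch below assumes the model input was produced by a uniform resize with symmetric padding, and derives the gain and padding from the two shapes instead of taking a `ratio_pad` argument. The name `scale_boxes_sketch` is hypothetical and the code is not the library's implementation.

```python
import numpy as np

def scale_boxes_sketch(img1_shape, boxes, img0_shape):
    """Map xyxy boxes from a letterboxed image (img1) back to the original image (img0)."""
    h1, w1 = img1_shape[:2]
    h0, w0 = img0_shape[:2]
    gain = min(h1 / h0, w1 / w0)   # assumed uniform resize factor, original -> model input
    pad_x = (w1 - w0 * gain) / 2   # assumed horizontal padding on each side
    pad_y = (h1 - h0 * gain) / 2   # assumed vertical padding on each side
    out = np.asarray(boxes, dtype=float).copy()
    out[:, [0, 2]] = (out[:, [0, 2]] - pad_x) / gain
    out[:, [1, 3]] = (out[:, [1, 3]] - pad_y) / gain
    # Clip to the bounds of the original image.
    out[:, [0, 2]] = out[:, [0, 2]].clip(0, w0)
    out[:, [1, 3]] = out[:, [1, 3]].clip(0, h0)
    return out

# A 480x640 image letterboxed into 640x640 gets 80 rows of padding above and below.
scaled = scale_boxes_sketch((640, 640), [[0, 80, 640, 560]], (480, 640))
```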

Scale image

Resize masks from the model input size to the original image size.

Parameters:

Name	Description	Default
`im1_shape`	The model input shape, [h, w].	required
`masks`	The masks to resize, [h, w, num].	required
`im0_shape`	The original image shape.	required
`ratio_pad`	The ratio of the padding to the original image.	`None`

Returns:

Type	Description
	The resized masks.

Clip boxes

Clip a list of bounding boxes to an image shape (height, width).

Parameters:

Name	Type	Description	Default
`boxes`		The bounding boxes to clip.	required
`shape`		The shape of the image, (height, width).	required
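
As a minimal sketch of the idea, clipping limits x coordinates to [0, width] and y coordinates to [0, height]. The example assumes boxes in [x1, y1, x2, y2] format, which the section above does not state. The name `clip_boxes_sketch` is hypothetical, and the function returns a copy instead of modifying its input.

```python
import numpy as np

def clip_boxes_sketch(boxes, shape):
    """Clip xyxy boxes to an image of the given (height, width)."""
    h, w = shape[:2]
    out = np.asarray(boxes, dtype=float).copy()
    out[:, [0, 2]] = out[:, [0, 2]].clip(0, w)  # x1, x2
    out[:, [1, 3]] = out[:, [1, 3]].clip(0, h)  # y1, y2
    return out

clipped = clip_boxes_sketch([[-5, 10, 700, 500]], (480, 640))
```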

Box Format Conversion

xyxy2xywh

Convert bounding boxes from [x1, y1, x2, y2] to [x, y, w, h], where xy1=top-left, xy2=bottom-right, and xy=center.

Parameters:

Name	Type	Description	Default
`x`		The input tensor of boxes in [x1, y1, x2, y2] format.	required

Returns:

Type	Description
	The boxes in [x, y, w, h] format: the center, width, and height of each box.
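
The arithmetic is small enough to show directly. This is a NumPy sketch with a hypothetical name (`xyxy2xywh_sketch`), written for arrays whose last dimension is 4. It is not the library's code.

```python
import numpy as np

def xyxy2xywh_sketch(x):
    """[x1, y1, x2, y2] -> [x_center, y_center, width, height]."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[..., 0] = (x[..., 0] + x[..., 2]) / 2  # center x
    y[..., 1] = (x[..., 1] + x[..., 3]) / 2  # center y
    y[..., 2] = x[..., 2] - x[..., 0]        # width
    y[..., 3] = x[..., 3] - x[..., 1]        # height
    return y

xywh = xyxy2xywh_sketch([[10, 20, 50, 80]])
```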

xywh2xyxy

Convert bounding boxes from [x, y, w, h] to [x1, y1, x2, y2], where xy=center, xy1=top-left, and xy2=bottom-right.

Parameters:

Name	Type	Description	Default
`x`		The input tensor of boxes in [x, y, w, h] format.	required

Returns:

Type	Description
	The boxes in [x1, y1, x2, y2] format: the top-left and bottom-right corners of each box.
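
A minimal NumPy sketch of this conversion, assuming (x, y) is the box center. The name `xywh2xyxy_sketch` is hypothetical.

```python
import numpy as np

def xywh2xyxy_sketch(x):
    """[x_center, y_center, width, height] -> [x1, y1, x2, y2]."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2  # top-left x
    y[..., 1] = x[..., 1] - x[..., 3] / 2  # top-left y
    y[..., 2] = x[..., 0] + x[..., 2] / 2  # bottom-right x
    y[..., 3] = x[..., 1] + x[..., 3] / 2  # bottom-right y
    return y

xyxy = xywh2xyxy_sketch([[30, 50, 40, 60]])
```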

xywhn2xyxy

Convert normalized [x, y, w, h] boxes to pixel coordinates [x1, y1, x2, y2], where xy1=top-left and xy2=bottom-right.

Parameters:

Name	Description	Default
`x`	The normalized bounding box coordinates.	required
`w`	Width of the image.	`640`
`h`	Height of the image.	`640`
`padw`	Padding width.	`0`
`padh`	Padding height.	`0`

Returns:

Type	Description
	The xyxy coordinates of the bounding boxes, in pixels.
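
The sketch below shows one plausible reading of the parameters: corners are computed from the normalized center and size, scaled by the image dimensions, and then shifted by the padding. The name `xywhn2xyxy_sketch` is hypothetical, and the treatment of `padw` and `padh` as plain pixel offsets is an assumption.

```python
import numpy as np

def xywhn2xyxy_sketch(x, w=640, h=640, padw=0, padh=0):
    """Normalized [x, y, w, h] (center-based) -> pixel [x1, y1, x2, y2]."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[..., 0] = w * (x[..., 0] - x[..., 2] / 2) + padw  # top-left x
    y[..., 1] = h * (x[..., 1] - x[..., 3] / 2) + padh  # top-left y
    y[..., 2] = w * (x[..., 0] + x[..., 2] / 2) + padw  # bottom-right x
    y[..., 3] = h * (x[..., 1] + x[..., 3] / 2) + padh  # bottom-right y
    return y

pixels = xywhn2xyxy_sketch([[0.5, 0.5, 0.25, 0.5]], w=640, h=480)
```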

xyxy2xywhn

Convert bounding boxes from pixel [x1, y1, x2, y2] coordinates to [x, y, w, h] coordinates normalized by the width and height of the image.

Parameters:

Name	Description	Default
`x`	The bounding box coordinates.	required
`w`	Width of the image.	`640`
`h`	Height of the image.	`640`
`clip`	If True, the boxes will be clipped to the image boundaries.	`False`
`eps`	The minimum value of the box's width and height.	`0.0`

Returns:

Type	Description
	The bounding boxes in normalized xywh (xywhn) format.
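
A minimal NumPy sketch of the normalization, assuming (x, y) in the output is the box center. The name `xyxy2xywhn_sketch` is hypothetical, and `eps` is left out because its exact role is not spelled out above.

```python
import numpy as np

def xyxy2xywhn_sketch(x, w=640, h=640, clip=False):
    """Pixel [x1, y1, x2, y2] -> normalized [x_center, y_center, width, height]."""
    x = np.asarray(x, dtype=float).copy()
    if clip:
        # Optionally clip the boxes to the image boundaries first.
        x[..., [0, 2]] = x[..., [0, 2]].clip(0, w)
        x[..., [1, 3]] = x[..., [1, 3]].clip(0, h)
    y = np.empty_like(x)
    y[..., 0] = (x[..., 0] + x[..., 2]) / 2 / w  # center x
    y[..., 1] = (x[..., 1] + x[..., 3]) / 2 / h  # center y
    y[..., 2] = (x[..., 2] - x[..., 0]) / w      # width
    y[..., 3] = (x[..., 3] - x[..., 1]) / h      # height
    return y

normalized = xyxy2xywhn_sketch([[240, 120, 400, 360]], w=640, h=480)
```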

xyn2xy

Convert normalized segments into pixel segments of shape (n, 2).

Parameters:

Name	Description	Default
`x`	The normalized segment coordinates.	required
`w`	Width of the image.	`640`
`h`	Height of the image.	`640`
`padw`	Padding width.	`0`
`padh`	Padding height.	`0`

Returns:

Type	Description
	The segment coordinates in pixels, shape (n, 2).
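
For segments, each point is scaled independently. The sketch below assumes the padding is added as a pixel offset after scaling, and the name `xyn2xy_sketch` is hypothetical.

```python
import numpy as np

def xyn2xy_sketch(x, w=640, h=640, padw=0, padh=0):
    """Normalized (n, 2) points -> pixel (n, 2) points."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[..., 0] = w * x[..., 0] + padw  # pixel x
    y[..., 1] = h * x[..., 1] + padh  # pixel y
    return y

points = xyn2xy_sketch([[0, 0], [0.5, 0.25], [1, 1]], w=640, h=480)
```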

xywh2ltwh

Convert bounding boxes from [x, y, w, h] to [x1, y1, w, h], where xy=center and xy1=top-left.

Parameters:

Name	Type	Description	Default
`x`		The input tensor of boxes in [x, y, w, h] format.	required

Returns:

Type	Description
	The boxes in [x1, y1, w, h] format, where xy1 is the top-left corner.

xyxy2ltwh

Convert nx4 boxes from [x1, y1, x2, y2] to [x1, y1, w, h], where xy1=top-left and xy2=bottom-right.

Parameters:

Name	Type	Description	Default
`x`		The input tensor.	required

Returns:

Type	Description
	The boxes in [x1, y1, w, h] format.

ltwh2xywh

Convert nx4 boxes from [x1, y1, w, h] to [x, y, w, h], where xy1=top-left and xy=center.

Parameters:

Name	Type	Description	Default
`x`		The input tensor.	required

ltwh2xyxy

Convert bounding boxes from [x1, y1, w, h] to [x1, y1, x2, y2], where xy1=top-left and xy2=bottom-right.

Parameters:

Name	Type	Description	Default
`x`		The input tensor of boxes in [x1, y1, w, h] format.	required

Returns:

Type	Description
	The xyxy coordinates of the bounding boxes.
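
The four conversions that involve the top-left format (xywh2ltwh, xyxy2ltwh, ltwh2xywh, ltwh2xyxy) each change only one half of the box vector. The NumPy sketch below illustrates this. The `_sketch` names are hypothetical, and each function returns a copy of its input.

```python
import numpy as np

def xywh2ltwh_sketch(x):
    y = np.array(x, dtype=float)   # np.array copies its input
    y[..., :2] -= y[..., 2:] / 2   # center -> top-left
    return y

def xyxy2ltwh_sketch(x):
    y = np.array(x, dtype=float)
    y[..., 2:] -= y[..., :2]       # bottom-right -> width, height
    return y

def ltwh2xywh_sketch(x):
    y = np.array(x, dtype=float)
    y[..., :2] += y[..., 2:] / 2   # top-left -> center
    return y

def ltwh2xyxy_sketch(x):
    y = np.array(x, dtype=float)
    y[..., 2:] += y[..., :2]       # width, height -> bottom-right
    return y

ltwh = xywh2ltwh_sketch([[30, 50, 40, 60]])
```

In this example `ltwh` holds the same box as `xyxy2ltwh_sketch([[10, 20, 50, 80]])`, and applying `ltwh2xywh_sketch` or `ltwh2xyxy_sketch` to it recovers the respective inputs.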

segment2box

Convert one segment label to one box label, applying the inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy).

Parameters:

Name	Description	Default
`segment`	The segment label.	required
`width`	The width of the image.	`640`
`height`	The height of the image.	`640`

Returns:

Type	Description
	The minimum and maximum x and y values of the segment.
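
One reading of the inside-image constraint is that points outside the image are discarded before taking the minimum and maximum. The sketch below follows that reading. The name `segment2box_sketch` is hypothetical, and returning zeros when no point lies inside the image is an assumption.

```python
import numpy as np

def segment2box_sketch(segment, width=640, height=640):
    """(n, 2) polygon points -> [x1, y1, x2, y2] box over the in-image points."""
    seg = np.asarray(segment, dtype=float)
    x, y = seg[:, 0], seg[:, 1]
    # Keep only the points that lie inside the image.
    inside = (x >= 0) & (y >= 0) & (x <= width) & (y <= height)
    x, y = x[inside], y[inside]
    if x.size == 0:
        return np.zeros(4)  # assumed fallback when nothing is inside
    return np.array([x.min(), y.min(), x.max(), y.max()])

box = segment2box_sketch([[10, 20], [100, 40], [60, 300], [-5, 50], [700, 10]])
```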

Mask Operations

resample_segments

Resample each segment in a list of (m, 2) segments to n points, returning a list of (n, 2) segments.

Parameters:

Name	Type	Description	Default
`segments`		A list of (m, 2) arrays, where m is the number of points in the segment.	required
`n`		The number of points to resample each segment to.	`1000`

Returns:

Type	Description
	The resampled segments.
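
Resampling a single segment can be sketched with linear interpolation. The example below interpolates along the point index rather than along arc length and does not close the polygon. Both are simplifying assumptions, and the name `resample_segment_sketch` is hypothetical.

```python
import numpy as np

def resample_segment_sketch(segment, n=1000):
    """Resample an (m, 2) segment to (n, 2) by linear interpolation over the point index."""
    seg = np.asarray(segment, dtype=float)
    src = np.arange(len(seg))              # positions of the original points
    dst = np.linspace(0, len(seg) - 1, n)  # n evenly spaced positions
    # Interpolate the x and y columns separately, then stack them back into (n, 2).
    return np.stack([np.interp(dst, src, seg[:, k]) for k in range(2)], axis=1)

resampled = resample_segment_sketch([[0, 0], [10, 0], [10, 10]], n=5)
```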

crop_mask

Take masks and bounding boxes, and return the masks cropped to the bounding boxes.

Parameters:

Name	Type	Description	Default
`masks`		[h, w, n] tensor of masks.	required
`boxes`		[n, 4] tensor of bbox coordinates in relative point form.	required

Returns:

Type	Description
	The masks cropped to the bounding boxes.
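
Cropping can be done without loops by comparing broadcast row and column indices against each box. The sketch below uses the [h, w, n] mask layout from the table and assumes the boxes are [x1, y1, x2, y2] in mask pixel coordinates, with the right and bottom edges exclusive. The name `crop_mask_sketch` is hypothetical.

```python
import numpy as np

def crop_mask_sketch(masks, boxes):
    """Zero out mask values outside each box. masks: [h, w, n]; boxes: [n, 4] xyxy."""
    masks = np.asarray(masks, dtype=float)
    boxes = np.asarray(boxes, dtype=float)
    h, w, n = masks.shape
    cols = np.arange(w)[None, :, None]  # x index, shape (1, w, 1)
    rows = np.arange(h)[:, None, None]  # y index, shape (h, 1, 1)
    # Each box coordinate is reshaped to (1, 1, n) so it broadcasts over h and w.
    x1, y1, x2, y2 = (boxes[:, k][None, None, :] for k in range(4))
    inside = (cols >= x1) & (cols < x2) & (rows >= y1) & (rows < y2)
    return masks * inside

masks = np.ones((4, 4, 1))
cropped = crop_mask_sketch(masks, [[1, 1, 3, 3]])
```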

process_mask_upsample

Take the output of the mask head and apply the masks to the bounding boxes. This produces higher-quality masks but is slower.

Parameters:

Name	Description	Default
`protos`	[mask_dim, mask_h, mask_w]	required
`masks_in`	[n, mask_dim], where n is the number of masks after NMS.	required
`bboxes`	[n, 4], where n is the number of masks after NMS.	required
`shape`	The size of the input image.	required

Returns:

Type	Description
	The processed masks.

process_mask

Take the output of the mask head and apply the masks to the bounding boxes. This is faster but produces lower-resolution (downsampled) masks.

Parameters:

Name	Description	Default
`protos`	[mask_dim, mask_h, mask_w]	required
`masks_in`	[n, mask_dim], where n is the number of masks after NMS.	required
`bboxes`	[n, 4], where n is the number of masks after NMS.	required
`shape`	The size of the input image.	required

Returns:

Type	Description
	The processed masks.
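
The shapes in the table suggest that each instance mask is a linear combination of the prototype masks, weighted by that instance's coefficients. The sketch below shows only that assembly step in NumPy. The sigmoid activation and the 0.5 threshold are assumptions not stated above, the name `assemble_masks_sketch` is hypothetical, and cropping to `bboxes` and resizing to `shape` are omitted.

```python
import numpy as np

def assemble_masks_sketch(protos, masks_in):
    """protos: [mask_dim, mask_h, mask_w]; masks_in: [n, mask_dim] -> [n, mask_h, mask_w]."""
    protos = np.asarray(protos, dtype=float)
    masks_in = np.asarray(masks_in, dtype=float)
    c, mh, mw = protos.shape
    logits = masks_in @ protos.reshape(c, -1)  # [n, mask_h * mask_w]
    probs = 1.0 / (1.0 + np.exp(-logits))      # assumed sigmoid activation
    return probs.reshape(-1, mh, mw)

# Two constant prototypes: one strongly positive, one strongly negative.
protos = np.stack([np.full((2, 2), 4.0), np.full((2, 2), -4.0)])
masks = assemble_masks_sketch(protos, [[1.0, 0.0], [0.0, 1.0]])
binary = masks > 0.5  # assumed binarization threshold
```

The first instance selects the positive prototype and yields a mask that is on everywhere, while the second selects the negative prototype and yields an empty mask.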

process_mask_native

Take the output of the mask head and crop it to the bounding boxes after upsampling.

Parameters:

Name	Description	Default
`protos`	[mask_dim, mask_h, mask_w]	required
`masks_in`	[n, mask_dim], where n is the number of masks after NMS.	required
`bboxes`	[n, 4], where n is the number of masks after NMS.	required
`shape`	The input image size, (h, w).	required

Returns:

Name	Type	Description
`masks`		[h, w, n]

scale_segments

Rescale segment coordinates (xy) from img1_shape to img0_shape.

Parameters:

Name	Description	Default
`img1_shape`	The shape of the image that the segments are from.	required
`segments`	The segments to be scaled.	required
`img0_shape`	The shape of the image that the segmentation is being applied to.	required
`ratio_pad`	The ratio of the image size to the padded image size.	`None`
`normalize`	If True, the coordinates will be normalized to the range [0, 1].	`False`

Returns:

Type	Description
	The rescaled segment coordinates.

masks2segments

Convert a list of masks (n, h, w) into a list of segments (n, xy).

Parameters:

Name	Type	Description	Default
`masks`		The output of the model, which is a tensor of shape (batch_size, 160, 160).	required
`strategy`		'concat' or 'largest'.	`'largest'`

Returns:

Name	Type	Description
`segments`	`List`	List of segment masks.

clip_segments

Clip a list of line segments (x1, y1, x2, y2) to the image shape (height, width).

Parameters:

Name	Type	Description	Default
`segments`		A list of segments; each segment is a list of points, and each point is a list of x, y coordinates.	required
`shape`		The shape of the image.	required