
Image segmentation is a core vision problem that can provide a solution for a large number of use cases. Starting from medical imaging to analyzing traffic, it has immense potential. Instance segmentation, i.e., object detection + segmentation, is even more powerful as it allows us to detect and segment objects in a single pipeline. For this purpose, the Ultralytics YOLOv8 models offer a simple pipeline. In this article, we will carry out YOLOv8 instance segmentation training on custom data.

While going through the training process of YOLOv8 instance segmentation models, we will cover:

  • Training three different models, namely, YOLOv8 Nano, YOLOv8 Small, and YOLOv8 Medium.
  • Analyzing the results for each of the models.
  • Carrying out inference using the trained models.

This will allow us to explore each aspect of the training pipeline. Furthermore, it will also equip us with adequate knowledge to use YOLOv8 instance segmentation models in our own projects.

YOLO Master Post – Every Model Explained

Unlock the full story behind all the YOLO models’ evolutionary journey: Dive into our extensive pillar post, where we unravel the evolution from YOLOv1 to YOLO-NAS. This essential guide is packed with insights, comparisons, and a deeper understanding that you won’t find anywhere else.
Don’t miss out on this comprehensive resource, Mastering All Yolo Models for a richer, more informed perspective on the YOLO series.

The Underwater Trash Instance Segmentation Dataset

We will use the TrashCan 1.0 instance segmentation dataset to train the YOLOv8 models. This dataset consists of underwater imagery for detecting and segmenting trash in and around the ocean floor.

There are two versions of the instance segmentation dataset: an instance version and a material version. For our purpose, we will use the material version as it is easier to solve with fewer classes.

Originally, the annotations were in JSON format. We have already converted the dataset into a YOLO text file format that you can directly download. 

The dataset contains 6008 training instances and 1204 validation instances. There are a total of 16 classes in the dataset. The following are the classes, along with their label indices.

{
        0: 'rov',
        1: 'plant',
        2: 'animal_fish',
        3: 'animal_starfish',
        4: 'animal_shells',
        5: 'animal_crab',
        6: 'animal_eel',
        7: 'animal_etc',
        8: 'trash_etc',
        9: 'trash_fabric',
        10: 'trash_fishing_gear',
        11: 'trash_metal',
        12: 'trash_paper',
        13: 'trash_plastic',
        14: 'trash_rubber',
        15: 'trash_wood',
}

Here are a few examples from the dataset to get a better understanding of the type of images we are dealing with.

Images from the underwater trash instance segmentation dataset.
Figure 1. Images from the underwater trash instance segmentation dataset.

As we can see, the dataset seems challenging. Most of the objects are small, and a lot of the trash material looks similar. However, solving such a detection and segmentation problem will allow unmanned underwater robots to pick up trash automatically.

If you intend on training the models locally, you can download the dataset through this link.

In case you would like to use cloud GPU providers or Colab, you can use the Jupyter Notebook that comes with this post via the above download link.

Now, let’s get into the technical parts of this article.

The YOLOv8 Instance Segmentation Label Format

We know that YOLO models need labels in text file format. For detection, each new line in a text file indicates an object. Following is an example:

8 0.575 0.381474 0.5875 0.377771

In the above example, the class index of the object is 8, and the remaining numbers indicate the x_center, y_center, width, and height of the bounding box in normalized format.
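To make the format concrete, here is a minimal sketch in plain Python that parses such a detection label line and converts the normalized center-width-height box back to pixel corner coordinates. The 640×480 image size used below is only an illustrative assumption, not a property of the dataset:

```python
def yolo_box_to_xyxy(line, img_w, img_h):
    """Parse one YOLO detection label line and return (class_id, (x1, y1, x2, y2)).

    The label stores normalized x_center, y_center, width, height;
    we scale back to pixel coordinates of the top-left and bottom-right corners.
    """
    parts = line.split()
    cls = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return cls, (x1, y1, x2, y2)
```

For example, on a hypothetical 640×480 image, the line `8 0.5 0.5 0.25 0.25` describes a 160×120 box centered in the frame.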

But how do we represent an instance segmentation object?

Let’s see an example that will make it much easier to understand the format.

8 0.575 0.381474 0.5875 0.377771 0.599996 0.355556 0.602079 0.311111 0.595833 0.300007 0.566667 0.300007 0.564583 0.314822 0.554167 0.314822 0.55 0.325933 0.535417 0.329637 0.529171 0.340741 0.529171 0.351852 0.535417 0.359252 0.545833 0.359252 0.554167 0.374067 0.558333 0.370363 0.575 0.381474

In this case, the first number still encodes the class index. All the remaining numbers are space-separated, normalized x-y coordinates of points along the object's boundary, which together define the segmentation polygon. Note that no bounding box is stored explicitly in the segmentation label; it can be derived from the polygon when needed.
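In the Ultralytics YOLOv8 segmentation format, everything after the class index is treated as x-y pairs of polygon points. A minimal parsing sketch in plain Python (the 640×480 image size in the usage example is only an illustrative assumption):

```python
def yolo_seg_to_polygon(line, img_w, img_h):
    """Parse one YOLOv8 segmentation label line into (class_id, polygon).

    The polygon is returned as a list of (x, y) pixel coordinates.
    """
    parts = line.split()
    cls = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    # The numbers come in x, y pairs; scale them back to pixel space.
    polygon = [(x * img_w, y * img_h) for x, y in zip(coords[0::2], coords[1::2])]
    return cls, polygon
```

A polygon parsed this way can be drawn directly on the image (e.g., with OpenCV's `cv2.polylines`) to visualize the ground truth mask boundary.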

Visually, the segmentation boundary around an object looks like the following.

Points defining the ground truth instance segmentation boundary in images.
Figure 2. Points defining the ground truth instance segmentation boundary in images.

If you need a primer on running inference with YOLO instance segmentation models, then YOLOv5 instance segmentation is a good starting point.

Preparing the Dataset YAML File

Before we move ahead with the training, we first need to prepare the dataset YAML file. We name it trash_inst_material.yaml, and here are its contents.

names:
  0: rov
  1: plant
  2: animal_fish
  3: animal_starfish
  4: animal_shells
  5: animal_crab
  6: animal_eel
  7: animal_etc
  8: trash_etc
  9: trash_fabric
  10: trash_fishing_gear
  11: trash_metal
  12: trash_paper
  13: trash_plastic
  14: trash_rubber
  15: trash_wood
path: underwater_trash_instance
train: train/images
val: val/images

The YAML file contains four attributes:

  • names: The class names starting from index 0 to number of classes – 1.
  • path: The path to the dataset root directory (absolute, or relative to the Ultralytics datasets directory).
  • train: The training folder path inside the dataset directory.
  • val: The validation folder path inside the dataset directory.

All four attributes are mandatory to start the training process correctly. Later, we will use the same YAML for training all three YOLOv8 instance segmentation models.
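Since the file is short, it can also be generated programmatically. The following sketch builds an equivalent YAML string from the class list above using plain string formatting (no PyYAML dependency assumed), which is handy when converting other datasets to this layout:

```python
# Class names from the dataset, in label-index order.
CLASS_NAMES = [
    "rov", "plant", "animal_fish", "animal_starfish", "animal_shells",
    "animal_crab", "animal_eel", "animal_etc", "trash_etc", "trash_fabric",
    "trash_fishing_gear", "trash_metal", "trash_paper", "trash_plastic",
    "trash_rubber", "trash_wood",
]

def make_dataset_yaml(path, train="train/images", val="val/images"):
    """Build the dataset YAML contents shown above as a string."""
    lines = ["names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(CLASS_NAMES)]
    lines += [f"path: {path}", f"train: {train}", f"val: {val}"]
    return "\n".join(lines) + "\n"
```

Writing the returned string to trash_inst_material.yaml reproduces the file shown above.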

You will need to install the ultralytics package to train locally. This YOLOv8 tutorial contains the necessary installation steps as well as inference examples using several models.

Note: All training experiments were run on a machine with an Intel Xeon processor, 16 GB P100 GPU, and 32 GB of RAM.

Training YOLOv8 Nano Instance Segmentation Model

We will begin with the training of the Nano model – the smallest model in the YOLOv8 instance segmentation family.

Before starting the training, ensure the YAML file is in the same directory as the one you open the terminal in; otherwise, you will get a path error.

To start the training, execute the following command in the terminal.

yolo task=segment mode=train model=yolov8n-seg.pt imgsz=640 data=trash_inst_material.yaml epochs=100 batch=16 name=yolov8n-seg exist_ok=True amp=False

We use the yolo CLI to train the model. We use the following command line arguments in the above command:

  • task: This argument indicates the task we want to perform using the model. As we are training an instance segmentation model, the task here is segment.
  • mode: We can choose from train, predict, and val for the mode. Here, the mode is train.
  • model: This directly accepts the pretrained weight file name. If not already present, the yolo CLI will download it for the first time.
  • imgsz: The image size in pixels. Images are resized so the longer side matches this value, with the aspect ratio maintained.
  • data: The path to the dataset YAML file.
  • epochs: This is the number of epochs we want to train the model on the dataset.
  • batch: The batch size used during training.
  • name: We can provide a custom result directory name using this argument.
  • exist_ok: This tells the CLI to use the same result directory if present without creating a new one.
  • amp: AMP stands for Automatic Mixed Precision. We are turning it off as some GPUs may not support it.
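Since the Small and Medium runs below differ from the Nano run only in the model and name arguments, the command string can be assembled programmatically. This is a small helper sketch whose defaults mirror the Nano command above; the helper itself is hypothetical, not part of the yolo CLI:

```python
def build_yolo_cmd(**overrides):
    """Assemble the yolo CLI training command used above from key=value arguments."""
    args = {
        "task": "segment", "mode": "train", "model": "yolov8n-seg.pt",
        "imgsz": 640, "data": "trash_inst_material.yaml", "epochs": 100,
        "batch": 16, "name": "yolov8n-seg", "exist_ok": True, "amp": False,
    }
    args.update(overrides)  # e.g., swap in a larger model and a new run name
    return "yolo " + " ".join(f"{k}={v}" for k, v in args.items())
```

For instance, `build_yolo_cmd(model="yolov8s-seg.pt", name="yolov8s-seg")` produces the Small model's training command.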

After training the model for 100 epochs, we get the following result.

YOLOv8 Nano instance segmentation results.
Figure 3. YOLOv8 Nano instance segmentation results.

The graphs under (B) indicate the bounding box metrics and the ones under (M) indicate segmentation mask metrics.

The Nano model reaches box mAP of 42.6% and segmentation mAP of 34.5% on the last epoch. From the graphs, it looks like there is still room for improvement. But instead of training the Nano model for longer, let’s train a larger model.

Training YOLOv8 Small Instance Segmentation Model

To start the Small model training, we only need to change the model name and the result directory name.

yolo task=segment mode=train model=yolov8s-seg.pt imgsz=640 data=trash_inst_material.yaml epochs=100 batch=16 name=yolov8s-seg exist_ok=True amp=False
YOLOv8 Small instance segmentation results after training on the underwater trash detection dataset.
Figure 4. YOLOv8 Small instance segmentation results after training on the underwater trash detection dataset.

The Small model reaches slightly higher metrics within the same number of epochs. This time, the last epoch’s box mAP is 44.38%, and the segmentation mask mAP is 35.16%.

This is a modest improvement over our previous training experiment. Employing an even larger model may yield better results still.

Training YOLOv8 Medium Instance Segmentation Model

For our final training experiment, we will train the YOLOv8 Medium model for instance segmentation.

Like the previous one, we only need to change the model name and experiment name in the training command.

yolo task=segment mode=train model=yolov8m-seg.pt imgsz=640 data=trash_inst_material.yaml epochs=100 batch=16 name=yolov8m-seg exist_ok=True amp=False
YOLOv8 Medium instance segmentation results.
Figure 5. YOLOv8 Medium instance segmentation results.

With the YOLOv8 Medium model, we have the highest box mAP yet of 45%. Also, the segmentation mask mAP reaches 36.2%.

Clearly, this is the best model we have trained so far. For inference, we will use the weights of the YOLOv8 Medium model.

Comparison Between the Trained Models

Before moving on to the inference section, let’s take a look at the box and segmentation mAP graphs of each of the trained models.

Bounding box mAP comparison after training the YOLOv8 instance segmentation models.
Figure 6. Bounding box mAP comparison after training the YOLOv8 instance segmentation models.
Segmentation mask mAP comparison after training the YOLOv8 instance segmentation models.
Figure 7. Segmentation mask mAP comparison after training the YOLOv8 instance segmentation models.
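For a quick side-by-side view, the final-epoch numbers reported in the sections above can be collected in a small table:

```python
# Final-epoch mAP values (in percent) reported in the sections above.
RESULTS = {
    "YOLOv8 Nano":   {"box_map": 42.60, "mask_map": 34.50},
    "YOLOv8 Small":  {"box_map": 44.38, "mask_map": 35.16},
    "YOLOv8 Medium": {"box_map": 45.00, "mask_map": 36.20},
}

# The Medium model comes out on top on both metrics.
best = max(RESULTS, key=lambda name: RESULTS[name]["mask_map"])
for name, r in RESULTS.items():
    print(f"{name:14s} box mAP {r['box_map']:5.2f} | mask mAP {r['mask_map']:5.2f}")
```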

Inference on Validation Images

First, we will run inferences on the validation images and check the YOLOv8 Medium model’s performance.

Note: The inference experiments were run on a laptop with an i7 8th generation CPU, 6 GB GTX 1060 GPU, and 16 GB RAM.
The following command expects that the trained weights are in the runs directory created from the model training experiments.

yolo model=runs/segment/yolov8m-seg/weights/best.pt mode=predict source=trash_inst_material/val/images name=yolov8m_seg_infer_valimages exist_ok=True

We provide the path to the validation images directory, and the command will run inference on all images.

The following is a video where the inference image results have been combined into a single video. This provides an easier way to analyze the results.

Clip 1. Inference results on all the validation images (combined to make a video) after training the YOLOv8 Medium instance segmentation model.

The results are not perfect, but they are quite impressive. The model segments the ROV accurately in a majority of the frames, including challenging classes such as trash_wood.

Inference on Videos

For inference on videos, we have chosen a few videos that were part of the test set of the initial version of this dataset. These videos are complex and contain a lot of objects in a single frame.

There are two videos for inference that you can access while downloading the code for this article. We can execute the following command to start the experiments.

yolo model=runs/segment/yolov8m-seg/weights/best.pt mode=predict source=trash_segment_inference_data/manythings.mp4 name=runs_medium exist_ok=True

This time, the source file is the video file that we want to run inference on.

Here are the results.

Clip 2. YOLOv8 instance segmentation inference on an underwater trash detection video with a complex scene. The model is unable to predict objects confidently whenever the camera is moving at high speed.

It is clear this is a highly complex scene. Let’s break down all the places where the model is performing well and where it isn’t.

  • In the first few frames, there is a lot of flickering. This is mostly because of fast camera movement, and because of this, the segmentation and detection predictions suffer.
  • After a while, the predictions become much better, but when the ROV first appears, the model cannot detect it. This is because of the high number of objects already present in the bottom right corner.
  • In the final few frames, there is a crab present in the scene which the model cannot predict.

With the GTX 1060 GPU, we are getting over 30 FPS, which is real-time performance.

Let’s run a final experiment on a simpler video.

yolo model=runs/segment/yolov8m-seg/weights/best.pt mode=predict source=trash_segment_inference_data/several.mp4 name=runs_medium exist_ok=True show=True
Clip 3. Inference on a comparatively simpler underwater trash detection scene using the YOLOv8 Medium instance segmentation model.

Interestingly, the model detects the distant trash classes correctly but fails to detect the fish.


Summary

In this article, we went through the process of training three different instance segmentation models using the Ultralytics library. We chose a fairly difficult real-world dataset that presents a considerable challenge to today’s object detection and segmentation models. Although the results were not perfect, we have a starting point.

The above results show how difficult instance segmentation problems can be when trying to solve a real-world problem. Throwing huge models at them for training is not a solution, as we need real-time performance most of the time. Share your thoughts in the comments on how to enhance this project and develop an even more advanced model.

March 21, 2023

Object detection and instance segmentation are crucial tasks in computer vision, with numerous applications ranging from self-driving cars to medical image analysis. However, a significant challenge is the need for large labeled datasets to train accurate models. Labeling datasets manually can be tedious and time-consuming, often requiring significant effort and resources.

To address this challenge, model-assisted labeling has emerged as a powerful technique that can save time and money by reducing the number of manual annotations required. In this blog post, we will explore how model-assisted labeling works and how it can accelerate the labeling process for both object detection and instance segmentation. As an example, we will demonstrate how you can use the trainYOLO platform to easily apply this method to your YOLOv8 object detection or instance segmentation training.

What is Model-Assisted Labeling?

Model-assisted labeling is a process that uses a (pre-)trained machine learning model to generate annotations for a dataset automatically. Specifically, the model is used to predict the labels of the objects in the dataset, and these predictions are then used as a starting point for the manual labeling process. The annotations generated by the model only need to be refined or corrected by human annotators, reducing the total number of annotations required as opposed to labeling from scratch.

For example, in object detection tasks, a pre-trained model can be used to generate bounding boxes around the objects in an image. These bounding boxes can then be used as a starting point for the manual labeling process, with human annotators refining and correcting the locations of the boxes as necessary. This approach can save significant time and effort compared to manual labeling from scratch. As an example, see the difference between manually labeling an image of pollen versus using a model-assisted method:
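One simple way to structure the refinement step is to triage the model's predictions by confidence, so annotators spend their time only on the uncertain ones. A minimal sketch (the prediction dicts and the 0.5 threshold are hypothetical illustrations, not a trainYOLO API):

```python
def triage_predictions(predictions, conf_threshold=0.5):
    """Split model predictions into auto-accepted boxes and ones flagged for review.

    Each prediction is a dict with at least a "conf" score; high-confidence
    boxes are kept as-is, low-confidence ones are routed to a human annotator.
    """
    accepted = [p for p in predictions if p["conf"] >= conf_threshold]
    needs_review = [p for p in predictions if p["conf"] < conf_threshold]
    return accepted, needs_review
```

In practice, the threshold is a tuning knob: a higher value sends more boxes to humans but reduces the risk of silently accepting wrong annotations.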

Left: manual labeling, right: model-assisted labeling. As demonstrated, model-assisted labeling definitely increases labeling speed.

One of the key benefits of model-assisted labeling is that it enables a feedback loop between training the model and labeling more images. As the model gets better at generating initial annotations, the amount of manual work required decreases, allowing more time to be spent on training and improving the model. This cycle can be repeated multiple times, with each iteration resulting in a more accurate model and fewer manual annotations required. By continually improving both the model and the dataset, the overall accuracy of the object detection system can be significantly improved. Therefore, it is important to keep iterating and fine-tuning the model and dataset to achieve the best results possible.
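The feedback loop can be sketched as follows; `label_fn` and `train_fn` are placeholders standing in for the manual-correction and training steps, not a real API:

```python
def labeling_loop(unlabeled, label_fn, train_fn, batch_size=50, rounds=3):
    """Sketch of the label -> train -> predict cycle described above.

    label_fn(model, image) returns a corrected annotation (model may be None
    in the first round, i.e., fully manual labeling); train_fn(dataset)
    returns a model trained on everything labeled so far.
    """
    dataset, model = [], None
    for _ in range(rounds):
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        dataset += [label_fn(model, img) for img in batch]
        model = train_fn(dataset)  # each round starts from a better model
    return model, dataset
```

Each pass through the loop grows the labeled dataset and retrains the model, so the per-image correction effort in `label_fn` shrinks round over round.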

After each training iteration, the model's accuracy increases and the manual labeling time decreases.

How to use model-assisted labeling on trainYOLO

The trainYOLO platform streamlines the process of model-assisted labeling for object detection and instance segmentation algorithms like YOLOv8 or YOLOv5. After each training iteration, you can upload your model to the platform and generate predictions with a click of a button. No need to upload predictions yourself. 

As an example, let’s take a look at the steps to train a YOLOv8 pollen detector using the model-assisted labeling approach. For a more detailed guide about how to upload, label and train algorithms like YOLOv5 or YOLOv8 using trainYOLO, take a look at our other posts: YOLOv5 object detection, YOLOv8 object detection, YOLOv8 instance segmentation.

1. Label an initial batch manually

To kickstart the process of model-assisted labeling, we begin by manually labeling the first batch of images. In this stage, it is crucial to choose a diverse set of images to improve the trained model's generalization. Also, the number of images that need to be labeled manually in this first batch varies depending on the task. If the images have many objects, fewer are required, and vice versa. In any case, the initial trained model's performance will indicate if more manual labeling is necessary.


2. Train an initial model

Once the first batch of images is labeled, it's time to train the initial model. With trainYOLO's preconfigured Colab notebooks, training a YOLOv5 or YOLOv8 object detection or instance segmentation model is straightforward (see here for a detailed guide on how to start training). All you need to do is fill in your API key and Project name to start the training. Once finished, the model and its metrics are uploaded to our platform, where it is automatically deployed for use in model-assisted labeling. In our case, we labeled 50 images (comprising 1221 pollen grains) and reached the following accuracy:

With a score of 96.4 mAP, we have an excellent starting point for our model-assisted labeling. In hindsight, we could have started with a smaller batch of images.

3. Model-assisted labeling

Now that we have an initial model trained, we can start to utilize it as a labeling assistant. As the model is automatically deployed on trainYOLO, it enables us to generate predictions with a click of a button (the magic paintbrush). Note that the labeling speedup depends on the performance of the initial model, but it is expected to improve significantly with additional training iterations.


4. Iterate

As previously mentioned, one of the significant advantages of model-assisted labeling is the feedback loop it creates between training the model and labeling more images. Therefore, it's recommended to train a new model each time we label another set of images. This approach enhances the prediction’s accuracy and accelerates the labeling process. A win-win for all.

Conclusion

Model-assisted labeling is a powerful technique for accelerating the labeling process for object detection tasks. By using the model to jump-start the initial annotations, model-assisted labeling can save significant time and money, while also improving the accuracy of annotations. To make the most of model-assisted labeling, it's important to focus on challenging images and correct the annotations generated by the model. By following these tips, you can effectively use model-assisted labeling to label datasets faster and more accurately.

In conclusion, model-assisted labeling is an important technique that can help overcome the challenges of manual labeling in object detection tasks. By leveraging the predictions of trained models combined with human expertise, model-assisted labeling can accelerate the labeling process while improving the accuracy of annotations. As computer vision applications continue to grow in importance and datasets continue to grow in size, model-assisted labeling is likely to become an increasingly valuable tool.

This article aims to collect and organize open-source data annotation tools for convenient use, currently covering four areas: Image, Video, Text, and Audio. I have not personally tested most of the tools listed here, so mislabeled or misfiled entries will be corrected over time.

Table of Contents

1. Image

1.1 bbox

  • labelImg
    • labelImg is a cross-platform object detection annotation tool based on Python and Qt. It is convenient, fast, and practical, and is widely used.
  • bbox-label-tool
    • bbox-label-tool is a Python-based object detection annotation tool. It is simple and easy to use, but supports only single-class annotation.
  • LabelBoundingBox
    • LabelBoundingBox is an upgraded version of bbox-label-tool that supports multi-class annotation.
  • Yolo_mark
    • Yolo_mark is an annotation tool for YOLO v2 object detection.
  • FastAnnotationTool
    • FastAnnotationTool is a powerful object detection annotation tool based on C++ and OpenCV. It supports data and character OCR annotation, provides a variety of data augmentation functions (cropping, flipping, rotation, scaling, salt-and-pepper noise, Gaussian noise, rectangle merging, line extraction, etc.), and supports annotating rotated objects. It is extremely practical.
  • od-annotation
    • od-annotation is developed with the Python Flask framework, runs in the browser (B/S architecture), and supports simultaneous annotation by multiple users.
  • RectLabel
    • RectLabel can draw both bounding boxes (object detection) and polygons (segmentation).
  • CVAT
    • CVAT is an efficient annotation tool for image classification, object detection, semantic segmentation, and instance segmentation, and supports local deployment.
  • VoTT
    • VoTT is a web-deployable annotation tool released by Microsoft. It supports images and videos, supports CNTK and Pascal VOC formats, and can export to TFRecord, CSV, and VoTT formats.
  • VIA-VGG Image Annotator
    • VIA supports object detection, semantic segmentation, instance segmentation, and more. It runs in the browser and can also be deployed locally. It offers convenient operations for face data annotation and is a top choice for labeling face data.
  • Pixel Annotation Tool
    • An excellent tool for semantic and instance segmentation annotation.
  • point-cloud-annotation-tool
    • An excellent tool for annotating 3D point cloud data. It supports loading, saving, and visualizing point clouds, selecting points, generating 3D boxes, and the KITTI .bin data format.
  • boobs
    • A YOLO bounding box annotation tool; supports exporting in YOLO/VOC/COCO formats.

1.2 Mask

  • labelme
    • labelme is a cross-platform annotation tool based on Python and Qt that supports image segmentation annotation. It is convenient, fast, and practical, and is widely used.
  • pylabelme
    • pylabelme is a cross-platform annotation tool based on Python and Qt that supports image segmentation annotation. It is convenient, fast, and practical, and is widely used.
  • Labelbox
    • Labelbox is a multi-functional data annotation tool that supports image segmentation, image classification, and text classification annotation. It is convenient, fast, and practical, and is widely used.
  • ImageLabel
    • ImageLabel is an image segmentation annotation tool based on Qt and OpenCV. It supports manually drawing contours and can use GrabCut for semi-automatic annotation.
  • ImageSegmentation
    • ImageSegmentation is a Python-based image segmentation annotation tool; convenient and practical.
  • opensurfaces-segmentation-ui
    • opensurfaces-segmentation-ui is a Python-based image segmentation annotation tool; convenient and practical.
  • labelImgPlus
    • labelImgPlus is an upgraded version of labelImg that supports image segmentation, image classification, and object detection annotation. It is convenient, highly versatile, and widely used.

2. Video

  • video_labeler
    • video_labeler is a Python-based video object detection and tracking annotation tool; lightweight and practical.
  • vatic
    • vatic is a Python-based video object detection and tracking annotation tool; lightweight, practical, and widely used.
  • lane-detection-with-opencv
    • lane-detection-with-opencv is an OpenCV-based video lane detection annotation tool for this specialized scenario; highly practical.
  • OpenLabel
    • OpenLabel is an OpenCV-based video object detection and tracking annotation tool; lightweight, practical, and widely used.

3. Text

  • brat
    • brat is a Python-based natural language annotation tool; flexibly designed, practical, and widely used.
  • MarqueeLabel
    • MarqueeLabel is a natural language annotation tool based on Swift and C; flexibly designed, practical, and widely used.

4. Audio

  • audio-annotator
    • audio-annotator is a JavaScript-based audio annotation tool that supports visual annotation of waveforms and spectrograms; highly versatile and widely used.
  • youtube-chord-ocr
    • youtube-chord-ocr is a Python-based audio annotation tool that converts YouTube music videos carrying chord labels into labeled audio files; widely used.
  • MusicSegmentation
    • MusicSegmentation is a MATLAB-based music segmentation and labeling tool that segments and labels music by computing harmonics and timbre; widely used.