
Image segmentation is a core vision problem that can provide a solution for a large number of use cases. Starting from medical imaging to analyzing traffic, it has immense potential. Instance segmentation, i.e., object detection + segmentation, is even more powerful as it allows us to detect and segment objects in a single pipeline. For this purpose, the Ultralytics YOLOv8 models offer a simple pipeline. In this article, we will carry out YOLOv8 instance segmentation training on custom data.

While going through the training process of YOLOv8 instance segmentation models, we will cover:

  • Training three different models, namely, YOLOv8 Nano, YOLOv8 Small, and YOLOv8 Medium.
  • Analyzing the results for each of the models.
  • Carrying out inference using the trained models.

This will allow us to explore each aspect of the training pipeline. Furthermore, it will also equip us with adequate knowledge to use YOLOv8 instance segmentation models in our own projects.

YOLO Master Post – Every Model Explained

Unlock the full story behind all the YOLO models’ evolutionary journey: Dive into our extensive pillar post, where we unravel the evolution from YOLOv1 to YOLO-NAS. This essential guide is packed with insights, comparisons, and a deeper understanding that you won’t find anywhere else.
Don’t miss out on this comprehensive resource, Mastering All Yolo Models for a richer, more informed perspective on the YOLO series.

The Underwater Trash Instance Segmentation Dataset

We will use the TrashCan 1.0 instance segmentation dataset to train the YOLOv8 models. This dataset consists of underwater imagery for detecting and segmenting trash in and around the ocean floor.

There are two versions of the instance segmentation dataset: an instance version and a material version. For our purpose, we will use the material version as it is easier to solve with fewer classes.

Originally, the annotations were in JSON format. We have already converted the dataset into a YOLO text file format that you can directly download. 

The dataset contains 6008 training instances and 1204 validation instances. There are a total of 16 classes in the dataset. The following are the classes, along with their label indices.

{
        0: 'rov',
        1: 'plant',
        2: 'animal_fish',
        3: 'animal_starfish',
        4: 'animal_shells',
        5: 'animal_crab',
        6: 'animal_eel',
        7: 'animal_etc',
        8: 'trash_etc',
        9: 'trash_fabric',
        10: 'trash_fishing_gear',
        11: 'trash_metal',
        12: 'trash_paper',
        13: 'trash_plastic',
        14: 'trash_rubber',
        15: 'trash_wood',
}

Here are a few examples from the dataset to get a better understanding of the type of images we are dealing with.

Images from the underwater trash instance segmentation dataset.
Figure 1. Images from the underwater trash instance segmentation dataset.

As we can see, the dataset seems challenging. Most of the objects are small, and a lot of the trash material looks similar. However, solving such a detection and segmentation problem will allow unmanned underwater robots to pick up trash automatically.

If you intend on training the models locally, you can download the dataset through this link.

In case you would like to use cloud GPU providers or Colab, you can use the Jupyter Notebook that comes with this post via the above download link.

Now, let’s get into the technical parts of this article.

The YOLOv8 Instance Segmentation Label Format

We know that YOLO models need labels in text file format. For detection, each new line in a text file indicates an object. Following is an example:

8 0.575 0.381474 0.5875 0.377771

In the above example, the class index of the object is 8, and the remaining numbers indicate the x_center, y_center, width, and height of the bounding box in normalized format.
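To make the format concrete, here is a minimal sketch in plain Python that parses such a detection label line and converts the normalized center-width-height box back to pixel corner coordinates. The 640×480 image size used below is only an illustrative assumption, not a property of the dataset:

```python
def yolo_box_to_xyxy(line, img_w, img_h):
    """Parse one YOLO detection label line and return (class_id, (x1, y1, x2, y2)).

    The label stores normalized x_center, y_center, width, height;
    we scale back to pixel coordinates of the top-left and bottom-right corners.
    """
    parts = line.split()
    cls = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return cls, (x1, y1, x2, y2)
```

For example, on a hypothetical 640×480 image, the line `8 0.5 0.5 0.25 0.25` describes a 160×120 box centered in the frame.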

But how do we represent an instance segmentation object?

Let’s see an example that will make it much easier to understand the format.

8 0.575 0.381474 0.5875 0.377771 0.599996 0.355556 0.602079 0.311111 0.595833 0.300007 0.566667 0.300007 0.564583 0.314822 0.554167 0.314822 0.55 0.325933 0.535417 0.329637 0.529171 0.340741 0.529171 0.351852 0.535417 0.359252 0.545833 0.359252 0.554167 0.374067 0.558333 0.370363 0.575 0.381474

In this case, the first number still encodes the class index. All the remaining numbers are space-separated, normalized x-y coordinates of points along the object's boundary, which together define the segmentation polygon. Note that no bounding box is stored explicitly in the segmentation label; it can be derived from the polygon when needed.
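In the Ultralytics YOLOv8 segmentation format, everything after the class index is treated as x-y pairs of polygon points. A minimal parsing sketch in plain Python (the 640×480 image size in the usage example is only an illustrative assumption):

```python
def yolo_seg_to_polygon(line, img_w, img_h):
    """Parse one YOLOv8 segmentation label line into (class_id, polygon).

    The polygon is returned as a list of (x, y) pixel coordinates.
    """
    parts = line.split()
    cls = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    # The numbers come in x, y pairs; scale them back to pixel space.
    polygon = [(x * img_w, y * img_h) for x, y in zip(coords[0::2], coords[1::2])]
    return cls, polygon
```

A polygon parsed this way can be drawn directly on the image (e.g., with OpenCV's `cv2.polylines`) to visualize the ground truth mask boundary.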

Visually, the segmentation boundary around an object looks like the following.

Points defining the ground truth instance segmentation boundary in images.
Figure 2. Points defining the ground truth instance segmentation boundary in images.

If you need a primer on running inference with YOLO instance segmentation models, then YOLOv5 instance segmentation is a good starting point.

Preparing the Dataset YAML File

Before we move ahead with the training, we first need to prepare the dataset YAML file. We name it trash_inst_material.yaml, and here are its contents.

names:
  0: rov
  1: plant
  2: animal_fish
  3: animal_starfish
  4: animal_shells
  5: animal_crab
  6: animal_eel
  7: animal_etc
  8: trash_etc
  9: trash_fabric
  10: trash_fishing_gear
  11: trash_metal
  12: trash_paper
  13: trash_plastic
  14: trash_rubber
  15: trash_wood
path: underwater_trash_instance
train: train/images
val: val/images

The YAML file contains four attributes:

  • names: The class names starting from index 0 to number of classes – 1.
  • path: The path to the dataset root directory (absolute, or relative to the Ultralytics datasets directory).
  • train: The training folder path inside the dataset directory.
  • val: The validation folder path inside the dataset directory.

All four attributes are mandatory to start the training process correctly. Later, we will use the same YAML for training all three YOLOv8 instance segmentation models.
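Since the file is short, it can also be generated programmatically. The following sketch builds an equivalent YAML string from the class list above using plain string formatting (no PyYAML dependency assumed), which is handy when converting other datasets to this layout:

```python
# Class names from the dataset, in label-index order.
CLASS_NAMES = [
    "rov", "plant", "animal_fish", "animal_starfish", "animal_shells",
    "animal_crab", "animal_eel", "animal_etc", "trash_etc", "trash_fabric",
    "trash_fishing_gear", "trash_metal", "trash_paper", "trash_plastic",
    "trash_rubber", "trash_wood",
]

def make_dataset_yaml(path, train="train/images", val="val/images"):
    """Build the dataset YAML contents shown above as a string."""
    lines = ["names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(CLASS_NAMES)]
    lines += [f"path: {path}", f"train: {train}", f"val: {val}"]
    return "\n".join(lines) + "\n"
```

Writing the returned string to trash_inst_material.yaml reproduces the file shown above.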

You will need to install the ultralytics package to train locally. This YOLOv8 tutorial contains the necessary installation steps as well as inference examples using several models.

Note: All training experiments were run on a machine with an Intel Xeon processor, 16 GB P100 GPU, and 32 GB of RAM.

Training YOLOv8 Nano Instance Segmentation Model

We will begin with the training of the Nano model – the smallest model in the YOLOv8 instance segmentation family.

Before starting the training, ensure the YAML file is in the same directory as the one you open the terminal in; otherwise, you will get a path error.

To start the training, execute the following command in the terminal.

yolo task=segment mode=train model=yolov8n-seg.pt imgsz=640 data=trash_inst_material.yaml epochs=100 batch=16 name=yolov8n-seg exist_ok=True amp=False

We use the yolo CLI to train the model. We use the following command line arguments in the above command:

  • task: This argument indicates the task we want to perform using the model. As we are training an instance segmentation model, the task here is segment.
  • mode: We can choose from train, predict, and val for the mode. Here, the mode is train.
  • model: This directly accepts the pretrained weight file name. If not already present, the yolo CLI will download it for the first time.
  • imgsz: The image size in pixels. Images are resized so the longer side matches this value, with the aspect ratio maintained.
  • data: The path to the dataset YAML file.
  • epochs: This is the number of epochs we want to train the model on the dataset.
  • batch: The batch size used during training.
  • name: We can provide a custom result directory name using this argument.
  • exist_ok: This tells the CLI to use the same result directory if present without creating a new one.
  • amp: AMP stands for Automatic Mixed Precision. We are turning it off as some GPUs may not support it.
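Since the Small and Medium runs below differ from the Nano run only in the model and name arguments, the command string can be assembled programmatically. This is a small helper sketch whose defaults mirror the Nano command above; the helper itself is hypothetical, not part of the yolo CLI:

```python
def build_yolo_cmd(**overrides):
    """Assemble the yolo CLI training command used above from key=value arguments."""
    args = {
        "task": "segment", "mode": "train", "model": "yolov8n-seg.pt",
        "imgsz": 640, "data": "trash_inst_material.yaml", "epochs": 100,
        "batch": 16, "name": "yolov8n-seg", "exist_ok": True, "amp": False,
    }
    args.update(overrides)  # e.g., swap in a larger model and a new run name
    return "yolo " + " ".join(f"{k}={v}" for k, v in args.items())
```

For instance, `build_yolo_cmd(model="yolov8s-seg.pt", name="yolov8s-seg")` produces the Small model's training command.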

After training the model for 100 epochs, we get the following result.

YOLOv8 Nano instance segmentation results.
Figure 3. YOLOv8 Nano instance segmentation results.

The graphs under (B) indicate the bounding box metrics and the ones under (M) indicate segmentation mask metrics.

The Nano model reaches box mAP of 42.6% and segmentation mAP of 34.5% on the last epoch. From the graphs, it looks like there is still room for improvement. But instead of training the Nano model for longer, let’s train a larger model.

Training YOLOv8 Small Instance Segmentation Model

To start the Small model training, we only need to change the model name and the result directory name.

yolo task=segment mode=train model=yolov8s-seg.pt imgsz=640 data=trash_inst_material.yaml epochs=100 batch=16 name=yolov8s-seg exist_ok=True amp=False
YOLOv8 Small instance segmentation results after training on the underwater trash detection dataset.
Figure 4. YOLOv8 Small instance segmentation results after training on the underwater trash detection dataset.

The Small model reaches slightly higher metrics within the same number of epochs. This time, the last epoch’s box mAP is 44.38%, and the segmentation mask mAP is 35.16%.

This is a modest improvement over our previous training experiment. Employing an even larger model may yield better results still.

Training YOLOv8 Medium Instance Segmentation Model

For our final training experiment, we will train the YOLOv8 Medium model for instance segmentation.

Like the previous one, we only need to change the model name and experiment name in the training command.

yolo task=segment mode=train model=yolov8m-seg.pt imgsz=640 data=trash_inst_material.yaml epochs=100 batch=16 name=yolov8m-seg exist_ok=True amp=False
YOLOv8 Medium instance segmentation results.
Figure 5. YOLOv8 Medium instance segmentation results.

With the YOLOv8 Medium model, we have the highest box mAP yet of 45%. Also, the segmentation mask mAP reaches 36.2%.

Clearly, this is the best model we have trained so far. For inference, we will use the weights of the YOLOv8 Medium model.

Comparison Between the Trained Models

Before moving on to the inference section, let’s take a look at the box and segmentation mAP graphs of each of the trained models.

Bounding box mAP comparison after training the YOLOv8 instance segmentation models.
Figure 6. Bounding box mAP comparison after training the YOLOv8 instance segmentation models.
Segmentation mask mAP comparison after training the YOLOv8 instance segmentation models.
Figure 7. Segmentation mask mAP comparison after training the YOLOv8 instance segmentation models.
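For a quick side-by-side view, the final-epoch numbers reported in the sections above can be collected in a small table:

```python
# Final-epoch mAP values (in percent) reported in the sections above.
RESULTS = {
    "YOLOv8 Nano":   {"box_map": 42.60, "mask_map": 34.50},
    "YOLOv8 Small":  {"box_map": 44.38, "mask_map": 35.16},
    "YOLOv8 Medium": {"box_map": 45.00, "mask_map": 36.20},
}

# The Medium model comes out on top on both metrics.
best = max(RESULTS, key=lambda name: RESULTS[name]["mask_map"])
for name, r in RESULTS.items():
    print(f"{name:14s} box mAP {r['box_map']:5.2f} | mask mAP {r['mask_map']:5.2f}")
```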

Inference on Validation Images

First, we will run inferences on the validation images and check the YOLOv8 Medium model’s performance.

Note: The inference experiments were run on a laptop with an i7 8th generation CPU, 6 GB GTX 1060 GPU, and 16 GB RAM.
The following command expects that the trained weights are in the runs directory created from the model training experiments.

yolo model=runs/segment/yolov8m-seg/weights/best.pt mode=predict source=trash_inst_material/val/images name=yolov8m_seg_infer_valimages exist_ok=True

We provide the path to the validation images directory, and the command will run inference on all images.

The following is a video where the inference image results have been combined into a single video. This provides an easier way to analyze the results.

Clip 1. Inference results on all the validation images (combined to make a video) after training the YOLOv8 Medium instance segmentation model.

The results are not perfect, but they are quite impressive. The model segments the ROV accurately in a majority of the frames, including challenging classes such as trash_wood.

Inference on Videos

For inference on videos, we have chosen a few videos that were part of the test set of the initial version of this dataset. These videos are complex and contain a lot of objects in a single frame.

There are two videos for inference that you can access while downloading the code for this article. We can execute the following command to start the experiments.

yolo model=runs/segment/yolov8m-seg/weights/best.pt mode=predict source=trash_segment_inference_data/manythings.mp4 name=runs_medium exist_ok=True

This time, the source file is the video file that we want to run inference on.

Here are the results.

Clip 2. YOLOv8 instance segmentation inference on an underwater trash detection video with a complex scene. The model is unable to predict objects confidently whenever the camera is moving at high speed.

It is clear this is a highly complex scene. Let’s break down all the places where the model is performing well and where it isn’t.

  • In the first few frames, there is a lot of flickering. This is mostly because of fast camera movement, and because of this, the segmentation and detection predictions suffer.
  • After a while, the predictions become much better, but when the ROV first appears, the model cannot detect it. This is because of the high number of objects already present in the bottom right corner.
  • In the final few frames, there is a crab present in the scene which the model cannot predict.

With the GTX 1060 GPU, we are getting over 30 FPS, which is real-time performance.

Let’s run a final experiment on a simpler video.

yolo model=runs/segment/yolov8m-seg/weights/best.pt mode=predict source=trash_segment_inference_data/several.mp4 name=runs_medium exist_ok=True show=True
Clip 3. Inference on a comparatively simpler underwater trash detection scene using the YOLOv8 Medium instance segmentation model.

Interestingly, the model detects the distant trash classes correctly but fails to detect the fish.


Summary

In this article, we went through the process of training three different instance segmentation models using the Ultralytics library. We chose a fairly difficult real-world dataset that presents a considerable challenge to today’s object detection and segmentation models. Although the results were not perfect, we have a starting point.

The above results show how difficult instance segmentation problems can be when trying to solve a real-world problem. Throwing huge models at them for training is not a solution, as we need real-time performance most of the time. Share your thoughts in the comments on how to enhance this project and develop an even more advanced model.

March 21, 2023

Object detection and instance segmentation are crucial tasks in computer vision, with numerous applications ranging from self-driving cars to medical image analysis. However, a significant challenge is the need for large labeled datasets to train accurate models. Labeling datasets manually can be tedious and time-consuming, often requiring significant effort and resources.

To address this challenge, model-assisted labeling has emerged as a powerful technique that can save time and money by reducing the number of manual annotations required. In this blog post, we will explore how model-assisted labeling works and how it can accelerate the labeling process for both object detection and instance segmentation. As an example, we will demonstrate how you can use the trainYOLO platform to easily apply this method to your YOLOv8 object detection or instance segmentation training.

What is Model-Assisted Labeling?

Model-assisted labeling is a process that uses a (pre-)trained machine learning model to generate annotations for a dataset automatically. Specifically, the model is used to predict the labels of the objects in the dataset, and these predictions are then used as a starting point for the manual labeling process. The annotations generated by the model only need to be refined or corrected by human annotators, reducing the total number of annotations required as opposed to labeling from scratch.

For example, in object detection tasks, a pre-trained model can be used to generate bounding boxes around the objects in an image. These bounding boxes can then be used as a starting point for the manual labeling process, with human annotators refining and correcting the locations of the boxes as necessary. This approach can save significant time and effort compared to manual labeling from scratch. As an example, see the difference between manually labeling an image of pollen versus using a model-assisted method:
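One simple way to structure the refinement step is to triage the model's predictions by confidence, so annotators spend their time only on the uncertain ones. A minimal sketch (the prediction dicts and the 0.5 threshold are hypothetical illustrations, not a trainYOLO API):

```python
def triage_predictions(predictions, conf_threshold=0.5):
    """Split model predictions into auto-accepted boxes and ones flagged for review.

    Each prediction is a dict with at least a "conf" score; high-confidence
    boxes are kept as-is, low-confidence ones are routed to a human annotator.
    """
    accepted = [p for p in predictions if p["conf"] >= conf_threshold]
    needs_review = [p for p in predictions if p["conf"] < conf_threshold]
    return accepted, needs_review
```

In practice, the threshold is a tuning knob: a higher value sends more boxes to humans but reduces the risk of silently accepting wrong annotations.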

Left: manual labeling, right: model-assisted labeling. As demonstrated, model-assisted labeling definitely increases labeling speed.

One of the key benefits of model-assisted labeling is that it enables a feedback loop between training the model and labeling more images. As the model gets better at generating initial annotations, the amount of manual work required decreases, allowing more time to be spent on training and improving the model. This cycle can be repeated multiple times, with each iteration resulting in a more accurate model and fewer manual annotations required. By continually improving both the model and the dataset, the overall accuracy of the object detection system can be significantly improved. Therefore, it is important to keep iterating and fine-tuning the model and dataset to achieve the best results possible.
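The feedback loop can be sketched as follows; `label_fn` and `train_fn` are placeholders standing in for the manual-correction and training steps, not a real API:

```python
def labeling_loop(unlabeled, label_fn, train_fn, batch_size=50, rounds=3):
    """Sketch of the label -> train -> predict cycle described above.

    label_fn(model, image) returns a corrected annotation (model may be None
    in the first round, i.e., fully manual labeling); train_fn(dataset)
    returns a model trained on everything labeled so far.
    """
    dataset, model = [], None
    for _ in range(rounds):
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        dataset += [label_fn(model, img) for img in batch]
        model = train_fn(dataset)  # each round starts from a better model
    return model, dataset
```

Each pass through the loop grows the labeled dataset and retrains the model, so the per-image correction effort in `label_fn` shrinks round over round.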

After each training iteration, the model's accuracy increases and the manual labeling time decreases.

How to use model-assisted labeling on trainYOLO

The trainYOLO platform streamlines the process of model-assisted labeling for object detection and instance segmentation algorithms like YOLOv8 or YOLOv5. After each training iteration, you can upload your model to the platform and generate predictions with a click of a button. No need to upload predictions yourself. 

As an example, let’s take a look at the steps to train a YOLOv8 pollen detector using the model-assisted labeling approach. For a more detailed guide about how to upload, label and train algorithms like YOLOv5 or YOLOv8 using trainYOLO, take a look at our other posts: YOLOv5 object detection, YOLOv8 object detection, YOLOv8 instance segmentation.

1. Label an initial batch manually

To kickstart the process of model-assisted labeling, we begin by manually labeling the first batch of images. In this stage, it is crucial to choose a diverse set of images to improve the trained model's generalization. Also, the number of images that need to be labeled manually in this first batch varies depending on the task. If the images have many objects, fewer are required, and vice versa. In any case, the initial trained model's performance will indicate if more manual labeling is necessary.


2. Train an initial model

Once the first batch of images is labeled, it's time to train the initial model. With trainYOLO's preconfigured Colab notebooks, training a YOLOv5 or YOLOv8 object detection or instance segmentation model is straightforward (see here for a detailed guide on how to start training). All you need to do is fill in your API key and Project name to start the training. Once finished, the model and its metrics are uploaded to our platform, where it is automatically deployed for use in model-assisted labeling. In our case, we labeled 50 images (comprising 1221 pollen grains) and reached the following accuracy:

With a score of 96.4 mAP, we have an excellent starting point for our model-assisted labeling. In hindsight, we could have started with a smaller batch of images.

3. Model-assisted labeling

Now that we have an initial model trained, we can start to utilize it as a labeling assistant. As the model is automatically deployed on trainYOLO, it enables us to generate predictions with a click of a button (the magic paintbrush). Note that the labeling speedup depends on the performance of the initial model, but it is expected to improve significantly with additional training iterations.


4. Iterate

As previously mentioned, one of the significant advantages of model-assisted labeling is the feedback loop it creates between training the model and labeling more images. Therefore, it's recommended to train a new model each time we label another set of images. This approach enhances the prediction’s accuracy and accelerates the labeling process. A win-win for all.

Conclusion

Model-assisted labeling is a powerful technique for accelerating the labeling process for object detection tasks. By using the model to jump-start the initial annotations, model-assisted labeling can save significant time and money, while also improving the accuracy of annotations. To make the most of model-assisted labeling, it's important to focus on challenging images and correct the annotations generated by the model. By following these tips, you can effectively use model-assisted labeling to label datasets faster and more accurately.

In conclusion, model-assisted labeling is an important technique that can help overcome the challenges of manual labeling in object detection tasks. By leveraging the predictions of trained models combined with human expertise, model-assisted labeling can accelerate the labeling process while improving the accuracy of annotations. As computer vision applications continue to grow in importance and datasets continue to grow in size, model-assisted labeling is likely to become an increasingly valuable tool.

This article aims to collect and organize open-source data annotation tools for convenient use, currently covering four areas: Image, Video, Text, and Audio. I have not personally tested most of the tools listed here, so mislabeled or misfiled entries will be corrected over time.

Table of Contents

1. Image

1.1 bbox

  • labelImg
    • labelImg is a cross-platform object detection annotation tool based on Python and Qt. It is convenient, fast, and practical, and is widely used.
  • bbox-label-tool
    • bbox-label-tool is a Python-based object detection annotation tool. It is simple and easy to use, but supports only single-class annotation.
  • LabelBoundingBox
    • LabelBoundingBox is an upgraded version of bbox-label-tool that supports multi-class annotation.
  • Yolo_mark
    • Yolo_mark is an annotation tool for YOLO v2 object detection.
  • FastAnnotationTool
    • FastAnnotationTool is a powerful object detection annotation tool based on C++ and OpenCV. It supports data and character OCR annotation, provides a variety of data augmentation functions (cropping, flipping, rotation, scaling, salt-and-pepper noise, Gaussian noise, rectangle merging, line extraction, etc.), and supports annotating rotated objects. It is extremely practical.
  • od-annotation
    • od-annotation is developed with the Python Flask framework, runs in the browser (B/S architecture), and supports simultaneous annotation by multiple users.
  • RectLabel
    • RectLabel can draw both bounding boxes (object detection) and polygons (segmentation).
  • CVAT
    • CVAT is an efficient annotation tool for image classification, object detection, semantic segmentation, and instance segmentation, and supports local deployment.
  • VoTT
    • VoTT is a web-deployable annotation tool released by Microsoft. It supports images and videos, supports CNTK and Pascal VOC formats, and can export to TFRecord, CSV, and VoTT formats.
  • VIA-VGG Image Annotator
    • VIA supports object detection, semantic segmentation, instance segmentation, and more. It runs in the browser and can also be deployed locally. It offers convenient operations for face data annotation and is a top choice for labeling face data.
  • Pixel Annotation Tool
    • An excellent tool for semantic and instance segmentation annotation.
  • point-cloud-annotation-tool
    • An excellent tool for annotating 3D point cloud data. It supports loading, saving, and visualizing point clouds, selecting points, generating 3D boxes, and the KITTI .bin data format.
  • boobs
    • A YOLO bounding box annotation tool; supports exporting in YOLO/VOC/COCO formats.

1.2 Mask

  • labelme
    • labelme is a cross-platform annotation tool based on Python and Qt that supports image segmentation annotation. It is convenient, fast, and practical, and is widely used.
  • pylabelme
    • pylabelme is a cross-platform annotation tool based on Python and Qt that supports image segmentation annotation. It is convenient, fast, and practical, and is widely used.
  • Labelbox
    • Labelbox is a multi-functional data annotation tool that supports image segmentation, image classification, and text classification annotation. It is convenient, fast, and practical, and is widely used.
  • ImageLabel
    • ImageLabel is an image segmentation annotation tool based on Qt and OpenCV. It supports manually drawing contours and can use GrabCut for semi-automatic annotation.
  • ImageSegmentation
    • ImageSegmentation is a Python-based image segmentation annotation tool; convenient and practical.
  • opensurfaces-segmentation-ui
    • opensurfaces-segmentation-ui is a Python-based image segmentation annotation tool; convenient and practical.
  • labelImgPlus
    • labelImgPlus is an upgraded version of labelImg that supports image segmentation, image classification, and object detection annotation. It is convenient, highly versatile, and widely used.

2. Video

  • video_labeler
    • video_labeler is a Python-based video object detection and tracking annotation tool; lightweight and practical.
  • vatic
    • vatic is a Python-based video object detection and tracking annotation tool; lightweight, practical, and widely used.
  • lane-detection-with-opencv
    • lane-detection-with-opencv is an OpenCV-based video lane detection annotation tool for this specialized scenario; highly practical.
  • OpenLabel
    • OpenLabel is an OpenCV-based video object detection and tracking annotation tool; lightweight, practical, and widely used.

3. Text

  • brat
    • brat is a Python-based natural language annotation tool; flexibly designed, practical, and widely used.
  • MarqueeLabel
    • MarqueeLabel is a natural language annotation tool based on Swift and C; flexibly designed, practical, and widely used.

4. Audio

  • audio-annotator
    • audio-annotator is a JavaScript-based audio annotation tool that supports visual annotation of waveforms and spectrograms; highly versatile and widely used.
  • youtube-chord-ocr
    • youtube-chord-ocr is a Python-based audio annotation tool that converts YouTube music videos carrying chord labels into labeled audio files; widely used.
  • MusicSegmentation
    • MusicSegmentation is a MATLAB-based music segmentation and labeling tool that segments and labels music by computing harmonics and timbre; widely used.