Data extraction and formatting
Most of the commands in this package require a prior export of the Arkindex corpus containing the dataset to be formatted. From this export, data can be extracted and converted to the expected format using one of the two commands available:
Extracted data structure
Classification
The data structure is as follows:
dataset/
│
├── train/ # Training dataset
│ ├── class_1/ # Folder containing images for class 1
│ │ ├── img1.jpg
│ │ ├── img2.jpg
│ │ └── ...
│ ├── class_2/ # Folder containing images for class 2
│ │ ├── img1.jpg
│ │ ├── img2.jpg
│ │ └── ...
│ └── ...
├── val/ # Validation dataset
│ ├── class_1/ # Folder containing images for class 1
│ │ ├── img1.jpg
│ │ └── ...
│ ├── class_2/ # Folder containing images for class 2
│ │ ├── img1.jpg
│ │ └── ...
│ └── ...
├── test/ # Testing dataset (optional)
│ ├── class_1/ # Folder containing images for class 1
│ │ ├── img1.jpg
│ │ └── ...
│ ├── class_2/ # Folder containing images for class 2
│ │ ├── img1.jpg
│ │ └── ...
│ └── ...
└── ...
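To check that an extracted classification dataset follows the layout above, a minimal sketch along these lines can count the images found per split and per class. The function name `count_images_per_class` and the list of accepted image extensions are assumptions made for illustration; they are not part of this package.

```python
from collections import Counter
from pathlib import Path

# Assumption: the image extensions to count; adjust to your export.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(dataset_dir):
    """Return {split: Counter({class_name: image_count})} for each split folder found."""
    dataset_dir = Path(dataset_dir)
    counts = {}
    for split in ("train", "val", "test"):
        split_dir = dataset_dir / split
        if not split_dir.is_dir():
            continue  # e.g. the optional test/ folder is absent
        per_class = Counter()
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            per_class[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
        counts[split] = per_class
    return counts
```

Calling `count_images_per_class("dataset")` makes it easy to spot a class folder that is empty or missing from one of the splits.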
Detection & Segmentation & OBB
The files should be organized as follows:
dataset/
├── images/ # Directory for all images
│ ├── img1.jpg
│ ├── img2.jpg
│ └── ...
├── labels/ # Directory for all detect/segmentation/OBB masks (labels)
│ ├── img1.txt
│ ├── img2.txt
│ └── ...
├── train.txt
├── val.txt
├── test.txt
└── data.yaml # File containing the dataset paths, configuration and class names
The difference between the tasks lies in the content of the *.txt files.
- Detection (bounding boxes): <class_id> <x> <y> <w> <h>
- OBB/Segmentation (polygons): <class_id> <x1> <y1> <x2> <y2> <x3> <y3> <x4> <y4>
*Only 4 pairs for OBB.* By default, if you launch a training with an OBB model, the Ultralytics library handles polygons with more than 4 points by automatically taking the minimal rectangle around the given polygon.
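The two line formats can be told apart by the number of values that follow the class id. The sketch below is a hypothetical helper (`parse_label_line` is not part of this package or of Ultralytics) that reads one label line and returns either a box or a list of polygon points. It does not interpret or rescale the coordinates.

```python
def parse_label_line(line):
    """Parse one label line into (class_id, kind, values).

    kind is "box" for "<class_id> <x> <y> <w> <h>" and "polygon" for
    "<class_id> <x1> <y1> <x2> <y2> ...".
    """
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"Too few values in label line: {line!r}")
    class_id = int(fields[0])
    values = [float(v) for v in fields[1:]]
    if len(values) == 4:
        # Detection: a single bounding box
        return class_id, "box", tuple(values)
    if len(values) >= 6 and len(values) % 2 == 0:
        # OBB/Segmentation: pair up the values as (x, y) points
        points = list(zip(values[0::2], values[1::2]))
        return class_id, "polygon", points
    raise ValueError(f"Unexpected number of values in label line: {line!r}")
```

For an OBB dataset, a polygon returned with exactly 4 points matches the format described above; longer polygons are the case that the library reduces to a rectangle.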
The data.yaml file contains:
---
path: ../datasets/
test: test.txt
train: train.txt
val: val.txt
names:
  0: class_id_1
  1: class_id_2
  ...
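A file with this shape can be produced with plain string formatting, as in the sketch below. `render_data_yaml` is a hypothetical helper, not a command of this package, and it assumes class names are simple strings that need no YAML quoting.

```python
def render_data_yaml(dataset_path, class_names, with_test=True):
    """Return the text of a data.yaml file listing the set files and class names."""
    lines = ["---", f"path: {dataset_path}"]
    if with_test:
        lines.append("test: test.txt")
    lines += ["train: train.txt", "val: val.txt", "names:"]
    # Class ids are the positions of the names in the list, starting at 0
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"
```

The returned text can then be written next to the set files, for example with `Path("dataset/data.yaml").write_text(...)`.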
The <set>.txt files contain the paths to the images for each corresponding set.
Make sure you are in the right folder when calling the train functions, or that the paths relative to the source folder are correctly referenced.
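One way to catch a wrong working directory before training is to check that every path listed in a set file resolves to an existing image. The sketch below is a hypothetical helper; `base_dir` is an assumption standing for whichever folder the relative paths are meant to be resolved from in your setup.

```python
from pathlib import Path


def missing_images(set_file, base_dir):
    """Return the entries of set_file that do not resolve to an existing file."""
    base_dir = Path(base_dir)
    missing = []
    for raw in Path(set_file).read_text().splitlines():
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        candidate = Path(entry)
        if not candidate.is_absolute():
            # Relative entries are resolved against base_dir
            candidate = base_dir / candidate
        if not candidate.is_file():
            missing.append(entry)
    return missing
```

An empty list from `missing_images("dataset/train.txt", "dataset")` means all listed images were found from that base folder.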