Hi All,
I'm trying to train an SSD model with arcgis.learn to detect trees in aerial photos. After a few adjustments, the model was trained with a learning rate of 0.0005 for 8 epochs. However, the "average_precision_score" is only 0.06. Has anyone come across this issue, and how can I improve this score? And what would be an acceptable score? I have searched for documentation on "average_precision_score" but have not found much.
Below is the content of my .emd file.
Thank you and kind regards, Lan
{
  "Framework": "arcgis.learn.models._inferencing",
  "InferenceFunction": "ArcGISObjectDetector.py",
  "ModelConfiguration": "_DynamicSSD",
  "ModelType": "ObjectDetection",
  "ExtractBands": [0, 1, 2],
  "backbone": "resnet34",
  "Grids": [4],
  "Zooms": [1.0],
  "Ratios": [[1.0, 1.0]],
  "SSDVersion": 2,
  "Classes": [
    {
      "Value": 1,
      "Name": "Tree",
      "Color": [102, 2, 7]
    }
  ],
  "ModelFile": "treecount_chip64_lr0005.pth",
  "ImageHeight": 64,
  "ImageWidth": 64,
  "ImageSpaceUsed": "MAP_SPACE",
  "LearningRate": "5.0000e-04",
  "ModelName": "SingleShotDetector",
  "backend": "pytorch",
  "ModelParameters": {
    "backbone": "resnet34",
    "backend": "pytorch"
  },
  "average_precision_score": {
    "Tree": 0.06539047501100659
  },
  "resize_to": null,
  "IsMultispectral": false
}
Hi Lan,
8 epochs is not very long to train the model for. The first thing I would suggest is to train the model longer and see if the precision improves. If you are training from a notebook then the show_results function is useful as you can see what objects are being detected as you train.
The next easiest thing to try would be to use a different object detection model like FasterRCNN or YOLOv3 - you can use your existing training data and see if they give better results.
If you still don’t get good results you may have to capture more data to train with or try other methods to improve accuracy.
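For anyone following along, the suggestions above (train longer, inspect show_results, check the score) might look roughly like the sketch below. This is only an illustration under assumptions: the function name train_and_report, the data_path argument, and the epochs, batch_size and thresh values are all made up for the example, and the grid/zoom/ratio settings are simply copied from the .emd posted earlier.

```python
def train_and_report(data_path, epochs=30, chip_size=64, batch_size=16):
    """Train an SSD for more epochs and report its average precision.

    data_path is a hypothetical folder of exported training chips;
    epochs and batch_size are placeholder values, not recommendations.
    """
    # Imported lazily so the sketch can be read without arcgis installed.
    from arcgis.learn import prepare_data, SingleShotDetector

    # Load the exported chips (assumed to be 64x64, as in the thread).
    data = prepare_data(data_path, chip_size=chip_size, batch_size=batch_size)

    # Same grid, zoom and ratio settings as in the posted .emd file.
    model = SingleShotDetector(data, grids=[4], zooms=[1.0], ratios=[[1.0, 1.0]])

    # Train for longer than the original 8 epochs, at the same learning rate.
    model.fit(epochs=epochs, lr=5e-4)

    # Visual check: ground truth next to predictions for a few chips.
    model.show_results(rows=4, thresh=0.3)

    # Per-class average precision on the validation set.
    print(model.average_precision_score())
    return model
```

To try one of the other detectors mentioned above, the same data object can be passed to FasterRCNN(data) or YOLOv3(data) from arcgis.learn in place of SingleShotDetector, without the grid, zoom and ratio arguments.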
Hi Tim,
For SSD I had to use 64x64 chips, lr0006 and 9 epochs, and got a good score of 0.26. I kept retraining the model with the same lr and number of epochs. However, the second iteration (9 + 9 epochs) actually yielded a lower score (0.17), and the next iteration (9+9+9) was the worst. The valid_loss went down the whole time, so I wonder why the output is getting worse. I have attached the best output I have.
I have not tried YOLOv3 yet, maybe today. But I tried FasterRCNN yesterday with 64x64 chips and the recommended lr of 00005. However, it could not predict anything; the score is 0. Interesting.
I tried to export chips at size 256 and then 128. However, ArcGIS Pro generated so few chips that there were not even enough to run the data prep. It could be an issue with my sampling strategy, or the input image may need to be preprocessed to improve the sharpness between objects (trees).
Thanks,
Lan
How many trees have you labelled? How many image chips are being exported at 64/128/256 chip sizes? You may need to increase your training data set size.
It looks like there are other trees that are not labelled in your image chips. This becomes a problem when the model is trying to validate its results. You can see in the prediction images that it looks like it has correctly predicted some trees. However in your ground truth image, there is no corresponding box, so it probably considers them as errors (therefore reducing your prediction score). It may actually be detecting more and more trees successfully as you train but thinks they are all errors. I think it's good practise to label every object in your training data.
And that imagery is not really great, is it. I think source imagery quality is the greatest single success factor for deep learning - I think I would have difficulty in accurately finding all the trees in those images. Some imagery enhancement may indeed help.
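To illustrate the point about unlabelled trees, here is a small self-contained sketch of how average precision drops when correct detections have no matching ground-truth box. It uses a generic, uninterpolated AP with an IoU threshold of 0.5; that is an assumption for the example, and arcgis.learn's own computation may differ in its details. The boxes and scores are made-up toy values.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(predictions, ground_truth, iou_thresh=0.5):
    """Uninterpolated AP for one class.

    predictions: list of (score, box); ground_truth: list of boxes.
    A prediction with no unmatched ground-truth box above the IoU
    threshold counts as a false positive, even if a real tree is there.
    """
    matched = set()
    true_positives = 0
    precision_sum = 0.0
    ranked = sorted(predictions, key=lambda p: -p[0])
    for rank, (_, box) in enumerate(ranked, start=1):
        best, best_iou = None, iou_thresh
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best, best_iou = i, overlap
        if best is not None:
            matched.add(best)
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / len(ground_truth) if ground_truth else 0.0


# Four real trees; the toy model finds all of them.
trees = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 30), (20, 20, 30, 30)]
detections = list(zip([0.9, 0.8, 0.7, 0.6], trees))

ap_full_labels = average_precision(detections, trees)         # every tree labelled
ap_partial_labels = average_precision(detections, trees[2:])  # only two trees labelled

print(ap_full_labels, ap_partial_labels)
```

The detections are identical in both cases; only the labels change. With half the trees unlabelled, the two highest-scoring (and correct) detections are counted as errors, so the score falls even though the model has not become worse.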
Thanks Tim!
Your answer is so sharp! It sheds some light now.
I have a subset of a 50 cm resolution image, dimensions 5818x4969 (images attached). I collected 908 tree samples on that image, but they are quite clustered. If "it's good practise to label every object in your training data", I may have to reduce the dimensions of the subset raster and label more trees. Labelling all trees in the 5818x4969 image is time consuming :). I will try this option first!
Training data export: 64x64 with stride 32x32: 370 images; 128x128 with stride 64x64: 310 images; 256x256 with stride 128x128: 129 images.
Imagery quality: the subset looks OK at 1:1000 zoom scale (attached). However, at 50 cm resolution, we do not expect much.
Imagery enhancement: I know that preprocessing/enhancing imagery is good for classification/DL. What would you recommend in terms of software/methodology?
Thank you very much!
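As a rough sanity check on the export numbers above, the sketch below counts how many full chips a sliding window could produce over a 5818x4969 raster and compares that with the exported counts. It assumes the exported chips all came from this subset and that only full chips inside the raster are counted; the export tool may handle edges and empty tiles differently, so treat the totals as approximate.

```python
def tile_grid_count(width, height, chip, stride):
    """Number of full chip x chip windows when sliding by `stride` pixels."""
    if width < chip or height < chip:
        return 0
    cols = (width - chip) // stride + 1
    rows = (height - chip) // stride + 1
    return cols * rows


WIDTH, HEIGHT = 5818, 4969  # subset raster dimensions from the post

# (chip size, stride, number of chips actually exported), from the post.
exports = [(64, 32, 370), (128, 64, 310), (256, 128, 129)]

for chip, stride, exported in exports:
    possible = tile_grid_count(WIDTH, HEIGHT, chip, stride)
    share = exported / possible
    print(f"chip {chip}, stride {stride}: {exported} of {possible} "
          f"possible chips exported ({share:.1%})")
```

If the exported share is small, most of the raster never reaches the model, which fits with the samples being clustered in a few areas.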