For research purposes, I aim to count the swimming pools on the island of Rhodes and estimate their surface area using the pre-trained deep learning model Pool Segmentation - USA. However, I am currently facing challenges with both the accuracy of the detection results and the processing time of the input data. Below I outline the full workflow I'm following:
Step 1: Data Preparation
Due to the lack of high-resolution imagery in a format compatible with the pre-trained model, I am using World Imagery Wayback basemaps to manually export imagery in .tpkx or .tif format for areas where pools are visually identified.
Step 2: Running the Model
Once the raster is ready, I use the Detect Objects Using Deep Learning tool in ArcGIS Pro:
Issues Encountered
Request for Suggestions
Do you have any recommendations to improve either:
Hi @DimitrisPsarologos, I have reported this to my team and hope to have a response soon. Thanks!
Hello @DimitrisPsarologos
Thank you for reaching out! I have a few follow-up questions based on the description you provided:
What is the resolution of the input raster you’re using for inferencing with the pool segmentation model?
Why did you check the Use pixel space option? The Wayback imagery you used should already be geo-referenced, so it can be processed in Map Space without selecting pixel space. Could you confirm if you intentionally enabled this?
You mentioned that the tool errors out in some cases when run on the full image extent. Could you share the error trace for those cases?
In addition, I’d like to suggest a few steps to improve results:
I tried a different approach to data management and to running the model, incorporating your suggestions.
Let me be more specific.
First of all, I merged all the TIFFs using the Mosaic (Data Management) tool instead of Mosaic To New Raster, and created a new merged TIFF.
Here is the new Raster information:
Columns: 177969
Rows: 260958
Number of Bands: 3
Cell Size X: 0.2985821416443992
Cell Size Y: 0.2985821416444002
Uncompressed Size: 129.76 GB
Format: TIFF
Source Type: Generic
Pixel Type: unsigned char
Pixel Depth: 8 Bit
NoData Value: 256, 256, 256
Colormap: absent
Pyramids levels: 8, resampling: Nearest Neighbor
Compression: LZW
Mensuration Capabilities: Basic
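As a quick sanity check, the reported uncompressed size follows directly from the dimensions listed above (columns × rows × bands × 1 byte for 8-bit pixels):

```python
# Sanity check: uncompressed size of an 8-bit, 3-band raster.
# Dimensions taken from the raster properties listed above.
columns = 177_969
rows = 260_958
bands = 3
bytes_per_pixel = 1  # 8-bit unsigned char

size_bytes = columns * rows * bands * bytes_per_pixel
size_gb = size_bytes / 2**30  # binary gigabytes, as ArcGIS Pro reports

print(f"{size_gb:.2f} GB")  # ~129.76 GB, matching the reported value
```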
Secondly, I ran the model on several smaller extents instead of the whole area, and then merged the output layers.
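For anyone reproducing the smaller-extents step, the tiling idea can be sketched in plain Python (no arcpy; the extent, tile size, and overlap below are illustrative — a small overlap helps avoid cutting pools in half at tile edges):

```python
# Minimal sketch: split a large raster extent into smaller processing tiles
# with a small overlap, so pools lying on a tile boundary are still seen whole.

def make_tiles(xmin, ymin, xmax, ymax, tile_size, overlap):
    """Yield (xmin, ymin, xmax, ymax) tile extents covering the full extent."""
    step = tile_size - overlap
    x = xmin
    while x < xmax:
        y = ymin
        while y < ymax:
            yield (x, y, min(x + tile_size, xmax), min(y + tile_size, ymax))
            y += step
        x += step

# Example: a 10 km x 10 km extent (map units = metres), 2 km tiles, 50 m overlap
tiles = list(make_tiles(0, 0, 10_000, 10_000, 2_000, 50))
print(len(tiles))  # 36 tiles cover the extent
```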
For the model settings, I changed:
- The cell size to 0.3, as recommended in the documentation
- The batch size to 4
- Deactivated the Use pixel space option
- Set Test time augmentation to false, to reduce the time of each run
I manually cleaned the model output by deleting the false positives.
During the cleaning process I also noticed many false negatives.
Specifically, the model found 1300+ pools, of which 1130 were actually pools; the estimated number of pools in the study area is approximately 2000.
As a next step, I'm considering using the false negatives to re-train the model and increase its accuracy from the current 0.59.
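From those counts, precision and recall give a clearer picture than a single accuracy figure; a quick computation with the numbers above:

```python
# Precision/recall from the counts reported above (approximate figures).
detections = 1300        # pools proposed by the model
true_positives = 1130    # detections that were actually pools
total_pools = 2000       # estimated pools in the study area

precision = true_positives / detections   # how many detections are real pools
recall = true_positives / total_pools     # how many real pools were found
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

So the model is fairly precise (~0.87) but misses many pools (recall ~0.57), which matches the "many false negatives" observation.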
I would really appreciate your opinion on the current process.
I would also like to ask whether there is a recommended number of samples to start with. What would you suggest?
Thank you for your kind support
Hello, how is your progress so far?
Based on my experience, the pretrained model is not a Swiss Army knife that solves every task; however, you can use the pretrained model to create labels, which is faster than manual labeling.
Of course, the inference result may not be satisfactory and will include many false detections. You still need to clean the result.
I see that you want to train a model for swimming pool detection/segmentation. There are several factors you may need to consider.
1. The number of labels/objects: it is somewhat difficult to determine this number, but I suggest around 5,000-6,000 objects. You should also consider collecting labels in different regions (maybe 2-3 cities or areas).
2. For this task you are using the Pool Segmentation - USA model. This model cannot be fine-tuned further, as mentioned at https://www.arcgis.com/home/item.html?id=0d4b8ab238b74da8819df21834338c0d . Therefore you need to train a new model.
3. For the new model, you need to consider: what are you trying to achieve?
Hello and thank you for your feedback
After cleaning the dataset output from the pre-trained model, I tried to re-train the model with around 170 new labels, just to test whether it would make any difference, without success though. After seeing your feedback, that makes sense, because the model cannot be fine-tuned.
For my research I need the segments of the pools, not only their locations, because I want to be able to estimate the water capacity afterwards.
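For context on the capacity step: a first-order estimate from the segmented polygons is surface area times an assumed average depth. The 1.5 m depth below is an assumption for illustration only (the imagery cannot provide depth), and should be calibrated locally:

```python
# Rough water-capacity estimate from segmented pool polygons.
# ASSUMED: an average depth of 1.5 m (residential pools vary; calibrate locally).

ASSUMED_AVG_DEPTH_M = 1.5

def pool_capacity_m3(area_m2, depth_m=ASSUMED_AVG_DEPTH_M):
    """Approximate water volume in cubic metres for one pool polygon."""
    return area_m2 * depth_m

# Example: polygon areas (m^2) as they might come from the segmentation output
areas = [32.0, 50.5, 18.2]
total = sum(pool_capacity_m3(a) for a in areas)
print(f"total capacity ~= {total:.1f} m^3")
```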
After your feedback and the tests I'm doing, if I understood correctly, the actual steps I have to follow to complete the process are:
If the process above is correct, there are also some side problems, because the process needs:
If I'm correct, are there any ideas or tools to reduce the manual work?
In conclusion, I suppose much more data and time are needed to make a new model work and produce decent outputs.
Please tell me if you have any further suggestions or ideas.
Thank you again for your feedback and your kind support.
Dimitris
@DimitrisPsarologos You could also try applying Non Maximum Suppression to reduce overlapping detections.
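For reference, the idea behind Non Maximum Suppression: keep the highest-scoring detection and drop any detection that overlaps it beyond an IoU threshold. A minimal plain-Python sketch of the algorithm (ArcGIS Pro exposes this through the tool's NMS options, so you would not normally code it yourself):

```python
# Minimal sketch of Non Maximum Suppression on axis-aligned boxes.
# Boxes are (xmin, ymin, xmax, ymax); scores are detection confidences.

def iou(a, b):
    """Intersection-over-union of two boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept after suppression, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two overlapping detections of the same pool plus one distinct pool:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the lower-scoring duplicate is suppressed
```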
Hello and thank you for your advice
I also ran a test training models, which I want to share with you.
First of all, I changed the focus of my research slightly and adapted it to first finding the locations of the pools (and their total number), so I replaced the pre-trained segmentation model with Pool Object Detection - USA, which can be retrained.
Then I exported as training data the dataset collected and cleaned with the first model, plus the 170 pools I had collected manually, in a single feature layer.
With almost 1300 pools ready as training data, I began the training sessions, where I retrained the pre-trained Pool Object Detection - USA model, plus two fresh models: one with a Faster R-CNN architecture and one with YOLOv3.
Here are my results:
Here are my conclusions:
Re-training the pre-trained model did not show any significant improvement. On the contrary, the fresh models showed much better results in terms of AP. In particular, the YOLO model, besides achieving a better AP score than the other models, also converged faster during training.
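For readers comparing the AP numbers: average precision is computed from detections ranked by confidence, accumulating precision at each true positive. A minimal sketch of that computation (the example numbers are illustrative, not from the models above):

```python
# Sketch of average precision (AP) from ranked detections:
# walk the detections in confidence order, and at each true positive
# add the precision at that rank; finally divide by the number of
# ground-truth objects.

def average_precision(is_tp, total_positives):
    """is_tp: list of True/False per detection, sorted by confidence desc."""
    tps = 0
    ap = 0.0
    for rank, tp in enumerate(is_tp, start=1):
        if tp:
            tps += 1
            ap += tps / rank  # precision at this recall point
    return ap / total_positives

# Example: 5 ranked detections, 3 of 4 ground-truth objects found
print(average_precision([True, False, True, True, False], total_positives=4))
```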
Thank you for your kind support
Dimitris