
Pool Object Detection Using Pre-trained Model

07-24-2025 05:23 AM
DimitrisPsarologos
Emerging Contributor

For research purposes, I aim to count the swimming pools on Rhodes Island and estimate their surface areas using the pre-trained deep learning model Pool Segmentation - USA. However, I am currently facing challenges with both the accuracy of the detection results and the processing time for the input data. Below, I outline the full workflow I'm following:

Step 1: Data Preparation

Due to the lack of high-resolution imagery in a format compatible with the pre-trained model, I am using World Imagery Wayback basemaps to manually export imagery in .tpkx or .tif format for areas where pools are visually identified.

  • When exporting in .tpkx, I convert the files to 3-band 8-bit TIFFs using a Python notebook.
  • After collecting all the relevant .tif files, I use the Mosaic to New Raster tool in ArcGIS Pro to merge the inputs into a single raster dataset, as sketched below. This prepares the data for model inference.
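
For anyone scripting the same merge step, here is a minimal sketch with arcpy, assuming the exported 3-band 8-bit TIFFs sit in a single folder (all paths and names below are placeholders, not values from the post):

```python
import arcpy
import os

# Collect the exported TIFF tiles (folder path is a placeholder).
tif_folder = r"C:\data\rhodes_tifs"
tifs = [os.path.join(tif_folder, f) for f in os.listdir(tif_folder)
        if f.lower().endswith(".tif")]

# Merge all tiles into a single 3-band, 8-bit raster for inference.
arcpy.management.MosaicToNewRaster(
    input_rasters=tifs,
    output_location=r"C:\data",
    raster_dataset_name_with_extension="rhodes_mosaic.tif",
    pixel_type="8_BIT_UNSIGNED",
    number_of_bands=3)
```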

Step 2: Running the Model

Once the raster is ready, I use the Detect Objects Using Deep Learning tool in ArcGIS Pro (an equivalent scripted call is sketched after this list):

  • Input: the merged .tif raster (3-band, 8-bit), which in my case is ~2.3 GB.
  • Model: Pool Segmentation - USA with default parameters.
  • Processor type: GPU
  • Hardware: I run the model on a Virtual Machine with the following specs:
    64 GB RAM, Intel Xeon Gold 5220R CPU, and NVIDIA A10-12Q GPU.
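
For reference, an equivalent scripted call might look like the sketch below; the .dlpk path, output location, and argument string are assumptions rather than values from the post:

```python
import arcpy
arcpy.CheckOutExtension("ImageAnalyst")

# Prefer the GPU, matching the tool's Processor Type setting.
arcpy.env.processorType = "GPU"

# Run the pre-trained model on the merged raster (paths are placeholders).
arcpy.ia.DetectObjectsUsingDeepLearning(
    in_raster=r"C:\data\rhodes_mosaic.tif",
    out_detected_objects=r"C:\data\results.gdb\pools",
    in_model_definition=r"C:\models\PoolSegmentation_USA.dlpk",
    arguments="batch_size 4",
    run_nms="NMS")
```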


Issues Encountered

  1. Accuracy: In tests with smaller input areas, I noticed that the model often fails to detect several visible pools.
  2. Performance: Despite utilizing a GPU, processing the full mosaic raster takes a significant amount of time, or in some cases, the model unexpectedly fails to run altogether.

 

Request for Suggestions

Do you have any recommendations to improve either:

  • The data preparation process (e.g., optimal input resolution, format, preprocessing), or
  • The model inference step (e.g., parameter tuning, tiling, hardware optimization)
    in order to increase the efficiency and accuracy of the final outputs?

7 Replies
PavanYadav
Esri Regular Contributor

Hi @DimitrisPsarologos, I have reported this to my team and hope to have a response soon. Thanks!

 

Pavan Yadav
Product Engineer at Esri
AI for Imagery
Connect with me on LinkedIn!
Contact Esri Support Services
PriyankaTuteja
Esri Contributor

Hello @DimitrisPsarologos  

Thank you for reaching out! I have a few follow-up questions based on the description you provided:

  1. What is the resolution of the input raster you’re using for inferencing with the pool segmentation model?

  2. Why did you check the Use pixel space option? The Wayback imagery you used should already be georeferenced, so it can be processed in map space without selecting pixel space. Could you confirm whether you intentionally enabled this?

  3. You mentioned that the tool errors out in some cases when run on the full image extent. Could you share the error trace for those cases?

In addition, I’d like to suggest a few steps to improve results:

  • Use the recommended cell size for the pool segmentation model instead of the default value.
  • Lower the threshold to around 0.2 to segment pools with lower confidence, and then apply a definition query on the confidence field to filter the results (see the sketch after this list).
  • To reduce processing time, providing the cell size should help.
  • If the error you encountered is a CUDA out-of-memory issue, try lowering the batch size from 64 to 4 or 8 — this should help resolve it.
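
To make the threshold and definition-query suggestions concrete, a sketch along these lines may help; the argument string follows the tool's "name value" convention, and the output's confidence field is assumed to be the default Confidence field on a 0-100 scale:

```python
import arcpy
arcpy.CheckOutExtension("ImageAnalyst")

# Detect with a lower threshold so low-confidence pools are kept
# (paths and the exact argument list are assumptions).
arcpy.ia.DetectObjectsUsingDeepLearning(
    in_raster=r"C:\data\rhodes_mosaic.tif",
    out_detected_objects=r"C:\data\results.gdb\pools_lowthr",
    in_model_definition=r"C:\models\PoolSegmentation_USA.dlpk",
    arguments="threshold 0.2;batch_size 4")

# Filter the results afterwards with a definition query on the
# confidence field instead of re-running the tool.
aprx = arcpy.mp.ArcGISProject("CURRENT")  # run inside ArcGIS Pro
lyr = aprx.activeMap.addDataFromPath(r"C:\data\results.gdb\pools_lowthr")
lyr.definitionQuery = "Confidence >= 50"
```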
DimitrisPsarologos
Emerging Contributor

I tried a different approach to the data management and to running the model, incorporating your suggestions.

Let me be more specific.

First of all, I merged all the TIFFs with the Mosaic (Data Management) tool instead of Mosaic to New Raster, creating a new merged TIFF.
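
(For reference, the equivalent geoprocessing call might look like the sketch below; the paths and tile list are placeholders, and the NoData value mirrors the raster information further down.)

```python
import arcpy

# Append additional tiles into an existing target raster with the
# Mosaic (Data Management) tool rather than creating a new dataset.
arcpy.management.Mosaic(
    inputs=[r"C:\data\tile_2.tif", r"C:\data\tile_3.tif"],
    target=r"C:\data\rhodes_merged.tif",
    mosaic_type="LAST",
    nodata_value=256)
```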

Here is the new Raster information: 

Columns: 177969
Rows: 260958
Number of Bands: 3
Cell Size X: 0.2985821416443992
Cell Size Y: 0.2985821416444002
Uncompressed Size: 129.76 GB
Format: TIFF
Source Type: Generic
Pixel Type: unsigned char
Pixel Depth: 8 Bit
NoData Value: 256, 256, 256
Colormap: absent
Pyramids levels: 8, resampling: Nearest Neighbor
Compression: LZW
Mensuration Capabilities: Basic

Secondly, I ran the model on several smaller extents instead of the whole area, and then merged the output layers, roughly as sketched below.
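
(A sketch of that tiled approach; the coordinates, paths, and arguments are illustrative only.)

```python
import arcpy
arcpy.CheckOutExtension("ImageAnalyst")

# Run inference per sub-extent instead of over the whole island,
# then merge the per-tile detections into one layer.
extents = [
    arcpy.Extent(880000, 3990000, 890000, 4000000),  # placeholder coords
    arcpy.Extent(890000, 3990000, 900000, 4000000),
]
outputs = []
for i, ext in enumerate(extents):
    arcpy.env.extent = ext  # restrict processing to this tile
    out_fc = rf"C:\data\results.gdb\pools_part{i}"
    arcpy.ia.DetectObjectsUsingDeepLearning(
        in_raster=r"C:\data\rhodes_merged.tif",
        out_detected_objects=out_fc,
        in_model_definition=r"C:\models\PoolSegmentation_USA.dlpk",
        arguments="batch_size 4")
    outputs.append(out_fc)

# Combine the per-extent detections into one feature class.
arcpy.management.Merge(outputs, r"C:\data\results.gdb\pools_all")
```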


For the model settings I changed:

- The cell size to 0.3, as recommended in the documentation

- The batch size to 4

- Disabled the Use pixel space option

- Set Test Time Augmentation to false to reduce the time of each run

I manually cleaned the model's output by deleting the false positives.

During the cleaning process I also noticed many false negatives.

Specifically, the model found 1,300+ pools, of which 1,130 were actually pools, while the estimated number of pools in the case study area is approximately 2,000.
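
Taking the ~2,000 estimate as ground truth, those counts translate into roughly the following precision and recall (the 0.59 quoted below is the model's AP, which is a different metric):

```python
# Rough quality estimate from the reported counts; the total of
# ~2,000 pools is itself an estimate, so treat these as indicative.
detections = 1300      # pools the model reported
true_positives = 1130  # detections confirmed as real pools
actual_pools = 2000    # estimated pools in the study area

precision = true_positives / detections             # ~0.87
recall = true_positives / actual_pools              # ~0.57
f1 = 2 * precision * recall / (precision + recall)  # ~0.69
print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```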

As a next step, I'm considering utilizing the false negatives to re-train the model and increase its accuracy from its current 0.59.

I would really appreciate your opinion on the current process.

I would also like to ask whether there is a recommended number of samples to start with. What would you suggest?

Thank you for your kind support.

 

ThangPham
Emerging Contributor

Hello, how is your progress so far?

Based on my experience, the pre-trained model is not a Swiss Army knife that solves every task; however, you can use the pre-trained model to create labels, which is faster than manual labeling.

Of course, the inference result may not be satisfactory; it will contain errors and miss many objects (false negatives), so you will still need to clean the result.

I saw that you want to train a model for swimming pool detection/segmentation? There are several factors you may need to consider.
1. The number of labels/objects: it is somewhat difficult to determine this number, but I suggest about 5,000-6,000 objects. You should also consider collecting labels in different regions (maybe 2-3 cities or areas?)

2. For the task, you are using the Pool Segmentation - USA model. This model cannot be fine-tuned further, as mentioned on its item page: https://www.arcgis.com/home/item.html?id=0d4b8ab238b74da8819df21834338c0d. Therefore you need to train a new model.

3. For the new model, you need to consider what you are trying to achieve:

  • If you want a result similar to the pre-trained model's (the segment of each swimming pool), you will need labels of that kind and a semantic segmentation model (pixel classification).
  • If you don't care about the segment and the location of each swimming pool is good enough, you need to convert your current labels into bounding boxes and use an object detection model (like RetinaNet or Faster R-CNN); see the sketch below.
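
For the bounding-box route, the existing pool polygons don't have to be redrawn. A sketch of one way to convert them, assuming the cleaned polygons live in a file geodatabase (paths are placeholders):

```python
import arcpy

# Convert the cleaned pool polygons into axis-aligned bounding boxes
# suitable as object-detection training labels.
arcpy.management.MinimumBoundingGeometry(
    in_features=r"C:\data\results.gdb\pools_clean",
    out_feature_class=r"C:\data\results.gdb\pool_bboxes",
    geometry_type="ENVELOPE")
```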
DimitrisPsarologos
Emerging Contributor

Hello, and thank you for your feedback.

After cleaning the output dataset from the pre-trained model, I tried to re-train the model with around 170 new labels, just to test whether it would make any difference, though without success. After reading your feedback, that makes sense, because the model cannot be fine-tuned.

For my research, I need the segments of the pools and not only their locations, because I want to be able to estimate the water capacity afterwards.

After your feedback and the tests I'm doing, if I understood correctly, the actual steps I have to follow to complete the process are (a sketch of the core geoprocessing calls follows the list):

  1. Use the pre-trained segmentation model to label pools faster and prepare the training data
  2. Clean the pre-trained model's output by deleting the false positives
  3. Manually add labels for the false negatives
  4. Merge the model's output data with the new labels
  5. Train a fresh model (in later rounds, re-train that fresh model)
  6. Run the model and evaluate the results
  7. Repeat from step 1 until the accuracy of the new model is good enough (e.g. 80%+)
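
A sketch of how steps 4-5 might look as geoprocessing calls; the chip size, metadata format, model type, and paths are assumptions, not settings confirmed in this thread:

```python
import arcpy
arcpy.CheckOutExtension("ImageAnalyst")

# Step 4-5: export image chips from the merged labels, then train a
# fresh segmentation-capable model (values below are assumptions).
arcpy.ia.ExportTrainingDataForDeepLearning(
    in_raster=r"C:\data\rhodes_merged.tif",
    out_folder=r"C:\data\chips_pools",
    in_class_data=r"C:\data\results.gdb\pool_labels_merged",
    image_chip_format="TIFF",
    tile_size_x=256, tile_size_y=256,
    stride_x=128, stride_y=128,
    metadata_format="RCNN_Masks")  # mask chips suit segment output

arcpy.ia.TrainDeepLearningModel(
    in_folder=r"C:\data\chips_pools",
    out_folder=r"C:\models\pools_fresh",
    max_epochs=25,
    model_type="MASKRCNN",  # keeps pool segments for capacity estimates
    batch_size=4)
```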

If the process above is correct, there are also some side problems, because the process needs:

  • Many new samples for training a new model, across multiple areas, and therefore much more imagery data
  • A lot of manual work for cleaning datasets and collecting new samples
  • Many re-training sessions to increase the quality and the accuracy of the model

If I'm correct, are there any ideas or tools to reduce the manual work?

In conclusion, I suppose much more data and time are needed to make a new model work and produce decent outputs.

Please tell me if you have any further suggestions or ideas.

Thank you again for your feedback and your kind support.

 

Dimitris

 

 

PriyankaTuteja
Esri Contributor

@DimitrisPsarologos You could also try applying Non-Maximum Suppression to reduce overlapping detections; see the sketch below.
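
If it helps, a sketch of the corresponding call; the overlap ratio, paths, and field name are assumptions:

```python
import arcpy
arcpy.CheckOutExtension("ImageAnalyst")

# Drop duplicate/overlapping detections (e.g. kept from adjacent
# tiles), retaining the higher-confidence feature in each overlap.
arcpy.ia.NonMaximumSuppression(
    in_featureclass=r"C:\data\results.gdb\pools_all",
    confidence_score_field="Confidence",
    out_featureclass=r"C:\data\results.gdb\pools_nms",
    max_overlap_ratio=0.3)
```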

DimitrisPsarologos
Emerging Contributor

Hello, and thank you for your advice.

I also ran a test in training models, which I want to share with you.

First of all, I slightly changed the focus of my research to first find the locations of the pools (and their total number), so I replaced the segmentation model with the pre-trained Pool Object Detection - USA model, which can be re-trained.

Then I exported as training data the datasets collected and cleaned from the first model, plus the 170 pools I had collected manually, all in a single feature layer.

With almost 1,300 pools ready as training data, I began the training sessions, where I re-trained the pre-trained Pool Object Detection - USA model and two fresh models, one with the Faster R-CNN architecture and one with YOLOv3.

Here are my results: 

  • Pre-trained Pool Object Detection - USA model: AP = 59%
  • Re-trained Pool Object Detection - USA model: AP = 60%
  • Fresh YOLOv3 model: AP = 64.7%
  • Fresh Faster R-CNN model: AP = 64.4%
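
For comparisons like these, each model's AP can be recomputed against a cleaned ground-truth layer with the Compute Accuracy For Object Detection tool; a sketch, with the IoU threshold and paths as assumptions:

```python
import arcpy
arcpy.CheckOutExtension("ImageAnalyst")

# Score one model's detections against the cleaned ground-truth pools.
arcpy.ia.ComputeAccuracyForObjectDetection(
    detected_features=r"C:\data\results.gdb\pools_yolov3",
    ground_truth_features=r"C:\data\results.gdb\pools_ground_truth",
    out_accuracy_table=r"C:\data\results.gdb\accuracy_yolov3",
    min_iou=0.5)  # a detection counts as correct above 50% overlap
```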


Here are my conclusions:

Re-training the pre-trained model did not show any significant improvement. In contrast, the fresh models showed much better results in terms of AP. In particular, the YOLOv3 model, in addition to having the best AP score of the models compared, also converged faster during training.

Thank you for your kind support

Dimitris
