Pool Object Detection Using Pre-trained Model

07-24-2025 05:23 AM
DimitrisPsarologos
New Contributor

For research purposes, I aim to detect the number and estimate the shape area of swimming pools on Rhodes Island using the pre-trained deep learning model Pool Segmentation - USA. However, I am currently facing challenges related to both the accuracy of the detection results and the processing time of the input data. Below, I outline the full workflow I’m following:

Step 1: Data Preparation

Due to the lack of high-resolution imagery in a format compatible with the pre-trained model, I am using World Imagery Wayback basemaps to manually export imagery in .tpkx or .tif format for areas where pools are visually identified.

  • When exporting in .tpkx, I convert the files to 3-band 8-bit TIFFs using a Python notebook.
  • After collecting all the relevant .tif files, I use the Mosaic to New Raster tool in ArcGIS Pro to merge the inputs into a single raster dataset. This prepares the data for model inference.

Step 2: Running the Model

Once the raster is ready, I use the Detect Objects Using Deep Learning tool in ArcGIS Pro:

  • Input: the merged .tif raster (3-band, 8-bit), which in my case is ~2.3 GB.
  • Model: Pool Segmentation - USA with default parameters.
  • Processor type: GPU
  • Hardware: I run the model on a Virtual Machine with the following specs:
    64 GB RAM, Intel Xeon Gold 5220R CPU, and NVIDIA A10-12Q GPU.

[Image: DimitrisPsarologos_0-1753359525662.png]

[Image: DimitrisPsarologos_1-1753359525666.png]

Issues Encountered

  1. Accuracy: In tests with smaller input areas, I noticed that the model often fails to detect several visible pools.
  2. Performance: Despite utilizing a GPU, processing the full mosaic raster takes a significant amount of time, or in some cases, the model unexpectedly fails to run altogether.

 

Request for Suggestions

Do you have any recommendations to improve either:

  • The data preparation process (e.g., optimal input resolution, format, preprocessing), or
  • The model inference step (e.g., parameter tuning, tiling, hardware optimization)
    in order to increase the efficiency and accuracy of the final outputs?
PavanYadav
Esri Regular Contributor

Hi @DimitrisPsarologos, I have reported this to my team and hope to have a response soon. Thanks!

 

Pavan Yadav
Product Engineer at Esri
AI for Imagery
PriyankaTuteja
Esri Contributor

Hello @DimitrisPsarologos  

Thank you for reaching out! I have a few follow-up questions based on the description you provided:

  1. What is the resolution of the input raster you’re using for inferencing with the pool segmentation model?

  2. Why did you check the Use pixel space option? The Wayback imagery you used should already be geo-referenced, so it can be processed in Map Space without selecting pixel space. Could you confirm if you intentionally enabled this?

  3. You mentioned that the tool errors out in some cases when run on the full image extent. Could you share the error trace for those cases?

In addition, I’d like to suggest a few steps to improve results:

  • Use the recommended cell size for the pool segmentation model instead of the default value.
  • Lower the threshold to around 0.2 to segment pools with lower confidence, and then apply a definition query over the threshold field to filter the results.
  • Setting the cell size explicitly should also help reduce processing time.
  • If the error you encountered is a CUDA out-of-memory issue, try lowering the batch size from 64 to 4 or 8 — this should help resolve it.
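
For anyone scripting these suggestions instead of using the tool dialog, here is a minimal sketch of how the settings could be assembled. It assumes the model arguments are named batch_size and threshold, that the output confidence field is called Confidence, and that the paths are placeholders; none of these are confirmed in this thread, so check them against the tool dialog and your output table. The arcpy call is left commented out because it needs an ArcGIS Pro environment with the Image Analyst extension.

```python
def build_model_arguments(**settings):
    """Join settings into the 'name value;name value' string form
    used for the tool's model arguments parameter."""
    return ";".join(f"{name} {value}" for name, value in settings.items())


def build_definition_query(field, minimum):
    """Build a simple SQL filter that keeps detections at or above `minimum`."""
    return f"{field} >= {minimum}"


# Lower threshold and smaller batch size, as suggested above.
arguments = build_model_arguments(batch_size=4, threshold=0.2)
print(arguments)  # batch_size 4;threshold 0.2

# "Confidence" is an assumed field name; also check whether your output
# stores confidence on a 0-1 or 0-100 scale before picking a cutoff.
query = build_definition_query("Confidence", 0.5)
print(query)  # Confidence >= 0.5

# Hypothetical usage inside ArcGIS Pro (all paths are placeholders):
# import arcpy
# arcpy.env.processorType = "GPU"
# arcpy.env.cellSize = recommended_cell_size  # value from the model documentation
# arcpy.ia.DetectObjectsUsingDeepLearning(
#     r"C:\data\rhodes_mosaic.tif",
#     r"C:\data\results.gdb\pools",
#     r"C:\models\PoolSegmentation_USA.dlpk",
#     arguments,
# )
```

Running with a low threshold and filtering afterwards keeps the low-confidence detections available, so the cutoff can be adjusted without rerunning inference.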
DimitrisPsarologos
New Contributor

I tried a different approach to both the data management and the model run, incorporating your suggestions.

Let me be more specific.

First, I merged all the TIFFs with the Mosaic data management tool instead of Mosaic to New Raster, which produced a new merged TIFF.

Here is the new raster information:

Columns: 177969
Rows: 260958
Number of Bands: 3
Cell Size X: 0.2985821416443992
Cell Size Y: 0.2985821416444002
Uncompressed Size: 129.76 GB
Format: TIFF
Source Type: Generic
Pixel Type: unsigned char
Pixel Depth: 8 Bit
NoData Value: 256, 256, 256
Colormap: absent
Pyramids levels: 8, resampling: Nearest Neighbor
Compression: LZW
Mensuration Capabilities: Basic
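
As a sanity check on these figures, the following minimal sketch recomputes the uncompressed size from the column, row, and band counts, and estimates the extent covered. It assumes the cell size is expressed in metres of the raster's projected coordinate system, so the extent is in map units and not necessarily true ground distance.

```python
# Values copied from the raster properties listed above.
COLUMNS = 177_969
ROWS = 260_958
BANDS = 3
BYTES_PER_SAMPLE = 1  # 8-bit unsigned pixels
CELL_SIZE = 0.2985821416443992  # assumed to be metres in map units


def uncompressed_size_gib(columns, rows, bands, bytes_per_sample):
    """Return the raw pixel payload in GiB (2**30 bytes)."""
    return columns * rows * bands * bytes_per_sample / 2**30


def map_extent_km(columns, rows, cell_size):
    """Return (width_km, height_km) of the raster in map units."""
    return columns * cell_size / 1000, rows * cell_size / 1000


size_gib = uncompressed_size_gib(COLUMNS, ROWS, BANDS, BYTES_PER_SAMPLE)
width_km, height_km = map_extent_km(COLUMNS, ROWS, CELL_SIZE)

print(f"Uncompressed size: {size_gib:.2f} GiB")
print(f"Map extent: {width_km:.1f} km x {height_km:.1f} km")
```

The computed size agrees with the 129.76 GB reported by the raster properties, which suggests that value counts every pixel of the bounding rectangle, including NoData areas outside the exported imagery.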

Second, I ran the model on several smaller extents instead of the whole area, and then merged the output layers.
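
The splitting into smaller extents can also be scripted. Below is a minimal sketch of one way to generate overlapping processing extents; the function name, the chunk size, and the overlap are illustrative assumptions, not values used in this thread.

```python
def _starts(lo, hi, size, step):
    """Start coordinates along one axis so that the chunks cover [lo, hi]."""
    starts = []
    pos = lo
    while True:
        starts.append(pos)
        if pos + size >= hi:  # this chunk already reaches the far edge
            break
        pos += step
    return starts


def split_extent(xmin, ymin, xmax, ymax, chunk_size, overlap=0):
    """Split a bounding box into square chunks of side chunk_size.

    Neighbouring chunks share `overlap` map units, so a pool lying on a
    chunk boundary falls entirely inside at least one chunk as long as it
    is smaller than the overlap. Edge chunks are clipped to the box.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    xs = _starts(xmin, xmax, chunk_size, step)
    ys = _starts(ymin, ymax, chunk_size, step)
    return [
        (x, y, min(x + chunk_size, xmax), min(y + chunk_size, ymax))
        for y in ys
        for x in xs
    ]


# Illustrative numbers only: a 10 km x 5 km box in 5 km chunks, 500 m overlap.
chunks = split_extent(0, 0, 10_000, 5_000, chunk_size=5_000, overlap=500)
for chunk in chunks:
    print(chunk)
```

Each tuple could then be used as the processing extent for one run. With overlapping chunks, pools in the shared strips may be detected twice, so the merged output would need duplicates removed or dissolved.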

[Image: DimitrisPsarologos_0-1756476152958.png]

For the model settings, I changed the following:

- Cell size set to 0.3, as recommended in the documentation

- Batch size set to 4

- Use pixel space deactivated

- Test time augmentation set to false, to reduce the time of each run

I manually cleaned the model output by deleting the false positives.

During the cleaning I also noticed many false negatives.

Specifically, the model found 1300+ pools, of which 1130 were actual pools. The estimated number of pools in the study area is approximately 2000.
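
To put these counts in standard terms, here is a minimal sketch that derives precision, recall, and F1 from them. It treats 1300 as the number of detections (the post says "1300+", so the true count is somewhat higher) and 2000 as the number of real pools (an estimate), so the results are rough indications only.

```python
def detection_metrics(detected, true_positives, actual_total):
    """Compute precision, recall and F1 from raw detection counts."""
    precision = true_positives / detected      # share of detections that are real pools
    recall = true_positives / actual_total     # share of real pools that were found
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "false_positives": detected - true_positives,
        "false_negatives": actual_total - true_positives,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts from the post; 1300 is a lower bound and 2000 an approximation.
metrics = detection_metrics(detected=1300, true_positives=1130, actual_total=2000)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these assumptions, most detections are correct but a large share of pools is missed, which points to recall as the main thing to improve.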

As a next step, I am considering using the false negatives to retrain the model and improve its accuracy from the current 0.59.

I would really appreciate your opinion on the current process.

I would also like to ask whether there is a recommended number of samples to start with. What do you suggest?

Thank you for your kind support.

 
