
Advice on "grids" parameter in SingleShotDetector for deep learning

08-13-2021 07:03 AM
DuncanHornby
MVP Notable Contributor

Hi,

So I've been teaching myself the basic workflow for using the deep learning tools in ArcGIS Pro. I had some great help in my Q&A on pixel classification here. Feeling brave, I am now having a go at object detection using the SingleShotDetector. I gave myself the task of identifying boats in an estuary and followed the basic workflow discussed on GitHub here. By the way, there are a few other really useful notebooks there, and once you get over the jargon barrier and the frightening complexity of deep learning, these notebooks are really helpful in teaching you a workflow.

So on my first run through, it works! The red circles were my training samples and the yellow boxes are the detections.

[Attached screenshot: DuncanHornby_0-1628862014237.png]

As you can clearly see, a lot of boats are being missed. I understand that I now need to tweak the workflow. More training samples (red circles) and running fit() for longer (more epochs) seem like obvious options, but then there are all those other parameters...

When I defined the classifier in the notebook I used this:

ssd = SingleShotDetector(data, grids=[5], zooms=[1.0], ratios=[[1.0, 1.0]], backbone='resnet34', backend='pytorch')

One of the parameters is grids, and it is actually discussed here. It's not at all clear what the implication of this grid value is or how it is used. Is choosing a grid value of, say, 100 better than 5? What is the impact of setting a higher or lower grid value? In the example they use a 4x4 grid, but 4x4 of what? Let me explain...

Imagine my satellite image is 10m resolution and is 100x100 pixels, so 10,000 pixels in total. Does setting this grid parameter to 4 mean that it is dividing the image up into 4 regions of 50x50 pixels?

Should this grid parameter be set to a value that matches the tile size used when exporting the training data? For example, if I exported tiles at 25x25, would I pick a grid size of 16? Or is that irrelevant?

This parameter is a list, so it can take many values. Why would one use multiple grid values?

I have yet to find any advice on the Esri site about why you would choose one grid size over another or how you choose an appropriate grid value; the API reference offers next to no explanation of this parameter.

I'm being very cheeky here and tagging you in @Tim_McGinnes as you were a super star in my other thread!

But any help from anyone is much appreciated.

2 Replies
Tim_McGinnes
Frequent Contributor

There is a very good explanation of grids and anchor boxes on this page (just the first third of the page): https://machinethink.net/blog/object-detection/ 

I think it is about having enough grid cells to detect all the objects in the image. So in your image of boats it may make sense to have more grid cells (because the boats are small in size compared to the image as a whole). However ArcGIS is probably splitting the original image up and passing each section through object detection and compiling the results at the end, so I'm not sure how that fits into the equation.

Imagine my satellite image is 10m resolution and is 100x100 pixels, so 10,000 pixels in total. Does setting this grid parameter to 4 mean that it is dividing the image up into 4 regions of 50x50 pixels?

The grid parameter of 4 actually gives you a 4x4 grid, so you would get sixteen 25x25-pixel regions.
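A quick back-of-the-envelope check of that arithmetic (plain Python, not the arcgis.learn API; the 100x100 chip is just the example from the question):

```python
# Sketch: how a single grids value splits a square image chip into cells.
# Assumes a grid value G means a G x G grid over the chip, per the reply above.

def grid_cells(chip_size, grid):
    """Return (number of cells, cell size in pixels) for a G x G grid."""
    return grid * grid, chip_size // grid

# The 100x100-pixel example from the question, with grids=[4]:
n_cells, cell_px = grid_cells(100, 4)
print(n_cells, cell_px)  # 16 cells, each 25x25 pixels
```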

Yes, you can provide a list of grid sizes, such as [4,2,1]. I think what is happening in this case is that it is actually training the model in parallel with 4x4, 2x2 and 1x1 grids.
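If each grid cell proposes one anchor box per zoom/ratio combination (the usual SSD arithmetic; the exact internals of arcgis.learn's SingleShotDetector may differ), the cell and anchor counts for grids=[4,2,1] work out like this:

```python
# Sketch: total grid cells and anchor boxes for multiple grid sizes.
# This mirrors standard SSD anchor arithmetic, not a documented
# arcgis.learn implementation detail.

def total_anchors(grids, n_zooms, n_ratios):
    cells = sum(g * g for g in grids)          # e.g. 4x4 + 2x2 + 1x1 cells
    return cells, cells * n_zooms * n_ratios   # one anchor per zoom/ratio per cell

cells, anchors = total_anchors([4, 2, 1], n_zooms=1, n_ratios=1)
print(cells, anchors)  # 21 cells -> 21 anchors with a single zoom and ratio
```

With more zooms and ratios the anchor count multiplies, which is one reason larger or multiple grids make the model heavier.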

I think a downside of having large or multiple grids would be that the model would be larger, take longer to train, and require more memory to train/use.

Like all this deep learning stuff, it's all trial and error really. You can make some logical assumptions, but until you try it out you don't really know what will happen.

by Anonymous User
Not applicable

For the grids parameter you should consider the average size of an object. Think about how many pixels a typical object covers in height and width.
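One way to turn that advice into a number (a rough heuristic of my own, not an Esri recommendation): pick a grid so that one cell is roughly the size of an average object.

```python
# Sketch: a rough heuristic for choosing a grid value from average object size.
# Hypothetical helper, assuming grid cells work best when they are about
# as big as the objects being detected.

def suggest_grid(chip_size_px, avg_object_px):
    """Suggest a grid value so each cell is roughly one object wide."""
    return max(1, round(chip_size_px / avg_object_px))

# e.g. 256-pixel training chips and boats roughly 30 pixels long:
print(suggest_grid(256, 30))  # -> 9, i.e. try grids=[9]
```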

Also, if you want to try another model instead of SSD, you can try training a FasterRCNN model. You can train FasterRCNN with the same training data you are using to train the SSD model. Here is the guide for this model.

Thanks,

Sandeep