'Training loss' is 'nan' when training a deep learning model

08-04-2021 01:46 AM
梁飞王
New Contributor

Hi all, I am trying to extract green space from drone imagery. When I train the deep learning model, both the training loss and the validation loss come back as 'nan', and the loss diagram is blank. I tried adding more labels and using different types of models, but the result is the same.

I don't know what mistake I have made. Can anybody help me out?

_0-1628066458634.png

 

10 Replies
by Anonymous User
Not applicable

@梁飞王 Can you share some more information about:

- Type of model

- Number of chips in training data

- Any noData in training data
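
For anyone unsure what "number of chips" refers to: each image tile written by the Export Training Data tool is one chip. A rough way to count them is sketched below. It assumes the export folder stores chips as image files in subfolders (for example `images` and `labels`); the function name `count_chips` and the suffix list are illustrative and not part of any Esri API.

```python
from pathlib import Path

# Assumed chip file types; adjust to the image format chosen at export time.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def count_chips(export_folder):
    """Count image files per subfolder of an exported training-data folder."""
    root = Path(export_folder)
    counts = {}
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # Group by the folder that holds the file, relative to the root.
            key = path.parent.relative_to(root).as_posix()
            counts[key] = counts.get(key, 0) + 1
    return counts
```

If the counts for the image and label subfolders differ, or either is zero, that is worth reporting along with the model type.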

 

Thanks,

Sandeep

MartyRyan
New Contributor III

Sandeep,

I am having the same issue - 

- Training a U-Net model

- "Chips" in training data: I am not exactly sure what you mean, but here are my numbers: images (output from the Export Training Data tool) = 2,199,856

- Training data: I used a previously classified land cover dataset (on advice from an Esri rep) as the input for the export. It has 15,512 polygons (a portion of its attribute table is attached).

If these "nan" values indicate an error, along with the identical accuracy value for both epochs, how can I fix this while the model is still training? These tools take a long time to run, and we need a usable result.

Thanks for your help

MartyRyan_0-1628857779132.png

NOTE: I used the field LCCODE (which is populated) when exporting the training data. I have a sinking feeling that the empty "Name" and "Value" fields are the "No Data" items you are referring to. Do I have to stop this model, calculate values for these fields, export the training data again, and in effect start all over?

 

MartyRyan
New Contributor III

Sandeep, 

I am having the same issue. Any help is greatly appreciated.

Type of model: U-Net (pixel classification)

Chips in training data: 2,199,856 images in the output folder (from the Export Training Data tool)

Any noData in training data: I am not exactly clear on this, but see the attached documents for specifics. Thanks

What are my options while the tool is running, if any?

Thank you

by Anonymous User
Not applicable

Hi @MartyRyan

Can you share a sample of your training data?

 

Thanks,

Sandeep

MartyRyan
New Contributor III

Sandeep, I have solved my issue by adding and properly populating the clsname and clsvalue fields in my training data and also changing some of my input parameters. I don't know how they are related, but I have successfully trained a model with good results. The parameter settings I revised were: 

  • Learning rate: blank
  • Parallel Processing: blank
  • Processing extent: default or set as my land cover extent
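
For readers in the same situation, here is a minimal sketch of how one might check for unpopulated class fields before exporting. It assumes the attribute table has already been read into a list of dictionaries, and it uses the field names `clsname` and `clsvalue` from this post as defaults; the names in your own schema may differ. The helper `find_unlabeled_rows` is hypothetical, not an ArcGIS function.

```python
def find_unlabeled_rows(rows, name_field="clsname", value_field="clsvalue"):
    """Return the indices of rows whose class name or class value is missing."""
    bad = []
    for index, row in enumerate(rows):
        name = row.get(name_field)
        value = row.get(value_field)
        # A name counts as missing if it is absent, None, or only whitespace.
        name_missing = name is None or str(name).strip() == ""
        value_missing = value is None
        if name_missing or value_missing:
            bad.append(index)
    return bad
```

An empty result means every polygon has both fields filled in; any indices returned point to rows to fix before re-exporting.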

Thank you -

梁飞王
New Contributor

@Anonymous User 

Thanks for your reply. Here is the information:

- I have tried several kinds of models, mainly FeatureClassifier and MaskRCNN. They all return nan during training.

- In my latest attempt, the images were 4885 * 3 * 256 * 256.

- I don't understand what "noData" means. I created the training data step by step in ArcGIS Pro: label, then export. If there is noData in my data, I can't remove it myself because I haven't learned to code.

If there is noData in my data, does that mean my image is incomplete, that is, that it has holes in it? Note that I produced the training samples by extracting them from NDVI rather than drawing them by hand.

Sometimes 'nan' changes to '0.0' when I adjust Batch Size or Learning Rate, and sometimes it does not, like this:

_0-1628869019537.png
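
As general background on why the learning rate can matter here, the toy sketch below shows how plain gradient descent on a simple quadratic loss diverges to 'nan' when the step size is too large. It is not the ArcGIS training loop and it does not establish that the learning rate was the cause in this thread; `train_quadratic` and its parameters are made up for illustration.

```python
import math


def train_quadratic(learning_rate, steps=2000, start=1.0):
    """Minimise loss(w) = w**2 by gradient descent and return the loss history."""
    w = start
    losses = []
    for _ in range(steps):
        loss = w * w          # overflows to inf once |w| gets large enough
        losses.append(loss)
        grad = 2.0 * w        # derivative of w**2
        w = w - learning_rate * grad  # inf - inf eventually produces nan
    return losses


stable = train_quadratic(0.1)    # each step shrinks w by a factor of 0.8
diverged = train_quadratic(1.5)  # each step multiplies w by -2

print(math.isfinite(stable[-1]), math.isnan(diverged[-1]))
```

With the small step the loss shrinks toward zero; with the large step it grows until it overflows and then turns into nan.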

 

 

I am afraid my English may not have made this clear. Here is my training data exported from ArcGIS Pro.

https://1drv.ms/u/s!AuqSdRqzEo83uHlAIOsgSsMGTiBf?e=SrF8X7 

Thanks,

王梁飞

by Anonymous User
Not applicable

Hi @梁飞王 ,

 

I am not able to access the sample data you uploaded.

 

Thanks,
Sandeep

梁飞王
New Contributor

Yes, @Anonymous User

https://1drv.ms/u/s!AuqSdRqzEo83uHmo6JBfml1Zu2FJ?e=7fLk6b

thanks, 

王梁飞

by Anonymous User
Not applicable

Hi @梁飞王 ,

 

I tried to visualize the training data you shared. It looks like this:

SandeepKumar1_0-1629201493172.png

 

There is only one class in the training data:

SandeepKumar1_1-1629201520544.png
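
A quick way to run the same kind of check yourself is to count the distinct class values across the label chips. The sketch below assumes each label chip has already been loaded as a 2D grid (a list of rows of integer class values); how the chips are read from disk is left out, and `class_histogram` is an illustrative name.

```python
from collections import Counter


def class_histogram(label_chips):
    """Count how many pixels carry each class value across all label chips."""
    counts = Counter()
    for chip in label_chips:
        for row in chip:
            counts.update(row)
    return counts


# Two tiny 2x2 chips used only as made-up sample input.
chips = [
    [[0, 0], [1, 1]],
    [[1, 1], [1, 1]],
]
histogram = class_histogram(chips)
print(len(histogram), dict(histogram))
```

If the histogram has a single key, the labels contain only one class, which matches what the screenshot above shows.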

 

Can you share a screenshot of your original data? Are you trying to extract polygons of green spaces?

The training data you exported is not in the correct format. You need to export it again in 'RCNN Masks' format. A similar workflow is documented here. However, you can continue using the 'Train Deep Learning Model' tool instead of the API used in that sample.

 

 

Thanks,

Sandeep
