Which release are you on? Pro 2.7 lets you work with sparse training samples and with imagery that has more than 3 bands (earlier releases did not support more than 3 bands). Here are some pointers for your workflow:
1. Capture training samples that are representative of your regions of interest.
2. Ensure the output image format for the 'Export Training Data for Deep Learning' GP tool is TIFF, and set the metadata format to Classified Tiles.
3. When filling in the parameters for the 'Train Deep Learning Model' tool, set the model type to U-Net, and set ignore_classes = 0 in the model arguments section.
4. Lastly, run the pixel classification tool. Ideally you will get the results you need; if not, go through the process again with more training samples.
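If you prefer to script steps 2 and 3, here is a rough sketch of how the settings above might map onto the arcpy Image Analyst functions. Treat it as an assumption-laden outline rather than a tested recipe: the helper names, tile size, stride and epoch count are placeholders I made up, and the "name value" string for the model arguments is my understanding of how the tool accepts them, so check it against the tool help for your release.

```python
# Sketch only: builds the keyword arguments for the two geoprocessing calls.
# Helper names, tile size, stride and epoch count are illustrative assumptions.

def build_export_params(in_raster, out_folder, in_class_data,
                        tile_size=256, stride=128):
    """Settings for 'Export Training Data for Deep Learning' (step 2)."""
    return {
        "in_raster": in_raster,
        "out_folder": out_folder,
        "in_class_data": in_class_data,
        "image_chip_format": "TIFF",            # output image format: TIFF
        "tile_size_x": tile_size,
        "tile_size_y": tile_size,
        "stride_x": stride,
        "stride_y": stride,
        "metadata_format": "Classified_Tiles",  # metadata format: classified tiles
    }


def build_train_params(in_folder, out_folder, max_epochs=20):
    """Settings for 'Train Deep Learning Model' (step 3)."""
    return {
        "in_folder": in_folder,
        "out_folder": out_folder,
        "max_epochs": max_epochs,
        "model_type": "UNET",                   # model type: U-Net
        # Assumed "name value" syntax for the model arguments parameter.
        "arguments": "ignore_classes 0",
    }


def run_workflow(export_params, train_params):
    """Run both tools; needs ArcGIS Pro with the Image Analyst extension."""
    import arcpy  # imported here so the helpers above work without ArcGIS

    arcpy.CheckOutExtension("ImageAnalyst")
    arcpy.ia.ExportTrainingDataForDeepLearning(**export_params)
    arcpy.ia.TrainDeepLearningModel(**train_params)
```

You would call run_workflow(build_export_params(...), build_train_params(...)) from the Pro Python window with your own paths, then run the pixel classification tool on the trained model as in step 4.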
If things still don't work, can you provide this information:
- How many bands does your input data have?
- What is the bit depth of your input data?
- Are you using CPU or GPU for computing?
- Which software release are you running this on?
- Did you try other models (DeepLab, for instance)?
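If it is easier, most of these answers can be pulled from the Python window. The snippet below is only a sketch: the function name and the dictionary keys are my own, it assumes arcpy and PyTorch are available in your Pro Python environment, and it leaves a value as None when the corresponding package cannot be imported.

```python
# Sketch only: gathers band count, pixel type, release and GPU availability.
# The function name and dictionary keys are illustrative assumptions.

def collect_diagnostics(raster_path):
    info = {
        "band_count": None,     # number of bands in the input raster
        "pixel_type": None,     # e.g. "U8" for 8-bit unsigned
        "pro_version": None,    # installed release
        "gpu_available": None,  # True if PyTorch can see a CUDA GPU
    }
    try:
        import arcpy
        info["pro_version"] = arcpy.GetInstallInfo().get("Version")
        raster = arcpy.Raster(raster_path)
        info["band_count"] = raster.bandCount
        info["pixel_type"] = raster.pixelType
    except ImportError:
        pass  # arcpy not installed in this environment
    try:
        import torch
        info["gpu_available"] = torch.cuda.is_available()
    except ImportError:
        pass  # PyTorch not installed in this environment
    return info
```

Running collect_diagnostics with the path to your input image and pasting the result here would cover the first four questions.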