I'm using ArcGIS Pro 2.9. I have extracted video frames from a multiplexed video into a mosaic dataset, and I'm using it as an Image Collection in the Image Classification wizard to label objects on the different frames.
When I export the labels as training data, does the tool randomly choose labeled objects from the different frames to create image chips? Or are image chips created from all labeled objects across all extracted frames?
Lastly, when applying the trained model to detect objects, given that the video frames (and the moving objects in them) overlap, is it better to process the frames as individual rasters with non-maximum suppression than to process them as a single mosaicked image?
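
To be clear about what I mean by non-maximum suppression, here is a minimal pure-Python sketch of the general greedy technique. It is not the ArcGIS implementation. The function names, the (xmin, ymin, xmax, ymax) box format, and the 0.5 overlap threshold are my own assumptions for illustration, and it assumes all detections are already in one shared coordinate system.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, max_overlap=0.5):
    """Greedy NMS over (box, confidence) pairs.

    Visits detections from highest to lowest confidence and keeps one only
    if it does not overlap an already kept detection by more than max_overlap.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) <= max_overlap for k in kept):
            kept.append(det)
    return kept


# Hypothetical detections pooled from two overlapping frames.
detections = [
    ((0, 0, 10, 10), 0.9),    # object seen in frame 1
    ((1, 1, 11, 11), 0.8),    # same object seen again in frame 2
    ((20, 20, 30, 30), 0.7),  # a different object
]
result = non_max_suppression(detections)
```

With per-frame processing, the same object detected in two overlapping frames would be collapsed to the higher-confidence box by a step like this. A moving object that has shifted far enough between frames would not overlap itself and would be kept twice, which is part of why I'm asking.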