How to use all sample data for training only, with no validation?

04-12-2021 04:54 PM
MaryamBarzegar
New Contributor III

Hello, is there a way to use all sample data for training only, with none held out for validation? The val_split_pct parameter, which sets the percentage of training data to keep as validation, doesn't accept a value of 0.

9 Replies
Tim_McGinnes
Occasional Contributor III

It looks like this works when using a Jupyter notebook. Setting val_split_pct to 0.0 doesn't give any errors and the model trains OK. As expected, the validation loss cannot be calculated.

However, when using the Train Deep Learning Model tool in Pro, it seems to use some of the training data for validation despite zero being entered as the split. I think it may default back to 10%, but I am not sure.

The above is true for my MaskRCNN model, but may be different for other models. What model are you using and is it giving you any errors when trying to do a zero percent split?

MaryamBarzegar
New Contributor III

Hi Tim, I'm using ChangeDetector model and it gives me the below error:

"ename": "IndexError",
"evalue": "index 0 is out of bounds for axis 0 with size 0",

from arcgis.learn import prepare_data

data = prepare_data(output_path,
                    chip_size=256,
                    val_split_pct=0.0,  # a zero split is what triggers the IndexError
                    dataset_type='ChangeDetection',
                    batch_size=4)

Tim_McGinnes
Occasional Contributor III

Is it the prepare_data step that gives the error? It may be an issue within the training data itself?

I have run a SingleShotDetector with a 0% validation split and it works OK too. Note: the show_results and average_precision functions won't work and will give index errors. When you call the save function, you will have to pass compute_metrics=False to save the model, or it will also give an index error.
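The workflow described above could look roughly like the sketch below. It is only an outline under assumptions: the function name train_without_validation, the data_path argument, the epoch count and the model name are all hypothetical, and the behaviour with a zero split is reported in this thread only for SingleShotDetector and MaskRCNN, not for every model.

```python
def train_without_validation(data_path, epochs=10, model_name="ssd_no_validation"):
    """Train a SingleShotDetector on all chips, with no validation split.

    Hypothetical helper: the name and default values are illustrative only.
    """
    # Imported inside the function so that defining the helper does not
    # require the arcgis package to be installed.
    from arcgis.learn import prepare_data, SingleShotDetector

    # val_split_pct=0.0 keeps every chip in the training set.
    data = prepare_data(data_path, val_split_pct=0.0)

    model = SingleShotDetector(data)
    model.fit(epochs)

    # Metrics need validation data, so skip them when saving.
    model.save(model_name, compute_metrics=False)
    return model
```

Calling show_results or the average precision function on the returned model is expected to fail, because there is no validation data to draw from.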

MaryamBarzegar
New Contributor III

Yes, the prepare_data step gives the error. This model is a bit different from the other models: the training dataset has 3 folders (images_before, images_after, labels), whereas for the MultiTaskRoadExtractor model, for instance, I had 2 folders (images and labels). I don't think my training dataset has any problem, since it works with other values of the val_split_pct parameter. The whole error message:

 

IndexError                                Traceback (most recent call last)
In  [1]:
Line 7:     batch_size=4

File C:\Users\barzegarm\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone1\lib\site-packages\arcgis\learn\_data.py, in prepare_data:
Line 1368:  **kwargs)

File C:\Users\barzegarm\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone1\lib\site-packages\arcgis\learn\_utils\change_detection_data.py, in prepare_change_detection_data:
Line 695:   imagery_type=imagery_type

File C:\Users\barzegarm\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone1\lib\site-packages\arcgis\learn\_utils\change_detection_data.py, in create_train_val_sets:
Line 610:   imagery_type=imagery_type

File C:\Users\barzegarm\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone1\lib\site-packages\arcgis\learn\_utils\change_detection_data.py, in __init__:
Line 322:   self.n_c = self.x[0].data.shape[0]

File C:\Users\barzegarm\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone1\lib\site-packages\fastai\data_block.py, in __getitem__:
Line 120:   if isinstance(idxs, Integral): return self.get(idxs)

File C:\Users\barzegarm\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone1\lib\site-packages\fastai\vision\data.py, in get:
Line 270:   fn = super().get(i)

File C:\Users\barzegarm\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone1\lib\site-packages\fastai\data_block.py, in get:
Line 75:    return self.items[i]

IndexError: index 0 is out of bounds for axis 0 with size 0
---------------------------------------------------------------------------

 

MaryamBarzegar
New Contributor III

I tried the MultiTaskRoadExtractor model and it doesn't give any error, so the problem is only with the ChangeDetector model.

Tim_McGinnes
Occasional Contributor III

Apologies - somehow I read ObjectDetection instead of ChangeDetection. Yes, I think there is something in the underlying code which is breaking when trying this.

I can't find it documented anywhere, but there appears to be a split_type parameter that chooses whether your training/validation split is random or defined by folder.

val_split_pct (float): percentage of data to split in validation if the split_type is "random"

split_type (str, optional): If split_type='manual' will use train val folders. Defaults to 'random'.

And the code shows how it should be structured:

if split_type == 'folder':
    if (path / 'train').exists() and (path / 'val').exists():
        folder_check(path / 'train')
        folder_check(path / 'val')
        train_images_before = get_files(path / 'train' / 'images_before', extensions=image_extensions)
        train_images_after = get_files(path / 'train' / 'images_after', extensions=image_extensions)
        train_labels = get_files(path / 'train' / 'labels', extensions=image_extensions)
        val_images_before = get_files(path / 'val' / 'images_before', extensions=image_extensions)
        val_images_after = get_files(path / 'val' / 'images_after', extensions=image_extensions)
        val_labels = get_files(path / 'val' / 'labels', extensions=image_extensions)

You could try putting your existing image/label folders under a 'train' folder and creating empty image/label folders under a 'val' folder. Then add split_type='manual' to the prepare_data call and see if that makes any difference (it could also be split='manual'; I'm not sure).

MaryamBarzegar
New Contributor III

I think the split_type parameter is only defined in the source code of the ChangeDetector model and can't be passed to prepare_data. I created 2 folders, train and val, and in each one I created 3 folders: images_before, images_after and labels. Then I tried split_type='manual' and split_type='folder', and neither worked:

Exception                                 Traceback (most recent call last)
In  [1]:
Line 7:     batch_size=4

File C:\Users\barzegarm\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone1\lib\site-packages\arcgis\learn\_data.py, in prepare_data:
Line 902:   folder_check(path)

File C:\Users\barzegarm\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone1\lib\site-packages\arcgis\learn\_utils\change_detection_data.py, in folder_check:
Line 472:   raise Exception(f"Three folders must be present in the {path.name}"

Exception: Three folders must be present in the Training13_novalidation directory namely 'images_before', 'images_after' and 'labels'.
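The traceback shows folder_check being called on the top-level path, which would explain why a train/val layout is rejected. The sketch below imitates the check that the error message describes. It is not the library's actual code, and the helper name missing_change_detection_folders is hypothetical.

```python
from pathlib import Path
import tempfile

# Folder names taken from the error message above.
REQUIRED = ("images_before", "images_after", "labels")

def missing_change_detection_folders(path):
    """Return the required folder names that are absent directly under `path`."""
    root = Path(path)
    return [name for name in REQUIRED if not (root / name).is_dir()]

# Compare a flat layout with a train/val layout in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    flat = Path(tmp) / "flat"
    nested = Path(tmp) / "nested"
    for name in REQUIRED:
        (flat / name).mkdir(parents=True)
        (nested / "train" / name).mkdir(parents=True)
        (nested / "val" / name).mkdir(parents=True)
    flat_missing = missing_change_detection_folders(flat)
    nested_missing = missing_change_detection_folders(nested)

print(flat_missing)    # []
print(nested_missing)  # ['images_before', 'images_after', 'labels']
```

Under this reading, the nested layout fails because the three folders sit one level below the path that is checked.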

prepare_data has no split_type parameter:

[Screenshot: MaryamBarzegar_0-1618364415294.png]

[Screenshot: MaryamBarzegar_2-1618364636558.png]

 

Tim_McGinnes
Occasional Contributor III

Yes, I think we are at a dead end here - from reviewing the code it looks like the only method that works is the single set of folders (probably why the split_type parameter is not documented anywhere).

For ChangeDetection I think you will just have to set the val_split_pct parameter to the lowest value you can without it giving an error. For example, using the change detection sample notebook, the supplied data has 215 images. I set val_split_pct=0.005 (which gives 1 validation image) and the training process worked ok.
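The arithmetic behind that choice can be sketched as follows. This assumes the number of validation images is found by truncating n_images * val_split_pct, which matches the 215-image example but is not confirmed anywhere in this thread. Both function names are hypothetical.

```python
def validation_count(n_images, val_split_pct):
    """Assumed rule: validation images = int(n_images * val_split_pct)."""
    return int(n_images * val_split_pct)

def smallest_working_split(n_images, decimals=4):
    """Smallest split, at `decimals` decimal places, that yields >= 1 validation image."""
    scale = 10 ** decimals
    k = -(-scale // n_images)  # ceiling of scale / n_images
    # Guard against floating-point rounding just below 1.
    while validation_count(n_images, k / scale) < 1:
        k += 1
    return k / scale

print(validation_count(215, 0.0))    # 0
print(validation_count(215, 0.005))  # 1
print(smallest_working_split(215))   # 0.0047
```

If the library rounds differently, the smallest working value will differ, so treat the result as a starting point and raise it slightly if prepare_data still fails.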

MaryamBarzegar
New Contributor III

Thank you, Tim. Yes, I did the same and set val_split_pct=0.01, but it would be helpful if we could control this so that the validation images weren't selected at random.
