We have 2 GPUs on a Windows machine running ArcGIS Pro 2.8.3 and have tried distributed GPU training as described here:
I have not managed to get it working, and I am wondering whether this is because it only works on Linux.
The above web page contains several inconsistencies, such as referring to a Windows path with:
The issue is that only one GPU starts processing, and then it crashes at the end of the first epoch (excerpts from the crash below):
File "C:\ArcGIS\Pro_\bin\Python\envs\arcgispro-py3\lib\site-packages\fastai\callback.py", line 347, in on_batch_end
AttributeError: module 'torch.distributed' has no attribute 'all_reduce'
AttributeError: module 'torch.distributed' has no attribute 'barrier'
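One plausible reading of these two AttributeErrors: `torch.distributed` only defines its collectives (`all_reduce`, `barrier`, and so on) when `torch.distributed.is_available()` returns True, and PyTorch's Windows builds did not enable it before 1.7, which added prototype Gloo support (NCCL is not available on Windows). The sketch below reports what the active environment's PyTorch build supports. The helper name `distributed_support` is hypothetical; the `torch` calls are standard PyTorch, and it degrades gracefully if PyTorch is missing.

```python
import importlib.util


def distributed_support():
    """Summarise whether the active PyTorch build can run multi-GPU collectives."""
    report = {"torch": False, "distributed": False, "all_reduce": False,
              "barrier": False, "gloo": False, "nccl": False, "cuda_devices": 0}
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment

    import torch
    import torch.distributed as dist

    report["torch"] = True
    report["cuda_devices"] = torch.cuda.device_count()
    report["distributed"] = dist.is_available()
    # When is_available() is False the module still imports, but the
    # collectives are never defined, which produces the AttributeErrors above.
    report["all_reduce"] = hasattr(dist, "all_reduce")
    report["barrier"] = hasattr(dist, "barrier")
    if report["distributed"]:
        # getattr guards against older builds that lack these helpers.
        report["gloo"] = bool(getattr(dist, "is_gloo_available", lambda: False)())
        report["nccl"] = bool(getattr(dist, "is_nccl_available", lambda: False)())
    return report


if __name__ == "__main__":
    for key, value in distributed_support().items():
        print(f"{key}: {value}")
```

If `distributed` comes back False while `cuda_devices` is 2, the GPUs are visible but the installed PyTorch build cannot coordinate them.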
Has anyone managed to get this working on Windows?
ArcGIS API for Python version 1.9.1 has been released. If you are an Anaconda user, you can get it along with all the deep learning dependencies by running this command in a clean environment:
conda install -c esri arcgis_learn=1.9.1 python=3.8
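After running the command, one quick way to confirm what actually landed in the new environment is to ask the interpreter for the installed `arcgis` distribution. This is a sketch assuming the `arcgis_learn` metapackage pulls in the `arcgis` package at the matching 1.9.1 version; `installed_version` and `check_arcgis` are hypothetical helper names, and `importlib.metadata` requires Python 3.8+.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_arcgis(expected="1.9.1"):
    """Compare the installed arcgis package with the version requested from conda."""
    found = installed_version("arcgis")
    if found is None:
        return "arcgis is not installed in this environment"
    if found != expected:
        return f"arcgis {found} is installed, expected {expected}"
    return f"arcgis {found} is installed as expected"


if __name__ == "__main__":
    # The command above also pins python=3.8, so report the interpreter too.
    print("Python", sys.version.split()[0])
    print(check_arcgis())
```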
I created a blank environment using "conda create --name dl13" but, as can be seen below, got the following errors when trying to install arcgis_learn and python.
(dl13) C:\ArcGIS\Pro_\bin\Python\envs\arcgispro-py3>conda install -c esri arcgis_learn=1.9.1 python=3.8
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: /
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package python conflicts for:
arcgis_learn=1.9.1 -> python[version='>=3.6,<3.7.0a0|>=3.8,<3.9.0a0|>=3.7,<3.8.0a0']
arcgis_learn=1.9.1 -> boost=1.73 -> python[version='2.7.*|3.5.*|3.6.*|>=2.7,<2.8.0a0|>=3.7|>=3|>=3.6|>=3.9,<3.10.0a0|3.4.*|>=3.9,<3.10|>=3.8,<3.9|>=3.7,<3.8|>=3.6,<3.7']
python=3.8
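The first conflict line lists the Python ranges that arcgis_learn=1.9.1 accepts, as `|`-separated alternatives of `,`-joined clauses. The checker below is a hypothetical helper that only approximates conda's own version matching (it drops pre-release tags such as `a0` and does not handle `3.8.*` wildcards). Under those assumptions it shows that 3.8 falls inside one of the alternatives, so the python=3.8 pin alone does not seem to be what excludes arcgis_learn; the remaining conflict presumably comes from other packages or the channel setup in use.

```python
import operator
import re

# Alternation order matters: two-character operators are tried before ">" and "<".
_CLAUSE = re.compile(r"(>=|<=|==|!=|>|<)(.+)")
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def _release(version):
    """'3.9.0a0' -> (3, 9, 0). Pre-release tags are dropped (a simplification)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def _padded(a, b):
    """Right-pad two release tuples with zeros so (3, 8) compares like (3, 8, 0)."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def _check(version, clause):
    """Evaluate one clause such as '>=3.8' against a version string."""
    match = _CLAUSE.fullmatch(clause)
    if match is None:
        raise ValueError(f"unsupported clause: {clause!r}")
    op, bound = match.groups()
    a, b = _padded(_release(version), _release(bound))
    return _OPS[op](a, b)


def satisfies(version, spec):
    """True if `version` meets any '|'-separated alternative of a conda-style spec."""
    for alternative in spec.split("|"):
        clauses = [c.strip() for c in alternative.split(",")]
        if all(_check(version, clause) for clause in clauses):
            return True
    return False


if __name__ == "__main__":
    spec = ">=3.6,<3.7.0a0|>=3.8,<3.9.0a0|>=3.7,<3.8.0a0"
    for candidate in ("3.6.13", "3.8.10", "3.9.0"):
        print(candidate, satisfies(candidate, spec))
```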
I also tried cloning arcgispro-py3 and installing arcgis_learn and python in the clone, but got pages of conflicts, such as:
Package keras-gpu conflicts for:
esri/win-64::deep-learning-essentials==2.8=arcgispro_4 -> keras-gpu=2.3
defaults/win-64::keras-gpu==2.3.1=0
defaults|defaults/win-64::keras-gpu==2.3.1=0

Package swat conflicts for:
esri|esri/win-64::swat==1.8.1=py37_0
esri/win-64::arcpy==2.8=py37_arcgispro_29734 -> swat
esri/win-64::swat==1.8.1=py37_0
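These conda references are easier to read once split into channel, name, version and build string. The sketch below (a hypothetical helper, not part of conda) does that and pulls the Python version out of a leading `pyXY` build tag. Applied to the arcpy line, it suggests the Pro 2.8 packages are built for Python 3.7, which would plausibly explain why requesting python=3.8 inside a clone of arcgispro-py3 cannot be satisfied.

```python
import re


def parse_conda_ref(ref):
    """Split 'channel/subdir::name==version=build' into its parts."""
    channel, _, rest = ref.rpartition("::")
    name, _, version_build = rest.partition("==")
    version, _, build = version_build.partition("=")
    # Build strings such as 'py37_0' encode the Python version they target.
    py = re.match(r"py(\d)(\d+)", build)
    return {
        "channel": channel,
        "name": name,
        "version": version,
        "build": build,
        "python": f"{py.group(1)}.{py.group(2)}" if py else None,
    }


if __name__ == "__main__":
    for ref in ("esri/win-64::arcpy==2.8=py37_arcgispro_29734",
                "esri/win-64::deep-learning-essentials==2.8=arcgispro_4",
                "defaults/win-64::keras-gpu==2.3.1=0"):
        print(parse_conda_ref(ref))
```

For example, `parse_conda_ref("esri/win-64::arcpy==2.8=py37_arcgispro_29734")["python"]` returns `"3.7"`.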
From the documentation available here https://developers.arcgis.com/python/guide/install-and-set-up/#Install-using-Python-Command-Prompt-o...
arcgis_learn is a metapackage designed for standalone Anaconda environments. You can install it in vanilla Anaconda environments, but not in ArcGIS Pro conda environments.
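Since arcgis_learn targets vanilla Anaconda environments, it can help to check which kind of environment is active before running conda install. The sketch below relies on a simple heuristic that is an assumption, not an Esri-documented test: ArcGIS Pro environments such as arcgispro-py3 include arcpy (as the conflict output above shows), while a standalone Anaconda environment normally does not. The function name `looks_like_pro_env` is hypothetical.

```python
import importlib.util
import sys


def looks_like_pro_env():
    """Heuristic: ArcGIS Pro conda environments ship arcpy; vanilla Anaconda ones do not."""
    return importlib.util.find_spec("arcpy") is not None


if __name__ == "__main__":
    kind = "an ArcGIS Pro" if looks_like_pro_env() else "a standalone"
    print(f"{sys.prefix} looks like {kind} environment")
```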
For ArcGIS Pro, I would recommend using the deep learning installer, but multi-GPU support will only be available in ArcGIS Pro 2.9 once it is released.
For now, you can create a standalone Anaconda environment and install arcgis_learn in it.