The kernel appears to have died. It will restart automatically.

08-26-2021 02:22 AM
MaxBöcke
New Contributor III

Hi guys!

When running a notebook with the ArcGIS Notebook Runtime Advanced, we always get the error message "The kernel appears to have died. It will restart automatically." as soon as we execute "from arcgis.gis import GIS".

This is confusing, because with the Standard runtime it works properly. In our case, however, we need both Python packages, arcgis and arcpy. The "import arcpy" command works properly in the Advanced image.

Here are the details:

System: ArcGIS Enterprise 10.9 with ArcGIS Notebook Server 10.9 on a RedHat 7 VM

What we have tried so far:

1. We connected to the container directly and executed the command (from arcgis.gis import GIS).

It works. We conclude from this that there is no problem with the container or the virtual machine.

2. We analyzed the log files of the WebSocket connection and the container logs. The only entry logged is "The kernel has restarted". We found no entry explaining why the kernel restarted.

3. We contacted Esri Support. Unfortunately they can't reproduce it.

The most confusing part is that all import commands work fine with the Advanced and Standard Image 4.0. The problem occurs only with the Advanced Image 5.0, which we need because our use case requires Python 3.7.9.

Perhaps I'll find some further help here. Thanks for reading my issue.

Best regards

Max (max.boecke@hereon.de)

 

 

 

30 Replies
MaxBöcke
New Contributor III

Hi Shikhar

Sorry for my late response; I have been pretty busy.

Here are the results:

1. Could you lookup 'docker commit' to create an image out of a running container and share the container snapshot with us?

Yes, of course. Could you send me an email address? I'll then provide you with a download link. The tar.gz file is nearly 5 GB.

2. Please try printing LD_LIBRARY_PATH. (Either "echo $LD_LIBRARY_PATH" on the shell or os.environ["LD_LIBRARY_PATH"] in Python prompt)

'/home/arcgis/arcgis/server/framework/runtime/xvfb/Xvfb/lib64:/home/arcgis/arcgis/server/bin/wine/lib64:/home/arcgis/arcgis/server/bin/wine/lib64/wine:/home/arcgis/arcgis/server/framework/runtime/tomcat/bin:/home/arcgis/arcgis/server/lib:'

3. Did you try it with a clean/new docker image?

Do you mean a freshly downloaded Advanced Image from My Esri? If yes: we set up a completely dedicated machine with ArcGIS Notebook Server and new Docker images from Esri, and we noticed the same behaviour on that (physical) machine. Currently we have a virtual machine in production.

If no, what do you mean exactly?

4. Also please share info. about any other installed packages.

See attached file - installed_packages.txt
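
For anyone repeating check 2, the LD_LIBRARY_PATH value can also be split into its entries and each directory tested for existence, which makes a long value like the one above easier to read. This is only a minimal sketch: the helper names split_library_path and report_library_path are made up for illustration and are not part of any Esri tooling.

```python
import os


def split_library_path(value):
    """Split a colon-separated search path and drop empty entries."""
    return [entry for entry in value.split(":") if entry]


def report_library_path(value):
    """Return (directory, exists_on_disk) pairs for each entry."""
    return [(entry, os.path.isdir(entry)) for entry in split_library_path(value)]


if __name__ == "__main__":
    # Reads the variable from the current process; an unset variable is
    # treated like an empty path.
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for directory, exists in report_library_path(current):
        print(("ok       " if exists else "missing  ") + directory)
```

Running it both in the notebook and in a shell inside the container would show whether the two environments see the same library directories.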

 

 

What I would like to emphasize is that we had similar issues in 10.8.1. There, changing the order of the package imports basically made it work. After the upgrade to 10.9, this workaround no longer works.

So I agree with you that something is incompatible with an already installed package on the machine.

 

Thank you very much and best regards

Max (max.boecke@hereon.de)

MaxBöcke
New Contributor III

Hi Shikhar,

 

Sorry for the late response. Here are the answers:

1. Could you lookup 'docker commit' to create an image out of a running container and share the container snapshot with us?

Yes, of course. Could you provide an email address? I'll then send a download link to that address.


2. Please try printing LD_LIBRARY_PATH. (Either "echo $LD_LIBRARY_PATH" on the shell or os.environ["LD_LIBRARY_PATH"] in Python prompt)

'/home/arcgis/arcgis/server/framework/runtime/xvfb/Xvfb/lib64:/home/arcgis/arcgis/server/bin/wine/lib64:/home/arcgis/arcgis/server/bin/wine/lib64/wine:/home/arcgis/arcgis/server/framework/runtime/tomcat/bin:/home/arcgis/arcgis/server/lib:'


3. Did you try it with a clean/new docker image?

Do you mean a freshly downloaded Advanced Image from My Esri? If yes: we set up a completely dedicated machine with ArcGIS Notebook Server and new Docker images from Esri, and we noticed the same behaviour on that (physical) machine. Currently we have a virtual machine in production.

If no, what do you mean exactly?


4. Also please share info. about any other installed packages.

- see attached file "installed_packages.txt"

Thank you very much for your help. 

Max

MaxBöcke
New Contributor III

Hi

Is there any news on this case? Thanks a lot.

 

MaxBöcke
New Contributor III

Here is a link that describes exactly the same log message as in this case, but in a different use case.

https://github.com/ContinuumIO/docker-images/issues/171 

shikhar_deep
Esri Contributor

Hi @MaxBöcke ,

 

Apologies for the late reply.

1. For the 'docker commit' image, it would be helpful if you could share it privately via message.

2. Could you also share the resources available on the system, the disk space available to Docker, the webSocketSize and the runtime details (version, max memory and shared memory)?

3. Please make sure the log level is set to 'DEBUG'. If it is not, set it to 'DEBUG', terminate the container, reproduce the issue again and share the log details.

4. While reproducing the issue, run 'docker stats' and check that 'Memory Usage' does not exceed the 'Limit' shown in the 'docker stats' details.

5. When the issue is reproduced, are any other notebooks (interactive or scheduled) running as well, or is it the only one?

6. How long does it take to run into this issue?
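
The memory check in step 4 can be made less error-prone by parsing the usage string instead of reading it by eye. The sketch below assumes the memory column has the usual "used / limit" shape that 'docker stats' prints (for example "1.5GiB / 4GiB"); the function names and the 90% warning threshold are illustrative assumptions, not Esri recommendations.

```python
import re

# Unit suffixes as they can appear in the MEM USAGE / LIMIT column.
_UNITS = {
    "B": 1,
    "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4,
    "kB": 1000, "MB": 1000 ** 2, "GB": 1000 ** 3, "TB": 1000 ** 4,
}


def to_bytes(text):
    """Convert a size such as '1.5GiB' into a number of bytes."""
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)\s*([A-Za-z]+)", text.strip())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError("unrecognised size: %r" % text)
    return float(match.group(1)) * _UNITS[match.group(2)]


def memory_fraction(mem_usage):
    """Return used/limit for a string such as '1.5GiB / 4GiB'."""
    used_text, limit_text = mem_usage.split("/")
    limit = to_bytes(limit_text)
    if limit == 0:
        # Stopped containers can report '0B / 0B'.
        return 0.0
    return to_bytes(used_text) / limit


if __name__ == "__main__":
    # Sample value only. A real value could be collected with, for example:
    #   docker stats --no-stream --format "{{.Name}}\t{{.MemUsage}}"
    sample = "1.5GiB / 4GiB"
    fraction = memory_fraction(sample)
    print("memory used: {:.0%} of limit".format(fraction))
    if fraction > 0.9:  # arbitrary threshold, chosen for illustration
        print("usage is close to the container limit")
```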

 

Thanks & Regards

Shikhar Deep

 
MaxBöcke
New Contributor III

Hi @shikhar_deep

We stopped analyzing this issue. Our tests were successful in versions 10.7 and 10.8, but not in 10.9, so we suspect an issue within the 10.9 Advanced Image. As an example, we imported the 10.9.1 Advanced Image as a custom image. The import functionality apparently works there, but afterwards we had a lot of crashes when using arcpy.

The resources for executing such notebooks are more than sufficient: 8 cores and 32 GB RAM. The measured platform performance is great.

I sent the 'docker commit' image link to you via private message.

Thanks a lot for your efforts. 

Best regards

Max

shikhar_deep
Esri Contributor

Hi @MaxBöcke ,


Glad to know that "import" works as expected in 10.9.1. I am curious whether you are still seeing crashes in 10.9.1 or in 10.9. If in 10.9.1, more information would be helpful.

 

Thanks & Regards

Shikhar Deep

MartinKrál
Esri Contributor

Hi @MaxBöcke 

I think this may be linked to the version of the tensorflow library and CPU AVX support. One of our customers has the same problem after migrating to ArcGIS Notebook 10.9.1. We ran "from arcgis.gis import GIS" in a Python console inside their Docker container and got the error message "Illegal instruction (core dump)", and we saw exactly the same behavior ("Kernel ... died" / "Illegal instruction") with "import tensorflow".

I was unable to reproduce the problem on my own machine, but I recently tried an older machine whose CPU does not support AVX instructions, and the same problem appeared. As a test, I uninstalled tensorflow-gpu with conda in a test Docker container, and the "Kernel appears to have died.." message was gone after the arcgis.gis import.

Could you please run !more /proc/cpuinfo | grep flags in your problematic notebook to check whether AVX is listed? It is somewhat similar to this problem. I do not know how tensorflow is connected to the ArcGIS API for Python, but I will try to open a support case with Esri for this problem and let you know the result.
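
The same AVX check can be done in Python rather than with grep, which is handy inside a notebook cell. This is a minimal sketch that assumes the x86 Linux layout of /proc/cpuinfo, where each processor block has a "flags" line; the function names cpu_flags and has_avx are made up for illustration.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the flag names from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text):
    """True if the plain 'avx' flag is listed (avx2 alone does not count)."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("AVX supported:", has_avx(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found; this check only applies to Linux")
```

Because the container shares the host CPU, the result should be the same in the notebook and on the host, although that is worth confirming on a virtual machine where the hypervisor may mask CPU features.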

Regards

MartinKrál
Esri Contributor

FYI: test on a Docker container without CPU AVX support

MaxBöcke
New Contributor III

Hi @MartinKrál 

You made my day. The command you mentioned, "!more /proc/cpuinfo | grep flags", shows that our CPU supports AVX.

The following works for us:

 

import tensorflow
import arcpy
from arcgis.gis import GIS
gis = GIS("home")

 

Already in 10.8.1 we noticed a correlation with the order in which the packages are imported: if a specific package is imported first, another package can then be imported successfully; if not, the kernel dies. If we omit "import tensorflow", the kernel dies. I think the approach above is a good workaround to live with. I'll keep the option of uninstalling tensorflow in mind.
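
Since the notebook log only reports that the kernel restarted, one way to compare import orders without losing the kernel each time is to run every candidate sequence in a child Python process and look at its exit status. The sketch below is an assumption about how such a check could be set up, not an Esri-provided tool; try_imports and describe are hypothetical helper names. On POSIX systems a negative return code from subprocess means the child was killed by that signal, so an "Illegal instruction" crash would be expected to show up as SIGILL.

```python
import signal
import subprocess
import sys


def try_imports(statements, timeout=300):
    """Run the statements in a fresh interpreter; return (returncode, stderr)."""
    code = "\n".join(statements)
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr


def describe(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with code %d" % returncode


if __name__ == "__main__":
    # Import orders taken from this thread; add further variants as needed.
    candidates = {
        "tensorflow first": [
            "import tensorflow",
            "import arcpy",
            "from arcgis.gis import GIS",
        ],
        "arcgis only": ["from arcgis.gis import GIS"],
    }
    for label, statements in candidates.items():
        returncode, stderr = try_imports(statements)
        print(label + ": " + describe(returncode))
```

An ordinary Python exception such as a missing module gives exit code 1 with a traceback in stderr, which distinguishes it from a native crash.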

Thanks a lot.

Best regards

Max
