All our hosted feature services broken this morning

07-19-2024 07:02 AM
Matt-Goodman
Frequent Contributor

Global news outlets this morning are reporting massive interruptions to the airline industry, banking, etc. due to a faulty CrowdStrike update that crashed Microsoft Windows machines. Global tech outage disrupts industries, highlights online risks | Reuters

Coincidentally, none of our hosted feature services in Enterprise Portal for ArcGIS are working or available. Their data will not load, they cannot be added to a map, they cannot be overwritten, and they cannot be accessed at all, except through the item 'details' page in Portal.

Are these two things related? I'm not sure. What would cause just the hosted feature services to break, while everything else in our Portal environment seems to work fine?

 

17 Replies
AprilChipman
Frequent Contributor

We are getting the same error message. ESRI Tech Support said we needed to restart the machine and then restart the windows services in a particular order. No luck yet.

MaryEllenRosebrough-Gay
Occasional Contributor

Anyone else fixed this issue?

I have two separate Enterprise environments that are both still down after the CrowdStrike fix was applied. Things I have done so far:

  • I read Matt's solution and double checked that none of my servers are still in Safe Mode.
  • Restarted all of my servers
  • Manually stopped and restarted all of my services
  • Applied patches (Check for ArcGIS Enterprise Updates)
  • I called Tech Support a couple of hours ago, but no call back yet.
  • Ran describedatastore.bat - no issues

I can't even log in to portal admin or portal at all in either environment. I also can't log in to server manager in either environment, so it seems to be a portal issue, since both environments are federated. I can log in to server admin.

AprilChipman
Frequent Contributor

Well, I'm not sure what fixed it, but our system is working again. Here are the steps I was sent by ESRI:

1) restart machine
2) stop/start the Windows service for Portal for ArcGIS
3) stop/start the Windows service for ArcGIS DataStore
4) wait 10 minutes or until Portal is running correctly
5) stop/start the Windows service for ArcGIS Server
6) validate DataStore from Server Manager
7) validate the federation from Portal
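
In case anyone wants to script steps 2 through 5, here is a minimal Python sketch of that restart order. It is not something Esri sent. The three service display names are my assumption (check them in services.msc), the 600-second pause stands in for "wait 10 minutes", and `restart_in_order` is just an illustrative helper name. Steps 6 and 7 still have to be done by hand in Server Manager and Portal.

```python
import subprocess
import time

# Assumed Windows service display names; confirm them in services.msc.
PORTAL = "Portal for ArcGIS"
DATASTORE = "ArcGIS Data Store"
SERVER = "ArcGIS Server"


def build_commands():
    """Return the `net stop` / `net start` commands in the order of steps 2, 3 and 5."""
    commands = []
    for service in (PORTAL, DATASTORE, SERVER):
        commands.append(["net", "stop", service])
        commands.append(["net", "start", service])
    return commands


def restart_in_order(run=None, sleep=time.sleep, wait_seconds=600):
    """Run the commands, pausing before ArcGIS Server is touched (step 4)."""
    if run is None:
        # check=True raises if a service fails to stop or start.
        run = lambda cmd: subprocess.run(cmd, check=True)
    for cmd in build_commands():
        if cmd == ["net", "stop", SERVER]:
            sleep(wait_seconds)  # step 4: give Portal time to come up
        run(cmd)
```

Calling `restart_in_order()` from an elevated prompt on the Windows host would run the real commands. Passing your own `run` and `sleep` lets you preview the sequence without touching any service.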

I didn't do step 1 since I had already done it today. I had also already tried restarting the Windows services, but maybe I just didn't start and stop the Windows services in the right order?

Or, maybe something else was magically fixed elsewhere and nothing I did resolved the issue, whatever the issue was?

I dunno, but it's working. I may call it a day and start my weekend early...

SimonGIS
Regular Contributor

We have applied the workaround from CrowdStrike/AWS to bring our environment back online. However, the ArcGIS Enterprise environment is not starting correctly. We get this in the logs:

"The portal has been initialized and configured but is not accessible. The internal portal database does not appear to be running or accepting connections. Restart the portal machine or machines and if the problem persists, contact Esri technical support (U.S.) or your distributor (customers outside the U.S.)."

"Failed to start the portal. The observer's beforeStart() function returned a failure."


I will be troubleshooting in more detail tomorrow and will likely return to a prior snapshot, as I suspect the BSOD is what corrupted things. But if anyone else has seen similar issues and knows of a workaround, please share.

MattReynolds
New Contributor

For me the problem turned out to have two components.

First, there was a corrupted config file for the Tomcat server in DataStore: C:\Program Files\ArcGIS\DataStore\framework\runtime\tomcat\conf\server.xml. ESRI Tech Support helped track this down. The corruption prevented the embedded postgres from starting. This is a generic server config file, so you can grab a clean one from a fresh install or from another DataStore machine of the same version that does not have CrowdStrike installed (which is what I did).
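
If you want a quick way to see whether your own server.xml is damaged, a rough check is to see whether it still parses as XML. This is only a sketch using Python's standard library, and `xml_problem` is a name I made up. It catches an empty, zero-filled or truncated file, but a file that parses cleanly could still have wrong contents, so comparing against a clean copy is the safer test.

```python
import xml.etree.ElementTree as ET


def xml_problem(path):
    """Return None if the file parses as XML, else a short description of the problem."""
    try:
        ET.parse(path)
    except ET.ParseError as exc:
        # Raised for empty, zero-filled or truncated files, among others.
        return f"not well-formed XML: {exc}"
    except OSError as exc:
        # Missing file, permission denied, and similar.
        return f"cannot read file: {exc}"
    return None
```

For example, `xml_problem(r"C:\Program Files\ArcGIS\DataStore\framework\runtime\tomcat\conf\server.xml")` returns `None` when the file at least parses.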

And second, the CrowdStrike “C-00000291*.sys” files needed to be deleted. Once I deleted them and restarted, port 2443 opened up. In Azure we had to take a snapshot of the DataStore's OS drive, make a disk from it, and mount it to another VM. Then we deleted the “C-00000291*.sys” files from it, detached the modified OS disk, and swapped it back into the failed DataStore VM.
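
For the file cleanup, something like the sketch below would list the matching files first and only delete them when asked. The default folder is an assumption taken from CrowdStrike's public remediation guidance, not something specific to our setup, and `remove_channel_files` is a hypothetical helper. On a disk mounted to another VM the drive letter will differ, so pass the folder explicitly.

```python
from pathlib import Path

# Assumption: folder named in CrowdStrike's public remediation guidance.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def remove_channel_files(directory=DEFAULT_DIR, dry_run=True):
    """Return the names of files matching PATTERN; delete them only if dry_run is False."""
    matches = sorted(Path(directory).glob(PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return [path.name for path in matches]
```

Run it once with the default `dry_run=True` to see what would go, then again with `dry_run=False` from an elevated prompt.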

My VM backup from 7/18 was corrupted (probably by Crowdstrike), so unless we fixed it, we would lose a day of fieldwork and editing. 

I hope this helps you out. We all had a rough day yesterday.

-Matt

LeeIrminger2
Emerging Contributor

Great post, thank you! Since Crowdstrike had already updated the “C-00000291*.sys” file - timestamp of 7:59 AM UTC - it did not have to be deleted after replacing the server.xml file.  Falcon Content Update Remediation and Guidance Hub | CrowdStrike

LeeIrminger3
New Contributor

Hi Matt, 

Have you had issues reaching the ArcGIS Data Store config page over 2443 after using that fix? 
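
For anyone comparing notes on this, a plain TCP check tells you whether anything is listening on 2443 at all, separate from browser or certificate problems. A minimal sketch using Python's standard library follows. The host name in the example is a placeholder, and `port_is_open` is just an illustrative name.

```python
import socket


def port_is_open(host, port=2443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `port_is_open("datastore.example.com")` (placeholder host) returning `False` would suggest the Data Store process is not listening or a firewall is in the way.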

JonEmch
Esri Regular Contributor

Hello folks! Adding my voice to say that this is a priority issue for Esri to resolve. If you find yourself in a position where services are not starting properly, please log a tech support ticket. We are able to assist in getting you back up and running.

Keep on keeping on!