After CrowdStrike Incident, Portals Won't Restart

07-20-2024 04:09 PM
KadeSmith
Frequent Contributor

I have two enterprise portal machines that were affected by the CrowdStrike incident. I have both machines back up and running, but when attempting to open either portal URL (site.com/portal), I get the message "Could not access any ArcGIS Enterprise portal machines. Please contact your system administrator."

I was hoping to restart the Portal for ArcGIS service, but after clicking stop it continually reads, "Stopping" and never actually stops. Anyone else see similar behavior and get it to come back up? If yes, what did you do?


Accepted Solutions
JonEmch
Esri Regular Contributor

Hello folks! Adding my voice to say that this is a priority issue for Esri to resolve. If you find yourself in a position where services are not starting properly, please log a tech support ticket. We are able to assist in getting you back up and running.

Keep on keeping on!


8 Replies
DEWright_CA
Frequent Contributor

When you go to your ArcGIS Portal logs (the TXT files in your arcgisportal folder), what are you seeing there? Do you see messages about the INDEX service restarting, and HA?

KadeSmith
Frequent Contributor

Two lines of type SEVERE:

1. Error before starting configuration observer. null

2. Failed to start the portal. The observer's beforeStart() function returned a failure.

DEWright_CA
Frequent Contributor

Are you in an HA configuration? My first step for recovery is to stop one machine's "Portal for ArcGIS" service, then do a full reboot of the other machine so that it comes back up without seeing the other node and does not try to become a secondary. Usually I wait 20-30 minutes, checking the logs every few minutes to see whether the services appear to be starting, looking for the "Web server was stopped, starting now", "Index Server was stopped restarting", and "Rebuilding Index" messages. Once these have all cleared and the machine shows as accessible and online in the PortalAdmin interface, I fully restart the second machine, letting it detect the active node and try to resync its services and state.
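The log check described above could be scripted roughly as follows. This is only a sketch, not an Esri tool: the three phrases are copied from this post as quoted and may not match the real log text exactly, and `markers_seen`, `scan_log_dir`, and the `*.log` pattern are hypothetical names and defaults chosen for illustration.

```python
from pathlib import Path

# Phrases as quoted in this post; the exact wording in real logs may differ.
MARKERS = (
    "Web server was stopped, starting now",
    "Index Server was stopped restarting",
    "Rebuilding Index",
)


def markers_seen(log_text, markers=MARKERS):
    """Return {marker: bool} using a case-insensitive substring match."""
    lowered = log_text.lower()
    return {m: m.lower() in lowered for m in markers}


def scan_log_dir(log_dir, pattern="*.log"):
    """Read every file in log_dir matching pattern and report which markers appear."""
    text = "\n".join(
        p.read_text(errors="replace") for p in sorted(Path(log_dir).glob(pattern))
    )
    return markers_seen(text)
```

Calling `scan_log_dir` on your Portal log folder every few minutes (with the pattern adjusted to whatever file extension your logs use) gives a quick view of which startup messages have shown up so far.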

ar_tw
Emerging Contributor

Doesn't sound the same - but we lost Portal also. All other servers were fine.

At a minimum, we found that 

C:\Program Files\ArcGIS\Portal\framework\runtime\tomcat\conf\server.xml

and

D:\arcgisportal\content\items\portal\properties.json
were both corrupted (no content)
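A quick way to check for the same symptom is to test whether those files exist and have a non-zero size. The sketch below assumes the paths reported in this post (yours will differ if Portal or its content directory is installed elsewhere); `classify_files` is a hypothetical helper name. A file with content can still be damaged, so "ok" here only means "not empty".

```python
from pathlib import Path

# Paths as reported in this post; adjust for your own install and content directory.
FILES_TO_CHECK = [
    r"C:\Program Files\ArcGIS\Portal\framework\runtime\tomcat\conf\server.xml",
    r"D:\arcgisportal\content\items\portal\properties.json",
]


def classify_files(paths):
    """Map each path to 'missing', 'empty' (zero bytes), or 'ok' (has content)."""
    result = {}
    for raw in paths:
        p = Path(raw)
        if not p.is_file():
            result[raw] = "missing"
        elif p.stat().st_size == 0:
            result[raw] = "empty"
        else:
            result[raw] = "ok"
    return result
```

Running `classify_files(FILES_TO_CHECK)` on the Portal machine returns a small dictionary you can print or log before deciding whether a file-level restore is worth attempting.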

Recovering these files still wasn't enough to revive Portal so we had to revert to a VM backup.

Waiting on Esri support to see what damage this might have done to our hosted services which currently aren't running. Luckily most of our content isn't hosted - and they are working fine.

RobertMuzzy
New Contributor

We are experiencing the same issue, with the following error messages in the logs:

  • Error before starting configuration observer. null
  • Failed to start the portal. The observer's beforeStart() function returned a failure.

We tried restoring to a snapshot from a previous day, but it did not solve the issue.

We currently have a ticket open with Esri, who say they are in the information-gathering phase. We should have a call with them at some point today.

We looked at the two files mentioned above; ours were not corrupted or empty.

RobertMuzzy
New Contributor

Just wanted to follow up, now that things have been stable for a few days, with what worked for us. We opened a ticket and heard back from Esri on Monday afternoon. Esri recommended that we restore the entire production environment to a point before the CrowdStrike issue because

"typically there are timestamp errors that cause some disconnects between parts of the Enterprise deployment if all machines are not backed up to the same snapshot" (that was Esri Support's reasoning).

However, we tried restoring just the Portal server to a day before the CrowdStrike issue, and this worked for us. As Esri Support stated, though, this has not been the case for other customers.

Let's hope this helps others. Good luck!

GISFunctionalGroup
Occasional Contributor

Hi,

We faced the same issue with the ArcGIS Enterprise portal after the CrowdStrike incident. We tried all the options to solve it. Finally, as a workaround (not really a workaround), we restored a snapshot backup from 10 days earlier, and it is functioning correctly.

We have raised a request with Esri to find out the exact root cause of the issue.
