Why did ArcGIS.com go down?

03-20-2018 09:15 AM
KevinMacLeod4
Frequent Contributor

Last year it was down almost a day. No failover for AWS. What happened this time? It just came back for us; it was down from about 11:15 to 12:15 EST.

MarkBockenhauer
Esri Regular Contributor

Kevin,

This has been posted in a few places; I'm copying and pasting it here. (I received it in an e-mail.)

-----------------------------------------begin message---------------------------------------------

Thank you for your patience on Tuesday as we worked through the ArcGIS Online disruptions in providing access to your services, maps and apps. We understand the importance of continuing to provide a resilient, redundant and well-architected system, and we are confident that everything is back to normal and no data was lost.

 

What happened:

Between 9:00 AM and 12:15 PM EST on Tuesday, March 20, 2018, ArcGIS Online experienced periods of disruption.

 

Why did it happen:

The ArcGIS Online system, which runs on web servers hosted within AWS, experienced timeouts and failures in accessing AWS services, including S3. These failures were the result of S3 network connectivity errors reported and described by AWS on its status page. These connectivity issues in AWS affected ArcGIS Online availability even though the ArcGIS Online system runs on redundant web servers and across multiple data centers.

 

Our Response:

The ArcGIS Online operations team worked with Amazon during the event to diagnose the problem and then began working on the reconfigurations needed for the web servers to access the AWS services via alternate network routes. In the meantime, the original network connectivity problem was resolved by AWS.

Based on this incident, we are working on an automated approach to detect network connectivity failures and to reroute network connections to the relevant AWS services where applicable.

 

Additional things we will be working on:

- ArcGIS Online is already taking advantage of redundancy and failover across data centers. We will investigate and consider additional improvements, including network endpoint failover where applicable, as well as other mitigation strategies.

- We will continue working with AWS on further details of the root cause of the incident, and will examine engineering and operational improvements.

----------------end of message--------------------------------------
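For anyone wondering what "detect network connectivity failures and reroute" could look like in practice, here is a minimal sketch. It is an illustration only and not Esri's or AWS's implementation: the endpoint hostnames, `tcp_probe`, and `pick_endpoint` are all made-up names, and a real system would probe more than a TCP handshake.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_probe(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def pick_endpoint(
    endpoints: Iterable[str],
    probe: Callable[[str], bool] = tcp_probe,
) -> Optional[str]:
    """Return the first endpoint that passes the probe, or None if all fail."""
    for host in endpoints:
        if probe(host):
            return host
    return None


# Hypothetical endpoints, listed in order of preference (assumption, not real hosts).
ENDPOINTS = ["storage-primary.example.com", "storage-alternate.example.com"]

if __name__ == "__main__":
    chosen = pick_endpoint(ENDPOINTS)
    print(chosen if chosen else "no endpoint reachable")
```

The probe is passed in as a parameter so the selection logic can be exercised without touching the network, for example `pick_endpoint(ENDPOINTS, probe=lambda h: h.endswith("alternate.example.com"))`.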


MarkBockenhauer
Esri Regular Contributor

This site has some info: ArcGIS Online Health Dashboard

DeborahHuber1
New Contributor III

Amazon Web Services has been posting updates on this issue throughout the morning. Their N. Virginia region went wonky, which affected a lot of the East Coast. AGOL runs on AWS. AGOL appears to be working *mostly* OK now, but I'm still having a hard time with it creating correct GlobalID strings when I export an FGDB.

Amazon Web Services Health Dashboard: https://status.aws.amazon.com/

Esri Health Dashboard: https://doc.arcgis.com/en/trust/system-status/

KevinMacLeod4
Frequent Contributor

Mark, thank you, indeed I was watching that like a hawk! My question is, do they plan on creating a secondary failover? Or were only eastern customers affected? Can they set it up to route traffic to the west instance redundantly?


KevinMacLeod4
Frequent Contributor

Thank you for the detailed info, Mark. Half-serious here, but maybe Azure might be in the running as a backup!

DeborahHuber1
New Contributor III

Mark, I am still having trouble with photo attachments not working properly. My field team accidentally collected some photos sideways in the web app. I extracted the photos, turned them, saved as same name and re-attached via the web app in the office. 

In the web app, the attachments at first show up as "ATT36image.jpg" as they normally should, but when I close the edit box and re-open it, the attachment now shows the file path to the folder the turned photos are stored in. This is not how AGOL web apps USUALLY work. While I can reopen the attachment fine in the web app, when I extract and download the FGDB and open it in ArcDesktop, the photo link SHOULD open through the info window, and it should show just the file name, not the whole path. Instead, it shows the whole file path, and it does not open when you click on it, even in the attachment manager view.

Since the FGDB is extracted/downloaded via Amazon Web Services, I am inclined to believe it is related to yesterday's incident. All reports from Esri state "Everything is back to normal", but it's not in this case. FWIW, I have done this photo-turning maneuver in the past without incident, as recently as last week. This is a new problem.

Status bar showing download of FGDB through "amazonaws.com"

KatieCullen
Regular Contributor II

Please contact Support and open a case. If you have any issues getting a case created please let me know.

https://support.esri.com/en/contact-tech-support

DeborahHuber1
New Contributor III

I am not an Administrator for my Organization, so I have forwarded this to one of them to open a Case for me.  The problem still exists today. Hoping for a case and solution soon!

KevinMacLeod4
Frequent Contributor

By the way, if this is useful as a data point: our own Amazon instance kept on trucking the whole time yesterday just fine. (The site uses a locally hosted JS API.)

It says we are in Availability Zone us-east-1d, and the instance type is m3.large.
