Test ArcGIS Online Updates before rolling them out

06-26-2025 06:58 AM
Status: Open
RichardHowe
Frequent Contributor

Literally every time ESRI pushes out an ArcGIS Online update, it costs us a day of work while they fix whatever issues the latest update has introduced. It's beyond a joke now and is incredibly hard to defend to colleagues and clients. The stress testing of these updates needs to be done properly before they are pushed out, instead of beta testing them on paying customers.

39 Comments
ZachBodenner

@RTPL_AU deployment patterns can be so unique. I don't think it's totally fair to say that ESRI tech support suggesting that any of those things could be issues equates to gaslighting. It does seem like they knew it was a bad upgrade pretty quick. 

I definitely hear and agree with you on the mitigation strategy though. Other flagship products of theirs get patch releases, seems like it might be something that could be included in AGOL as well.

SaraJL

@ZachBodenner I also agree with you on that - there were notifications that were eventually sent out, but I didn't start receiving the ESRI Incident emails until I had contacted Tech Support. The incident notifications should have gone out to all admins when the performance issues started. And I'm not talking about the regular Tech Support emails, I'm talking about the actual emails from the ESRI Incident Management Team.

Before I reach out to Tech Support, I usually troubleshoot everything I know first (is it a browser issue, do other people have the same problem, does it show up on both Pro and Online, it's not user error, etc) and it would save me a lot of time if I just had a notification to begin with. Then I wouldn't have to bother Tech Support and waste a bunch of time trying to troubleshoot something that I can't control - I would just wait for the incident notification updates and work on some other projects for a while.

It would also give me the ability to give my own users a heads up, so I don't have to deal with IT Helpdesk tickets lol

More or less, just "please help me so I can help you" is how I feel about the whole thing whenever this happens.

I also want to re-iterate - this is an issue that is really for their development/deployment team and whoever their testing team is (however ESRI deploys software) and not Tech Support. Tech Support is the middle-man for a lot of this stuff. There was clearly something missed when they deployed the update to the production environment. Which is something that Tech Support wouldn't be able to address for the most part.

ZachBodenner

@SaraJL Oh yeah totally, an email push to the admins of every ESRI account should have happened.

SaraJL

Okay - now I'm really legitimately annoyed.

ArcGIS Online won't let me bulk update user roles after our license renewal and now it's kicked me off as an authorized caller even though the CIO and I are still organization admins. Now I can't even fill out a Tech Support case. Every single time user role changes are made, there are always some crazy shenanigans whenever the license renewal date comes up.

These named user licenses make my life 90% easier and about 10% SUPER aggravating.

NairiSevajian

Hi everyone,  

First, I’m sorry many of you were affected by Thursday morning’s performance incidents. I understand how even a brief interruption can impact your day, and I appreciate you taking the time to tell us what happened.  

For anyone who did not log a case with Technical Support to receive updates, the timeline of the incidents is summarized on the Status page here.

As it relates to release, ArcGIS Online goes through mixed-method testing, including regression and manual testing, to ensure each update meets our high standards. While no release is entirely without risk, we take extensive precautions to minimize disruptions.  

With that said, I wanted to share a quick reminder about our Ideas Exchange: it’s designed to receive requests for new or improved features and functionality, not for reporting performance issues. Posts about outages or unexpected behavior reach the right people much faster when they go through Technical Support. Your concerns are still very important to us, and we want to hear from you when issues like this arise—albeit in the right channel(s). 

@RichardHowe, I’m going to change the status of this Idea in line with the submission guidelines. If you have more specifics that could help refine this Idea, please feel free to send me a DM and I will be happy to help.  

Thank you all for your patience and feedback. 

RTPL_AU

Hi @NairiSevajian 
From what I experienced, and as commented by our local reseller, the outage and subsequent performance issues were anything but brief.

The Status page did not accurately reflect what users were experiencing; should we lodge that as a case or an Idea to improve the transparency & verbosity of said page?


We would hope so. Personally, the lack of clear communication by yourselves and our mandatory regional resellers was a major point of failure. Things fail in complex systems. How you interact with customers at that point is what sets you apart. Making sure it doesn't happen again (this was not the first time this scenario has played out, although not as severe in my recent memory) and being more responsive with direct communications are core to the critique.

This Idea reflects the belief of at least 60 paying customers that more needs to be done. This post, as far as I understood, was not a report or submission of a bug (backward looking), but the bug was used as justification of an Idea for Esri to improve systems and processes so that we are not affected like this again (forward looking).
If this was not clear from the Idea text, I am sure @RichardHowe would be happy to edit the text to make it more understandable. 

The relentless referral to Support is useless if you are a non-US small customer. We don't have a 24/7 helpline to call over weekends - when this issue affected me. All we can do is to lodge an Idea for you to do better and to communicate with each other in this forum. 

Thank you for understanding how Esri's actions affect real people with real customers/clients/citizens/stakeholders.

RichardHowe

@NairiSevajian Thanks for responding. A few observations below:

1. This is the third ArcGIS Online update in a row that has cost our organisation at least one working day across the board. It is now at the point where I am considering warning my entire organisation in advance that an update is coming and to anticipate not being able to use the platform for at least the following day. That can't be right, but is preferable to having field staff out on scheduled and expensive surveys that can't use the software. My original suggestion is not a complaint about service or a note of a bug; it is that these things should be more thoroughly tested before being forced on us, to prevent outages happening. I appreciate you cannot test for everything, but the fact that 3 updates in a row have caused major issues suggests to me that the testing isn't robust enough for purpose. I genuinely believe more could be done, including giving us some kind of rollback option to ensure that we can continue to work.

2. Calling the events of the last week a "brief interruption" is slightly disingenuous. As I previously noted, it cost us several days of work. Multiply that by the daily rate of the ~1000 staff we have actively using our portal and that is a big number!

3. Technical support is great, but as noted by others here the user community is quicker to respond and it's nice to know it's not just you. I think the suggestion here is a great idea and should be escalated to ESRI Inc ASAP, because there's nothing more frustrating than thinking something is wrong (and subsequently finding out it was) while the dashboard that should keep you informed was saying everything was fine the whole time. Of course we also submit a support case when anything like this happens.

ZachBodenner

@NairiSevajian I think we all understand that ESRI does indeed test out AGOL updates before they go live. If you are able, I think it would be nice to share with the community what aspects of the release worked fine in ESRI's testing but failed when they became live. Unlike previous updates that have had shorter time-to-resolve, this one was big and I think it might be hard for a lot of us to understand how something so serious got missed. The Status page has very minimal information about the actual issue.

MichaelMorisette

@NairiSevajian 

Thank you for the response.  However, I feel it missed the mark.  We were very aware of the incident as it happened, and that Esri was working on it.  The purpose of this thread was by no means to report it, but to both pressure and help Esri to do better going forward.

To call the Jun 26 incident a "minor disruption" or "performance issues" is an understatement.  Esri botched the update, and ArcGIS Online was straight up down.  Even as we came back online, there have been lingering issues caused by the update that we've spent hours with technical support on.  I feel Esri needs to take more ownership and initiative on issues they cause.

Some general (hopefully helpful) takeaways I have both from this thread as well as our organization's experiences over the last several years.

  • When a major system outage is noticed, please do communicate the fact with us
    • Send a mass email out, or at least update the AGOL status dashboard to be something more descriptive of the issue at hand
  • "Go through Technical Support" is not a working solution when technical support doesn't get the information to the right people in a timely fashion (product development, AGOL operations, etc.)
    • This wasn't always the case, but has been gradually getting worse over the years
    • There have been countless instances where we call support, spend hours and even days of our time with busy work they give us, just to find out the problem was on Esri's end all along and that they might fix it with the next full release if they feel like it
  • Work internally to improve the overall quality of the product, and convey to us you are doing so
    • For example, CentralSquare (another enterprise suite we use) routinely reports metrics on how they are doing this at their conference plenary sessions
    • The number of support cases submitted in the first place, not just their time to resolution, is a good KPI for this from what I've seen
      • Fewer technical support cases being opened means the product is getting the bugs ironed out as well as becoming more intuitive