Upgrade of ArcGIS Server 10.9.1 to 11.2 fails

02-11-2024 05:58 PM
Brian_Wilson
Occasional Contributor III

Suggestions solicited on how to actually make a Server upgrade from 10.9.1 to 11.2 work.

Another failed Esri upgrade. This happens every time I do an upgrade: it's something new each time, which keeps me on my toes. I have to work past all the bugs in the installers, usually several. Usually I can refer to my notes for some of them, because they were not fixed in the intervening years. (Why would Esri fix old bugs when they are busy adding new ones??)

The upgrade of Portal and the Portal Web Adaptor worked with only one major bug: "the username is longer than 20 characters". But I worked around that.

I then moved on to Server. The installer ran fine, finished, and launched the web browser, but I can't go any further because the browser fails to connect on port 6443. It's supposed to connect at https://localhost:6443/arcgis/manager/

I tried various and sundry URLs, including localhost and the hostname, and various browsers including Firefox, Edge, and Chrome. I also tried connecting over the network from my desktop. (Everything is behind a firewall, and there are no reachability problems.)

I checked the logs and Event Viewer and see no errors. The regular logs show nothing. Log messages show up in C:\Program Files\ArcGIS\Server\usr\logs\ instead of C:\arcgisserver\logs\; I think that's what happens until the upgrade completes? Here is a boring sample:

<Msg time="2024-02-11T16:56:33,616" type="INFO" code="7720" source="Server" process="1224" thread="1" methodName="" machine="CC-GISSERVER.CLATSOP.CO.CLATSOP.OR.US" user="" elapsed="" requestID="">The cloud regions configuration file was deleted from this server machine. If you have other server machines in the site, please make sure that the file has been deleted from them as well.</Msg>
<Msg time="2024-02-11T16:56:49,987" type="INFO" code="9011" source="Rest" process="4420" thread="1" methodName="" machine="CC-GISSERVER.CLATSOP.CO.CLATSOP.OR.US" user="" elapsed="" requestID="">Application initialized.</Msg>
<Msg time="2024-02-11T16:56:49,987" type="INFO" code="9011" source="Rest" process="4420" thread="1" methodName="" machine="CC-GISSERVER.CLATSOP.CO.CLATSOP.OR.US" user="" elapsed="" requestID="">Application initialized.</Msg>
<Msg time="2024-02-11T16:56:50,592" type="INFO" code="30204" source="Admin" process="4420" thread="1" methodName="" machine="CC-GISSERVER.CLATSOP.CO.CLATSOP.OR.US" user="" elapsed="" requestID="">Webhook log: Webhooks app context initializing..</Msg>
<Msg time="2024-02-11T16:56:50,594" type="INFO" code="30204" source="Admin" process="4420" thread="1" methodName="" machine="CC-GISSERVER.CLATSOP.CO.CLATSOP.OR.US" user="" elapsed="" requestID="">Webhook log: Webhook processors initialization in progress...</Msg>
<Msg time="2024-02-11T16:56:50,594" type="INFO" code="30204" source="Admin" process="4420" thread="1" methodName="" machine="CC-GISSERVER.CLATSOP.CO.CLATSOP.OR.US" user="" elapsed="" requestID="">Webhook log: Starting webhook processors....</Msg>
<Msg time="2024-02-11T16:56:52,333" type="INFO" code="7702" source="Server" process="1224" thread="1" methodName="" machine="CC-GISSERVER.CLATSOP.CO.CLATSOP.OR.US" user="" elapsed="" requestID="">Web server started successfully.</Msg>
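Because every line in these logs is an XML <Msg> element with a type attribute, a short script can surface anything other than INFO instead of reading the files by eye. This is a minimal Python sketch, not an Esri tool; it assumes the log files end in .log and sit somewhere under the two folders mentioned above.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# The two log locations mentioned in the post; adjust for your install.
LOG_DIRS = [
    Path(r"C:\Program Files\ArcGIS\Server\usr\logs"),
    Path(r"C:\arcgisserver\logs"),
]

# One <Msg ...>...</Msg> element, possibly spanning lines.
MSG_RE = re.compile(r"<Msg\b.*?</Msg>", re.DOTALL)


def parse_msgs(text):
    """Yield (time, type, code, source, message) for each <Msg> element in text."""
    for match in MSG_RE.finditer(text):
        try:
            el = ET.fromstring(match.group(0))
        except ET.ParseError:
            continue  # skip fragments that are not well-formed XML
        yield (el.get("time"), el.get("type"), el.get("code"),
               el.get("source"), (el.text or "").strip())


def non_info(text):
    """Return only the messages whose type is not INFO (e.g. WARNING, SEVERE)."""
    return [m for m in parse_msgs(text) if m[1] != "INFO"]


def scan(dirs=LOG_DIRS):
    """Print every non-INFO message found in *.log files under dirs."""
    for d in dirs:
        if not d.is_dir():
            continue
        for f in sorted(d.rglob("*.log")):
            text = f.read_text(encoding="utf-8", errors="replace")
            for msg in non_info(text):
                print(f, *msg, sep=" | ")


if __name__ == "__main__":
    scan()
```

An empty result only means nothing above INFO was logged; with the log level at its default, some failures may not show up at all.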

I found a suggestion to force the upgrade by going to https://cc-gisserver.clatsop.co.clatsop.or.us:6443/arcgis/admin/, but that is impossible because nothing is running on port 6443.

That article also mentioned C:\arcgisserver\config-store\version.json. Mine still says 10.9.1, which is depressing.
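To check that file without opening it by hand, here is a hedged Python sketch. I don't know the exact layout of version.json, so rather than assuming a key name it walks the whole JSON document and reports any scalar value whose key contains "version". The keys used in the test values are hypothetical.

```python
import json
from pathlib import Path

# Path quoted in the post.
VERSION_FILE = Path(r"C:\arcgisserver\config-store\version.json")


def find_versions(node, path=""):
    """Recursively collect (json_path, value) for scalar values whose key mentions 'version'."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if "version" in key.lower() and not isinstance(value, (dict, list)):
                found.append((child, value))
            found.extend(find_versions(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(find_versions(item, f"{path}[{i}]"))
    return found


if __name__ == "__main__":
    if VERSION_FILE.is_file():
        # utf-8-sig tolerates a byte-order mark if the file has one
        data = json.loads(VERSION_FILE.read_text(encoding="utf-8-sig"))
        for where, value in find_versions(data):
            print(f"{where} = {value}")
    else:
        print(f"{VERSION_FILE} not found")
```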

I tried running "repair" and rebooting. I've rebooted many times. I've tried running under a local account and under the AD network account.

I found a file from 2022 called version.rlock next to version.json, deleted it, and ran the installer again. The repair option won't run again.

I am thinking the actual upgrade does not happen until the URL is invoked?

I peeked at the Tomcat logs, which as always show "potential memory leak" messages, but nothing jumps out.

I will now tell my admin that he needs to roll back to the snapshots he made this morning, so that we can function normally on 10.9.1. I've basically wasted another day. A Sunday. Insert more colorful language here.

 


Accepted Solutions
Brian_Wilson
Occasional Contributor III

I had my fabulous IT guy clone the Server VM so I could hack away on it. I was able to install 11.2 there from scratch. It worked fine, but with no data.

Then I went ahead and did the full upgrade this past weekend and hit the same problem: nothing on port 6443. Knowing it's a Tomcat app, I searched the Tomcat conf files and found that the port 6443 Connector in server.xml was different between the upgraded and the installed-from-scratch versions.

Using netstat in PowerShell, I could see that there was simply no service running on port 6443. The service on 6080 just redirects to 6443, at which point the browser returns an error.
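The netstat check can also be scripted. Below is a minimal Python sketch that just attempts a TCP connection to each port. It only tells you whether something is listening, not whether the ArcGIS Server Manager app behind it is healthy. The host list is an assumption; add your server's fully qualified name to it.

```python
import socket


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name did not resolve
        return False


if __name__ == "__main__":
    # Ports from the post: 6080 (HTTP, redirects) and 6443 (HTTPS).
    for host in ("localhost",):  # add your server's FQDN here
        for port in (6080, 6443):
            state = "listening" if is_listening(host, port) else "nothing listening"
            print(f"{host}:{port} -> {state}")
```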

Esri version from the upgrade (fails):

<Connector SSLEnabled="true" URIEncoding="ISO-8859-1" clientAuth="false"
 connectionTimeout="60000" connectionUploadTimeout="1000000"
 disableUploadTimeout="false" maxHttpHeaderSize="65535" maxPostSize="10485760"
 maxThreads="150" port="6443"
 protocol="org.apache.coyote.http11.Http11Nio2Protocol" relaxedQueryChars=""
 scheme="https" secure="true" server=" " sslProtocol="TLS"
 useServerCipherSuitesOrder="true"/>

My current version, from the fresh install (works):

<Connector SSLEnabled="true" URIEncoding="ISO-8859-1"
 ciphers="TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
 TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,
 TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,
 TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_256_GCM_SHA384,
 TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_RSA_WITH_AES_256_CBC_SHA,
 TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,
 TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,
 TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
 TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA256,
 TLS_RSA_WITH_AES_128_CBC_SHA" clientAuth="false" connectionTimeout="60000"
 connectionUploadTimeout="10000000" disableUploadTimeout="false"
 keyAlias="SelfSignedCertificate"
 keystoreFile="C:\Program Files\ArcGIS\Server\framework\etc\certificates\arcgis.keystore"
 keystorePass="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
 maxHttpHeaderSize="65535" maxPostSize="10485760" maxThreads="150" port="6443"
 protocol="org.apache.coyote.http11.Http11Nio2Protocol" relaxedQueryChars=""
 scheme="https" secure="true" server=" "
 sslEnabledProtocols="TLSv1.2,TLSv1.1,TLSv1" sslProtocol="TLS"
 useServerCipherSuitesOrder="true"/>
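Eyeballing two long Connector elements is error-prone, so here is a small Python sketch that compares their attributes. The two strings are copied from the post, with the cipher list shortened to its first entry to keep the example short (that does not change which attributes are present). Besides the five TLS attributes missing from the upgraded copy, it also flags connectionUploadTimeout, which reads 1000000 in one snippet and 10000000 in the other as quoted.

```python
import xml.etree.ElementTree as ET

# Connector from the upgraded server.xml (fails), as quoted in the post.
UPGRADED = r"""<Connector SSLEnabled="true" URIEncoding="ISO-8859-1" clientAuth="false"
 connectionTimeout="60000" connectionUploadTimeout="1000000"
 disableUploadTimeout="false" maxHttpHeaderSize="65535" maxPostSize="10485760"
 maxThreads="150" port="6443"
 protocol="org.apache.coyote.http11.Http11Nio2Protocol" relaxedQueryChars=""
 scheme="https" secure="true" server=" " sslProtocol="TLS"
 useServerCipherSuitesOrder="true"/>"""

# Connector from the fresh install; cipher list shortened, password still redacted.
FRESH = r"""<Connector SSLEnabled="true" URIEncoding="ISO-8859-1"
 ciphers="TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384" clientAuth="false"
 connectionTimeout="60000" connectionUploadTimeout="10000000" disableUploadTimeout="false"
 keyAlias="SelfSignedCertificate"
 keystoreFile="C:\Program Files\ArcGIS\Server\framework\etc\certificates\arcgis.keystore"
 keystorePass="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
 maxHttpHeaderSize="65535" maxPostSize="10485760" maxThreads="150" port="6443"
 protocol="org.apache.coyote.http11.Http11Nio2Protocol" relaxedQueryChars=""
 scheme="https" secure="true" server=" "
 sslEnabledProtocols="TLSv1.2,TLSv1.1,TLSv1" sslProtocol="TLS"
 useServerCipherSuitesOrder="true"/>"""


def diff_attrs(a_xml, b_xml):
    """Return (only_in_a, only_in_b, changed) attribute names for two XML elements."""
    a = ET.fromstring(a_xml).attrib
    b = ET.fromstring(b_xml).attrib
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed


if __name__ == "__main__":
    only_up, only_fresh, changed = diff_attrs(UPGRADED, FRESH)
    print("only in upgraded:", only_up)
    print("only in fresh install:", only_fresh)
    print("different values:", changed)
```

For these two snippets, "only in fresh install" lists ciphers, keyAlias, keystoreFile, keystorePass, and sslEnabledProtocols. In other words, the upgraded Connector has no certificate to serve, which fits nothing answering on 6443.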

I saw references to a keystore file in the working version, so I confirmed the file existed. I edited server.xml to replace the failing Connector with the working one, restarted the service, and it worked. I was then able to complete the upgrade with only the usual hiccups and flaws.
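Before restarting the service after an upgrade, a check like the following could catch the same problem early. It is a Python sketch that rests on a few assumptions. The server.xml path is my guess at the usual location under the Server install, so verify yours. The sketch also only looks at keystore attributes written directly on the Connector, in the style of the working snippet above. Tomcat can also configure certificates through nested SSLHostConfig/Certificate elements, and this sketch ignores those.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# ASSUMPTION: typical location of ArcGIS Server's Tomcat config; confirm on your machine.
SERVER_XML = Path(r"C:\Program Files\ArcGIS\Server\framework\runtime\tomcat\conf\server.xml")


def check_https_connector(xml_text, port="6443"):
    """Return a list of problems found with the Connector(s) on the given port."""
    root = ET.fromstring(xml_text)
    connectors = [c for c in root.iter("Connector") if c.get("port") == port]
    if not connectors:
        return [f"no Connector on port {port}"]
    problems = []
    for c in connectors:
        # Attribute-style TLS config, as in the working fresh-install Connector.
        if c.get("keyAlias") is None:
            problems.append("keyAlias attribute missing")
        keystore = c.get("keystoreFile")
        if keystore is None:
            problems.append("keystoreFile attribute missing")
        elif not Path(keystore).is_file():
            problems.append(f"keystore not found: {keystore}")
    return problems


if __name__ == "__main__":
    if SERVER_XML.is_file():
        # Pass bytes so ElementTree honours any encoding declaration in the file.
        issues = check_https_connector(SERVER_XML.read_bytes())
        print("\n".join(issues) if issues else "Connector on 6443 looks complete")
    else:
        print(f"{SERVER_XML} not found")
```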

Once it completed, I checked services, maps, and so on, and everything seems to be working so far. It's been quiet today. Huge anticlimax. The only discernible difference so far is the version number.

 

Scott_Tansley
MVP Regular Contributor

Far from ideal. Besides VM snapshots, one of the things I do is an Export Site. In situations like this, I've previously just uninstalled Server, cleared all the folders, re-installed the same version, and created a new site. Then I imported the old site, which got me back to square one, and tried the upgrade again.
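For anyone who wants to script the Export Site step, here is a hedged Python sketch that uses only the standard library. It assumes the ArcGIS Server Administrator API's generateToken and exportSite operations under https://<server>:6443/arcgis/admin/. It also assumes that exportSite's location parameter names a folder the server account can write to. Check the Admin API documentation for your version before relying on any of this. If the server uses a self-signed certificate, you would need to pass an ssl context that trusts it.

```python
import json
import urllib.parse
import urllib.request

# Admin URL built from the hostname in the post; adjust for your site.
ADMIN = "https://cc-gisserver.clatsop.co.clatsop.or.us:6443/arcgis/admin"


def build_request(admin_url, operation, params):
    """Build a form-encoded POST request for an Admin API operation, asking for JSON."""
    data = urllib.parse.urlencode({**params, "f": "json"}).encode("ascii")
    url = f"{admin_url.rstrip('/')}/{operation}"
    return urllib.request.Request(url, data=data, method="POST")


def call(admin_url, operation, params, context=None):
    """Send the request and decode the JSON response."""
    req = build_request(admin_url, operation, params)
    with urllib.request.urlopen(req, context=context, timeout=600) as resp:
        return json.load(resp)


def export_site(admin_url, username, password, location, context=None):
    """Get an admin token, then ask the server to write a site export to location."""
    auth = call(admin_url, "generateToken",
                {"username": username, "password": password, "client": "requestip"},
                context)
    if "token" not in auth:
        raise RuntimeError(f"generateToken failed: {auth}")
    return call(admin_url, "exportSite",
                {"token": auth["token"], "location": location}, context)
```

Usage would look something like export_site(ADMIN, "siteadmin", "...", r"D:\agsbackup"), where the account name and folder are placeholders. The long timeout is there because an export of a large site can take a while.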

I know that sounds crazy, but I don't always have instant access to the sysadmin at my client sites, so it's just a way of saying "Can't roll back, need to move forward, let's do it".

It sort of covers all bases. At this point, however, I'd be following your lead and getting the rollback done so you can try again.

Scott Tansley
https://www.linkedin.com/in/scotttansley/
Brian_Wilson
Occasional Contributor III

I expected you to say "nothing ever goes wrong for me" again, Scott. 🙂

I've never done "export site/import site". I found the buttons in Portal Admin. Sounds comparatively fun. Like an unexpected polar plunge versus a weekend spent being waterboarded. I will look at that.

I think it's not that things don't go wrong for you; it's that you already have a chain of backup strategies worked out and ready to go, and I don't. I have to try everything I can think of, then give up, restore from snapshots, solicit suggestions, and start all over again from the beginning next weekend.

Since we don't have the high-priced support contract, I can't call for help outside business hours, even though when I do call during business hours I usually end up talking to someone working the night shift in another part of the world. Capitalism at its finest. I think I have a sense of duty to keep services online instilled in me that goes back to when I worked for Internet and phone companies. Otherwise I'd just shut everything off at 8 am Monday and leave it down as long as it takes. I can't bring myself to do that when there is another way.

I am thinking of testing the recovery approach you suggest on a different VM to see what it looks like; I've never done that. Should I install from scratch and run the restore to see what happens? At one point I asked Esri what happens with licensing if I want to try this out, and I never got a straight answer.

I try to work on days when my admin is available and fortunately he works 10 hours on Sundays every week. Also he's one of the best Windows admins I have ever worked with. And networking. And he knows Linux too. I can say this because he does not read these messages.

Scott_Tansley
MVP Regular Contributor

Yeah. It goes wrong more often than I'd like to admit. I had 12 client upgrades planned back to back, test then prod, in 2-week sprints (24 environments). I was 5 weeks in when it all fell apart and I had to cancel the remainder. One prod and one test had major issues that were resolved by having the right strategy, like you say. Another prod environment was so badly impacted by the Web Adaptor issues that I spent more hours (unpaid) supporting it than I did upgrading it.

11.1 made me tighten up further on my pre-upgrade testing and overall approach.

I take prod environments out of service at the end of the working day, 5 PM on a Friday, and hand them back 24 hours later, fully upgraded, smoke and integration tested, allowing the client a day for UAT. Like you, it's all about having a rolled-back or upgraded environment available at 8 am on Monday morning. It's rare that the upgrade isn't in place, but I've had to make that call before.

Brian_Wilson
Occasional Contributor III

I now have a throwaway Windows Server virtual machine running for testing. It's a full clone of the real machine. I did not bother to run the upgrade here, since I am trying to find a "Plan B" approach as a backup in case the upgrade fails again. I was able to uninstall 10.9.1 and delete the C:\arcgisserver folder. Then I installed 11.2 Server from scratch, using the authentication key for 11.2 to set it up. (Esri told me there is no problem with using more than one copy of Server; this is news to me.)

Then I did "Import Site" with the .agssite file that I exported from Server 10.9.1. The import succeeded and I restarted Server, but afterwards I saw that it had not set permissions correctly, so I fixed that and restarted again. I could see the files in C:\arcgisserver\, but they did not show up as services. Then I tried exporting SERVICES instead of SITE and ran Import Services, and that worked.

Some of the services did not import because they are ArcMap-runtime based, so I have to go look at those and either upgrade them or delete them from the existing server. I told Server to upgrade everything months ago, but apparently that's another bug? Apparently it lied to me. 🙂 Better to know now than when I do the production machine.

I am hoping that the next time I try the upgrade procedure, it just works, but if not I have at least a partial path forward now. If I have to do the "Plan B fresh install" I will still have to stumble around clumsily hooking up the DataStore and "Federating." But I can probably figure that out again. It's only been a few years. 🙂

 

Scott_Tansley
MVP Regular Contributor

It's not easy. I saw this sort of thing all the time when I was working for Esri distributors. I realised a pathway forward for standardisation, and that's what I roll out to my clients. There are some simple patterns and practices that make upgrading simple and reliable. I should probably write a book, but then no one would pay to use my services. Double-edged sword.

We have to remember that the upgrade is a double hop. The 11.2 installer upgrades the core software; when that's complete, it upgrades the content. I'm amazed you even got the 10.9.1 site (less the MXD services) to run; I've never tried that.

Brian_Wilson
Occasional Contributor III

Maybe I have to try to trip the upgrade process. I think I saw a reference to that.

From logs it looks like the only ArcMap based services were Esri's own -- the geoprocessing and printing services.

The datastores are not connected; I suppose that's the other half of what makes up a "site". I am going to try Import Site again.

 

BenClark
Occasional Contributor

I ran into a very similar problem while upgrading from 10.9.1 to 11.2 this past weekend. I eventually got the Server configuration page to load by replacing "localhost" with hostname.subdomain.domain. Have you tried that combination?

Brian_Wilson
Occasional Contributor III

Thanks for the suggestions.

I tried every combination I could think of, using localhost and the hostname. I used the full hostname cc-gisserver.clatsop.co.clatsop.or.us, from both the server and my laptop (on the same network). I tried port 6443 with HTTPS and port 6080 with HTTP.

I tried running under the local user and under the AD user. I had to install with the local user account because the AD domain is "too long" according to the installer.

BenClark
Occasional Contributor

Ugh, that's awful. I really feel for you. I worked well into the wee hours on Sunday sorting out my issues. Good luck.
