After Upgrading to 11.2 from 11.1, Hosting Server GPService unreachable

12-18-2023 09:50 AM
RzgAdm
New Contributor

Hi all,

Posting here because multiple ESRI agents have been stumped by an issue in our environment. In mid-November we upgraded our HA Enterprise environment from 11.1 to 11.2, and since then we have seen major functionality problems, specifically with our hosting server. It seems to be having connectivity errors with our Portal machines. As a result, Workflow Manager is completely broken, and hosted services in Portal return repeated 405 errors when trying to load feature layers. Errors seen in the Portal logs include:

- Failed to start the Offline Packaging service in the hosting server with URL 'https://*exampleservername.domain.com*:6443/arcgis/admin/services/Utilities/OfflinePackaging.GPServer'. {1} The server at 'https://*exampleservername.domain.com*:6443/arcgis/admin/services/Utilities/OfflinePackaging.GPServer/start' returned an error. Service failed to start

^When we check the service status on the server, the service is already in the 'STARTED' state, and we have stopped and restarted it multiple times. Even when loading the page with an admin token, we can still perform these actions.

- Using defaults for some properties of the self resource (ex: supportsSceneServices). The hosting server may not be reachable.
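
For reference, the service status checked above can also be queried directly through the ArcGIS Server Administrator API, independent of Portal. This is a minimal sketch, assuming the admin `status` resource returns `configuredState` and `realTimeState` fields and that you already hold a valid admin token; the host is the redacted placeholder from the log message, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Placeholder host from the redacted log message; 6443 is the ArcGIS Server HTTPS port.
ADMIN_BASE = "https://exampleservername.domain.com:6443/arcgis/admin"
SERVICE = "Utilities/OfflinePackaging.GPServer"


def status_url(admin_base: str, service_path: str) -> str:
    """Build the admin status URL, e.g. .../services/Utilities/OfflinePackaging.GPServer/status."""
    return f"{admin_base.rstrip('/')}/services/{service_path}/status"


def summarize_status(status: dict) -> str:
    """Compare the state the site is configured for with the state the server reports now."""
    configured = status.get("configuredState", "UNKNOWN")
    realtime = status.get("realTimeState", "UNKNOWN")
    if configured == realtime:
        return f"consistent: {realtime}"
    return f"mismatch: configured={configured}, realTime={realtime}"


def fetch_status(admin_base: str, service_path: str, token: str, context=None) -> dict:
    """POST the (assumed valid) admin token and decode the JSON reply.
    Pass an ssl.SSLContext as `context` if the certificate needs custom trust."""
    data = urllib.parse.urlencode({"f": "json", "token": token}).encode()
    with urllib.request.urlopen(status_url(admin_base, service_path),
                                data=data, timeout=30, context=context) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be `summarize_status(fetch_status(ADMIN_BASE, SERVICE, token))`. If it reports `consistent: STARTED` while Portal still logs the start failure, that hints the problem lies in the Portal-to-server request rather than the service itself, though it does not prove it.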

All necessary ports are open on the ArcGIS Server machine, which was perfectly functional before the upgrade to 11.2. In addition, the defective Portal patch for 11.1 was not installed on the Portal machines before the upgrade, so the Portal machines shouldn't be the issue (hopefully), especially since we have multiple older federated server sites that have been running without any problems. The problem seems to be isolated to the hosting server site itself.

Interestingly, we stood up a brand-new 11.2 ArcGIS Server machine to add to the existing hosting server site, hoping to then remove the old server that wasn't working, only to be met with a "license does not match" error. The new server was licensed with the EXACT same 11.2 license files that were used to license the original server, so it seems something may not be authorized properly on the back end of the old server. This is the only lead we've had so far as to a potential cause, since we've exhausted pretty much every other troubleshooting step ESRI has suggested. Thanks in advance.

CodyPatterson
Regular Contributor

Hey RizingAdmin,

I may not be as helpful as others, but I did the same upgrade from 11.1 to 11.2, and my Workflow Manager has also gone down. From what I've been told, something must have changed on my network and nothing was wrong with the configuration. I'm very interested in what you're experiencing with Workflow Manager; maybe I can help with that as well!

With the GPService, I've had luck before with this sequence: stop the service, restart the server, restart the service in Task Manager once it's back up (if on Windows), restart the server once more, and finally start the service back up. That hasn't always worked, so I then reinstalled the Web Adaptor, and that got it working, which is really odd, but nothing really makes sense in Enterprise!
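
The stop/start part of that sequence can be scripted against the ArcGIS Server Administrator API instead of clicking through Manager. This is a minimal sketch, assuming the service's `stop` and `start` operations accept a POST with `f=json` and a valid admin `token` and reply with `{"status": "success"}` on success; the server reboots and the Task Manager step are not covered, and `restart_service` is a hypothetical helper.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

# A POST function takes (url, form params) and returns the decoded JSON reply.
PostFn = Callable[[str, Dict[str, str]], dict]


def http_post(url: str, params: Dict[str, str]) -> dict:
    """Send form-encoded params and decode the JSON reply."""
    data = urllib.parse.urlencode(params).encode()
    with urllib.request.urlopen(url, data=data, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))


def restart_service(admin_base: str, service_path: str, token: str,
                    post: PostFn = http_post, pause_s: float = 5.0) -> List[dict]:
    """Stop the service, pause, then start it; stop early if an operation fails."""
    base = f"{admin_base.rstrip('/')}/services/{service_path}"
    params = {"f": "json", "token": token}
    replies: List[dict] = []
    for op in ("stop", "start"):
        reply = post(f"{base}/{op}", params)
        replies.append(reply)
        if reply.get("status") != "success":
            break  # don't try to start a service whose stop call failed
        if op == "stop":
            time.sleep(pause_s)  # give the server a moment to release the service
    return replies
```

Passing `post` as a parameter keeps the sequence testable with a stub instead of a live server.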

If that doesn't work, let me know and I can try to help!

RzgAdm
New Contributor

We may try to redo the Web Adaptor, but it would be interesting if that were the problem, since the server site still validates OK. But I don't rule anything out with ESRI. For WFM, we stood up a brand-new 11.2 server in a different site and assigned it the WFM role, but still no dice. The hosting server seems to be completely broken in how it interfaces with ArcGIS Data Store or something similar.

As far as support from ESRI goes, they gave us the same recommendation for WFM, saying that something on our network must be incorrect. However, this doesn't explain why it was working previously and stopped working at 11.2. The techs we met with didn't even have their test boxes on 11.2 yet, so none of them could really help recreate the problem. This seems to be the epitome of the "early adopter" tax that comes with new software.

CodyPatterson
Regular Contributor

Hey RizingAdmin,

My server was validating without issue as well, but it was still operating like it was broken in half. It may seem odd, but as you said, we can't rule anything out!

We attempted the same thing, an entirely new 11.2 server with WFM on it, and it still did not work. My techs had an 11.2 environment ready to go, and somehow their tests worked when mine didn't. I also asked why WFM worked without issue in 11.1 but fails completely in 11.2; that was again blamed on a change my network team may have made. I assume that's the default response.

What they're now saying is that they believe this is a certificate issue, when once again, it was working without issue in 11.1 with self-signed certificates and a broken CA-signed certificate. I'm not sure what testing method they've given you for your network, but make sure they don't rely on Test-NetConnection without knowing that a process must be listening on the port for it to return true. Even if the ports are open on your machine, the test will fail unless a process is listening, and you will be told it's your network. If you'd like, I have a PowerShell script that listens on any port you choose; once Test-NetConnection is run against that port, it returns true because something is listening. It will not work on closed ports, of course, so it's a valid test.
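
The PowerShell script isn't included in the thread, so here is a rough Python substitute for the same idea (not Cody's script): open a listener on the port under test, then check from another machine whether a TCP connection succeeds, which is roughly what `Test-NetConnection -Port` reports. The function names are hypothetical, and the port must not already be held by another process (such as an ArcGIS component), or `bind` will fail.

```python
import socket


def open_listener(host: str, port: int) -> socket.socket:
    """Bind and listen on host:port. Port 0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()
    return srv


def serve(port: int, host: str = "0.0.0.0") -> None:
    """Run on the target machine: accept and immediately close each connection
    so a remote port test has something listening. Stop with Ctrl+C."""
    srv = open_listener(host, port)
    print(f"listening on {host}:{port}")
    try:
        while True:
            conn, addr = srv.accept()
            print(f"connection from {addr[0]}:{addr[1]}")
            conn.close()
    finally:
        srv.close()


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Run on the client: True if a TCP handshake to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, with the real service stopped, run `serve(13443)` on the server and `can_connect("example.domain.com", 13443)` from a client. A True result shows the path through the network and firewalls is open; False with the listener running points at something in between.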

I agree that we're in "early adopter" territory and facing the repercussions. Support so far hasn't really made any progress; if anything, we've been in the same spot for 2+ weeks now. Hopefully everything on your side works out! I'll post any updates I get here.

RzgAdm
New Contributor

Yeah, unfortunately ESRI seems quick to blame common networking issues for things they don't have a direct answer to. They gave us the same basic diagnostic steps. On my one call with an ESRI rep, I showed him that our WFM service was up and listening by using the built-in WFM health check link:

https://*example.domain.com*:13443/workflow/healthCheck 

This returned true every time we tried it. In my extensive Google searching, I also saw suggestions that it's a cert issue, like you mentioned, involving the SAN field, which doesn't affect the base AGS deployment. That still doesn't explain why this would only break WFM now, after moving to 11.2, when it didn't before.
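
Scripting that health check makes it easy to repeat from different machines, and it separates an HTTP reply from a connection or certificate failure, which matters given the certificate theory above. This sketch assumes only that the endpoint answers a plain GET and does not assume any particular response format; the host is the placeholder from the post, and `check_health` is a hypothetical helper.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Placeholder host from the post; 13443 is the Workflow Manager port in that URL.
HEALTH_URL = "https://example.domain.com:13443/workflow/healthCheck"


def check_health(url: str, context: Optional[ssl.SSLContext] = None,
                 timeout: float = 10.0) -> Tuple[int, str]:
    """Return (HTTP status, body); status 0 means no HTTP reply was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 401, 405, 500).
        return err.code, err.read().decode("utf-8", errors="replace")
    except urllib.error.URLError as err:
        # No HTTP reply: refused connection, DNS failure, or a certificate
        # problem such as ssl.SSLCertVerificationError.
        return 0, f"connection failed: {err.reason}"
    except OSError as err:
        # e.g. a timeout while reading the reply
        return 0, f"connection failed: {err}"
```

Leaving `context` as `None` uses Python's default certificate verification, so a certificate the script's machine doesn't trust shows up as status 0 with the SSL error in the message.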

Edit: We're also seeing a bunch of 'invalid Token' errors from our ArcGIS Server to the Portal machines; I'm wondering if this is an extension of the problem.

CodyPatterson
Regular Contributor

Hey RizingAdmin,

I did end up seeing a couple of token issues as well, but recently they seem to have disappeared.

At this point, I'm considering wiping the environment again and going back to 11.1. So far there's no explanation for why 11.2 would fail while 11.1 worked with no other changes, but 11.1 seemed a lot more reliable. I was told it could just be increased security strictness between 11.1 and 11.2, but the release notes mention nothing similar. Not sure what to do at this point!

BEN7
New Contributor

Hi Cody, can you help me find a solution? After upgrading from 11.1 to 11.2, I get this error when I try to activate the hosting server: "Validation failed. Hosting servers require ArcGIS Server to have an ArcGIS Data Store instance registered as a managed database. Select a server that does."

CodyPatterson
Regular Contributor

Hey @BEN7 

I was curious whether your data store validation in https://<arcgis-server>/<webadaptor>/manager looks similar to this:

[Screenshot: Data Store validation in ArcGIS Server Manager]

Along with that, can you verify on the data store machine, at https://localhost:2443/arcgis/datastore, that the correct Portal URL shows along with the components?

[Screenshot: ArcGIS Data Store page showing the Portal URL and components]

Finally, in the server admin page, can you confirm that Site Root - / shows Current Version as 11.2.0? Also, check in Control Panel on the Data Store machine that it is 11.2 as well:

[Screenshot: Control Panel on the Data Store machine showing the installed version]

Hope that helps!

Cody
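
The server-version part of Cody's check can also be read from the REST `info` resource. This is a sketch under the assumption that `https://<host>:6443/arcgis/rest/info?f=json` returns `currentVersion` (e.g. `11.2`) and, on recent releases, `fullVersion` (e.g. `"11.2.0"`); the host and helper names are hypothetical placeholders, and it does not cover the Data Store version shown in Control Panel.

```python
import json
import urllib.request
from typing import Tuple

# Placeholder host; point this at each ArcGIS Server machine in the site.
INFO_URL = "https://exampleservername.domain.com:6443/arcgis/rest/info?f=json"


def parse_version(text: str) -> Tuple[int, int, int]:
    """'11.2.0' -> (11, 2, 0); '11.2' -> (11, 2, 0)."""
    parts = [int(p) for p in text.split(".")]
    major, minor, patch = (parts + [0, 0, 0])[:3]
    return major, minor, patch


def reported_version(info: dict) -> Tuple[int, int, int]:
    """Prefer fullVersion ('11.2.0'); fall back to currentVersion (11.2)."""
    raw = info.get("fullVersion") or str(info.get("currentVersion", "0"))
    return parse_version(raw)


def matches(info: dict, expected: str = "11.2.0") -> bool:
    """True if the reported version equals the expected one."""
    return reported_version(info) == parse_version(expected)


def fetch_info(url: str = INFO_URL) -> dict:
    """GET the info resource and decode the JSON reply."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Running `matches(fetch_info(...))` against each machine's URL gives a quick yes/no per server, which can help spot a machine that didn't fully upgrade.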

TravisColeHC
New Contributor II

Following.

We just upgraded to 11.2, and our hosting server data store crashed and won't come up. When we tried to start it, it crashed ArcGIS Server on that machine. When we start the server back up, it shows as stopped in Server Manager. We haven't tried anything yet to fix this and have submitted an ESRI support ticket.

RzgAdm
New Contributor

We were having a similar issue, which we resolved by changing the ArcGIS Data Store service account from Local System to the same domain account that runs ArcGIS Server (even though it had run that way for a long time without problems before moving to 11.2). However, there are still problems, such as the ones described above. 11.2 seems to have broken things in upgraded environments, since we've had no issues at all on the brand-new servers we've stood up.
