Portal 10.6.1 HA Performance Issues

02-07-2020 12:09 PM
Occasional Contributor III

I have configured two Portal machines as HA in my enterprise 10.6.1 deployment, and am getting some strange performance issues that I can't quite figure out.

In short, requests through the web adapter (i.e. myportal.domain.org/gis/home) take an inordinately long time to resolve, sometimes 10 seconds or more (it was not like this before the HA configuration). Only requests made directly to the standby machine resolve within an acceptable amount of time.

For the purposes of this, my base config is:

Machine A: Primary

Machine B: Secondary

If I bypass the web adapter and make a request to each individual machine (mymachine.domain.local:7443/arcgis/home), the request to Machine A (the primary, so ostensibly the same machine the web adapter is making the request to) takes an equally long time.

However, a request to Machine B (the standby machine) resolves quickly.
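To put numbers on the comparison above, something like the following rough sketch could time the same page through the web adapter and directly against each machine. The host names are placeholders, not real endpoints, and certificate verification is switched off only on the assumption that the internal machine names do not match the certificate.

```python
import ssl
import time
import urllib.request


def time_request(url, timeout=60, opener=urllib.request.urlopen):
    """Return the seconds taken to fetch url and read the whole response."""
    # Assumption: verification is disabled because internal host names
    # often do not match the certificate. Do not reuse this elsewhere.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    start = time.perf_counter()
    with opener(url, timeout=timeout, context=context) as response:
        response.read()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder URLs modelled on the ones in the post.
    targets = {
        "web adapter": "https://myportal.domain.org/gis/home",
        "Machine A": "https://machine-a.domain.local:7443/arcgis/home",
        "Machine B": "https://machine-b.domain.local:7443/arcgis/home",
    }
    for name, url in targets.items():
        try:
            print(f"{name}: {time_request(url):.2f} s")
        except OSError as exc:
            print(f"{name}: failed ({exc})")
```

Running it a few times before and after each failover step would show whether the slow responses really follow whichever machine is currently primary.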

Now, if I stop the Portal for ArcGIS service on Machine A, Machine B gets promoted to primary. A request to Machine B (previously the standby, now the primary) again takes a very long time. So the problem doesn't seem to be machine-specific; it follows whichever machine is primary.

If I restart the Portal for ArcGIS service on Machine A, Machine B remains the primary and Machine A comes back as the standby, now operational. In this case, a request to Machine A (standby) resolves quickly, whereas a request to Machine B (primary) does not.

To return to my base config from the beginning, I have to stop the Portal service on Machine B, which promotes Machine A to primary, and then restart Portal on Machine B (which keeps Machine A primary and Machine B standby).

Yes, my head hurts too. Any idea what could be causing this?

2 Replies
Occasional Contributor III

It would be great to get example haconfig.properties, nodes.properties, and portal-config.properties files for a correctly configured HA Portal environment at 10.6.1. I'm checking all of these files and noticing some strange things, and would like a baseline for what should be correct.

New Contributor II

If it's a hardened environment, look at certificate revocation checking, particularly if your organization issues its own SSL certificates.

I sometimes had a 30-second delay talking to Portal or ArcGIS Server:

   Always on the first call from the browser

   All other calls went through quickly after the first call timed out.

It was tied to certificate revocation requests. If the firewall blocks the outside world, you will get this behaviour from a user network talking to a hardened backend.

Esri has an article on adjusting the revocation settings for Windows Server and where to find them.
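One way to test the revocation theory is to read the CRL and OCSP URLs out of the Portal certificate and see whether they are reachable from the machine that is slow. The sketch below does that with the Python standard library only. The host name and CA bundle path are hypothetical, and it assumes the certificate validates against the supplied CA file, since getpeercert() returns an empty dict otherwise.

```python
import socket
import ssl
from urllib.parse import urlparse

DEFAULT_PORTS = {"http": 80, "https": 443, "ldap": 389}


def revocation_urls(cert):
    """Collect CRL and OCSP URLs from a dict returned by getpeercert()."""
    urls = []
    for key in ("crlDistributionPoints", "OCSP"):
        urls.extend(cert.get(key, ()))
    return urls


def can_connect(url, timeout=5):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or DEFAULT_PORTS.get(parsed.scheme)
    if parsed.hostname is None or port is None:
        return False  # e.g. ldap:///CN=... has no host to test
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def fetch_cert(host, port, cafile=None):
    """Fetch the validated server certificate as a dict."""
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


if __name__ == "__main__":
    # Hypothetical host and CA bundle; replace with your own values.
    cert = fetch_cert("machine-a.domain.local", 7443, cafile="internal-ca.pem")
    for url in revocation_urls(cert):
        status = "reachable" if can_connect(url) else "BLOCKED or unreachable"
        print(f"{url}: {status}")
```

If the URLs show up as blocked from the primary machine, that would be consistent with the first-call delay described above, though a successful TCP connection alone does not prove the revocation check itself completes.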
