So when you publish any hosted service, they are all created in a "Stopped" state? What happens if you try to start one manually through the Admin API? Even though the icon is greyed out in Manager, you should be able to start it through the Admin API. Do you see any errors in the Server logs when publishing the service? Can you publish a non-hosted service that copies data into the ArcGIS Data Store? You may also want to start a new thread about this issue.
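A minimal sketch of calling the Admin API's start operation on a single service from Python, assuming you already have a valid administrator token and know the service's folder, name, and type as they appear in the Admin directory (https://<server>:6443/arcgis/admin/services). The server URL, folder, service name, and type used below are hypothetical placeholders, not values from this thread.

```python
import json
import urllib.parse
import urllib.request


def build_start_request(server_url, service_name, service_type, token, folder=None):
    """Build the URL and form body for the Admin API 'start' operation on one service."""
    service = f"{service_name}.{service_type}"
    path = f"{folder}/{service}" if folder else service
    url = f"{server_url.rstrip('/')}/admin/services/{path}/start"
    # The Admin API takes form-encoded parameters; f=json requests a JSON response.
    body = urllib.parse.urlencode({"f": "json", "token": token}).encode("utf-8")
    return url, body


def start_service(server_url, service_name, service_type, token, folder=None):
    """POST the start request and return the parsed JSON response."""
    url, body = build_start_request(server_url, service_name, service_type, token, folder)
    # Supplying data= makes urlopen send a POST rather than a GET.
    with urllib.request.urlopen(url, data=body) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, start_service("https://gis.example.com:6443/arcgis", "MyLayer", "MapServer", token, folder="Hosted") would target a hypothetical hosted service named MyLayer. If the server uses a self-signed certificate, urlopen will reject the connection unless you pass it an SSL context that trusts that certificate.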
Hi Jonathan,
We have an ArcGIS Enterprise 10.6 HA setup where it is critical that the system is up as close to 24/7 as possible. We are also seeing failover take several minutes, but because of the architecture and use of the system, a complete upgrade of the infrastructure to 10.6.1 is currently not possible. Are there any settings we can tweak to get closer to the failover time that upgrading to 10.6.1 would give us?
Best,
Eirik M. Buraas
Unfortunately, most of the failover time is spent restarting the web server to apply the new configuration settings. This step is removed at 10.6.1, which is where most of the performance benefit lies (on top of better stability, since we no longer need to modify the configuration settings).
Short answer: there isn't much you can do at 10.6 to improve the failover time.
Hi Jonathan,
ArcGIS Server suddenly starts as a service, but no ArcSOC processes appear in Task Manager. Permissions for the arcgis account are set properly on the folders. The server logs are not recording anything either, and service.log stops at the message "invoke afterstart()". Can you give me any clue or something to check, please? The situation is very painful.
I'd check the service logs under <install directory>\framework\etc\service\logs. The service-error-0.txt file should give you an indication of what's wrong.
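If you want to pull the relevant lines out quickly, here is a small sketch that collects the last few lines of every service-error-*.txt file in that folder. The glob pattern is an assumption based on the service-error-0.txt name above, and the install directory in the usage note is the usual default, not something confirmed for your machine.

```python
from pathlib import Path


def tail_service_errors(log_dir, lines=20):
    """Return {file name: last `lines` lines} for each service-error-*.txt in log_dir."""
    results = {}
    for path in sorted(Path(log_dir).glob("service-error-*.txt")):
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the read.
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = text.splitlines()[-lines:]
    return results
```

For example, tail_service_errors(r"C:\Program Files\ArcGIS\Server\framework\etc\service\logs") assumes a default install location; substitute your own <install directory>.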
Hi Jonathan.
How about promoting the secondary to primary while both machines are up and running? Any downtime on the newly demoted secondary machine would then not be an issue, and it could be promoted back to primary after it restarts. This is for planned reboots of single nodes (patches, etc.), not for crashes and errors.
Best Regards Thomas
Even if the software let you flip the roles of the Portals without stopping the Portal service, there would still be downtime, because the standby's web server needs to restart.
Dear Jonathan,
In the db folder on the primary machine, I can see only recovery.done (no recovery.conf).
In the db folder on the standby machine, I can see both (recovery.done and recovery.conf).
The recovery.done file on the primary machine contains the following:
#----------------------------------------------------
# STANDBY SERVER PARAMETERS
#---------------------------------------------------
standby_mode = 'on' # This represents a standby server
#
primary_conninfo = 'host=webgis2.domain.com port=7654 user=repuser1484908555377 password=s0qi37INGNUwOJ3puQ8yCQEWHlRFr7ZWxH8OYdDYSOg='
#
#
trigger_file = 'c:/arcgisportal/db/promote.dat'
The recovery.done file on the standby machine contains the following:
#----------------------------------------------------
# STANDBY SERVER PARAMETERS
#---------------------------------------------------
standby_mode = 'on' # This represents a standby server
#
primary_conninfo = 'host=webgis2.domain.com port=7654 user=repuser1484908555377 password=s0qi37INGNUwOJ3puQ8yCQEWHlRFr7ZWxH8OYdDYSOg='
#
#
trigger_file = 'c:/arcgisportal/db/promote.dat'
And the recovery.conf file on the standby machine contains the following:
#----------------------------------------------------
# STANDBY SERVER PARAMETERS
#---------------------------------------------------
standby_mode = 'on' # This represents a standby server
#
primary_conninfo = 'host=webgis1.domain.com port=7654 user=repuser1484908555377 password=s0qi37INGNUwOJ3puQ8yCQEWHlRFr7ZWxH8OYdDYSOg='
#
#
trigger_file = 'c:/arcgisportal/db/promote.dat'
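Comparing these files by eye is error-prone, since they differ only in the host inside primary_conninfo (webgis2 in both recovery.done files, webgis1 in the standby's recovery.conf). A minimal sketch for pulling that host out of a recovery file follows. It assumes the simple key = 'value' layout shown above and a conninfo string whose parameters contain no quoted spaces; it is not a full PostgreSQL or libpq parser.

```python
import re
from pathlib import Path

# Matches lines such as: standby_mode = 'on' # comment
SETTING = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'")


def parse_recovery_file(text):
    """Return {setting: value} for every key = 'value' line; comment lines are skipped."""
    settings = {}
    for line in text.splitlines():
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def conninfo_host(settings):
    """Return the host= entry of primary_conninfo, or None if it is absent."""
    conninfo = settings.get("primary_conninfo", "")
    # split("=", 1) keeps values that themselves end in '=' (like the passwords above) intact.
    params = dict(pair.split("=", 1) for pair in conninfo.split() if "=" in pair)
    return params.get("host")


def host_in_file(path):
    """Convenience wrapper: read a recovery file from disk and return its primary host."""
    return conninfo_host(parse_recovery_file(Path(path).read_text(encoding="utf-8")))
```

Running host_in_file on each file in the db folders above should report webgis2.domain.com for both recovery.done files and webgis1.domain.com for the standby's recovery.conf.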
Note that having the .done file on the standby machine does not seem to affect failover, as long as I avoid the "forbidden scenario", i.e. shutting down both machines and then powering on the last-known standby before the primary, which you said is currently a limitation. Do you think having the .done file on the standby might have an impact?
I hope Esri takes this serious limitation into consideration: if the primary server runs into a crisis, the standby server will not function.
Thank you again Jonathan.
The standby machine shouldn't have a .done file, only a .conf file and .conf.bak file. If it has a .done but it's still the standby, that indicates that while failover was happening, the standby was stopped/restarted as well, which disrupted the failover. In this case, failover will never fully complete because that file indicates that the DB has failed over, which is why you see those errors in the logs. If you remove the .done file so you only have the .conf and .conf.bak, you should see a proper failover when you stop the primary.
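A minimal sketch of the check described in the reply above: it only reports whether a standby's db folder matches the expected state (recovery.conf and recovery.conf.bak present, recovery.done absent) and deliberately deletes nothing. The folder path in the usage note is inferred from the trigger_file setting in the posted files and may differ on other installs.

```python
from pathlib import Path


def check_standby_db_folder(db_dir):
    """Return a list of problems with the recovery files in a standby Portal's db folder.

    An empty list means the folder holds recovery.conf and recovery.conf.bak
    and no recovery.done, which is the expected state for a standby.
    """
    db = Path(db_dir)
    problems = []
    for required in ("recovery.conf", "recovery.conf.bak"):
        if not (db / required).exists():
            problems.append(f"{required} is missing")
    if (db / "recovery.done").exists():
        # A leftover .done file marks the DB as already failed over,
        # so a later failover to this machine never fully completes.
        problems.append("recovery.done is present on the standby")
    return problems
```

For example, check_standby_db_folder(r"C:\arcgisportal\db") run on the standby described in this thread would be expected to flag the stray recovery.done, and may also flag recovery.conf.bak, since only recovery.done and recovery.conf were mentioned there.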
Hello Jonathan,
My comments pertain to ArcGIS Enterprise Portal 10.4.1. They might also apply to 10.5.x, but I'm not sure.
COMMENT: Thank you for the suggested shutdown/startup order. We'll find that useful.
QUESTION: If the suggested shutdown/startup order isn't followed and the secondary portal becomes primary, how do we "gracefully" switch back to our desired original primary once both portal instances are back up and running? I have been stopping the portal service on the secondary server (now effectively the primary) and waiting the 500 seconds for the portal to switch. There must be a better way that I don't know about.
Thank you,
Todd