Hi all,
I am in the process of establishing the steps towards upgrading our HA deployment of Portals from 10.9 to 11.1. The relevant questions and issues are in a different post: https://community.esri.com/t5/arcgis-enterprise-portal-questions/upgrade-high-available-portal-from-...
We have an HA deployment with 2 Portal, 1 Server, 2 Datastore and 2 TileCache machines, and a shared content folder in a storage account. Everything works fine in 10.9.
When attempting the upgrade to 11.1, I followed the "Configure a highly available portal" topic in the Portal for ArcGIS documentation for ArcGIS Enterprise. These are the steps I took:
1. From portaladmin, checked the status of the primary and standby machines. They seem fine: the primary VM is primary and the secondary VM is standby. Services are running fine.
2. Ran the upgrade setup on both portal VMs at the same time, and waited for the installation to complete on both machines.
3. Proceeded with the Upgrade option on the primary portal, and saw steps 0/8 appear.
I monitored the log files and content changes on both machines and found something interesting: the backup of the content, database and index directories actually happened on the secondary (standby) portal machine, not on the primary portal machine. This created a new folder called upgrade-backup/10.9.0. It also upgraded the database on the standby.
The upgrade then failed with the error: "Failed to import data to the content directory correctly. Unable to unzip portal content. failed to Upgrade the primary portal" (followed by the standby portal's machine name).
My questions are: How can I make sure that the upgrade happens on the primary portal and not on the standby?
After installing the software upgrade on the machines, is there a need to stop the Portal for ArcGIS service on the standby machine?
How and when is the standby machine brought back into the upgrade process?
In one of my previous attempts, when the service was not running on the standby machine, step 8 (Upgrade standby machine) was missing.
This is a very confusing process, any help from @ESRISupport @JonathanQuinn would be really appreciated
Depending on when you ran the setups and when the services on each machine were stopped as part of the upgrade process, there's a chance that the standby determined that the primary was unhealthy and promoted itself to primary. It's more deterministic to run the setups sequentially rather than in parallel, or at least run it on standby first, wait for the standby to shut down, then run it on primary. I'll take a look at whether we can improve our documentation. Aside from that, the failure to unzip content may not be relevant to the roles of the machines. I'd suggest reaching out to Support with debug logs of the failure so they can troubleshoot. If you want, reply with the case number so I can take a look.
Thanks very much @JonathanQuinn, that makes sense, as I started the installation on the primary VM first, followed by the standby. I will reverse the order, run it again, and update here.
Another attempt today, as suggested by @JonathanQuinn: I started the upgrade on the standby VM and waited until the Portal service had stopped on the standby and the primary portal still reported its status as primary. Then I started the upgrade on the primary, and this time the upgrade-backup folder was created on that same (primary) machine.
However, I noticed that step 8 (Upgrade standby machine), which appeared in the previous attempt, is missing. (See images below: the first one is from the previous attempt, where upgrading the primary portal VM started upgrading the standby portal VM.)
The image below shows the current upgrade, where the primary portal is being upgraded but no status is shown for the standby portal.
Is this the right procedure, and is this expected?
The standby will be upgraded during the primary upgrade if the upgrade was invoked on the standby, or during the post-upgrade step if the upgrade was started on the primary. Either way, it's handled at some point; it just depends on which node you started the upgrade from.
Many thanks @JonathanQuinn for the suggestions, my latest upgrade attempt was successful, after following the steps suggested by you. The standby portal and its database upgraded as well. I think your suggestion to update the documentation on the portal upgrade steps in HA deployment would be very helpful for others.
For the benefit of others who might have the same questions, this is what I did:
1. Make sure the password for the Windows service account (arcgis) is valid. If not, update it on all VMs running the portal, server and datastore services.
2. Check the portal content directory for any Thumbs.db files, which have to be removed prior to the upgrade (https://support.esri.com/en-us/knowledge-base/error-failed-to-import-data-to-the-content-directory-c...)
3. Check the statuses (Portal Health Check) of the primary and standby portal machines via portaladmin, so you know which VM is currently the standby, since the upgrade is started there.
4. Run the upgrade setup on the standby portal VM and wait until the Portal for ArcGIS service has been stopped on that VM. When the installation completes, leave the machine as it is. Do not click on the upgrade option.
5. Run the upgrade setup on the primary portal VM.
6. When the installation has completed on the primary portal VM, continue with the portal upgrade, including the new license file (normally 7 steps). Upgrade the web adaptors if they are being used (we use an application gateway). At this stage you should be able to log in to Portal (via the primary portal VM) and see the 11.1 interface.
7. Do not run 'Post Upgrade' until all the other VMs participating in the enterprise deployment have also been upgraded to the same version as Portal.
8. Upgrade the GIS Server. For a federated server, the upgrade needs access to the portal machine in order to complete, so access via the web adaptor / application gateway must be working. Upgrade the web adaptors as needed. You should then be able to access the server admin directory and see the server version as 11.1.0.
9. Run the upgrade setup for the datastore on both the primary and standby VMs (they can be done simultaneously). Make sure the installation is complete on both before running the ArcGIS Data Store upgrade on the primary datastore VM. The GIS Server URL has to be the private one (ending with 6443), e.g. servername.domain.com:6443/arcgis. There will be a prerequisite check covering things such as data store type and status. The upgrade then configures the primary datastore VM followed by the standby datastore VM, and the upgrade status is displayed.
10. Repeat step 9 for any additional sets of datastore VMs (such as Tile Cache).
11. Once the GIS Server and all Data Store machines are upgraded successfully, go back to the primary portal VM. There will be a message to run the 'Post Upgrade' operations. Continue with that, and it will then upgrade the standby portal and its database. The log files on both portal VMs can be checked to confirm that the secondary portal's database and content were upgraded.
12. Once completed, log in to portaladmin and check the version. It should be 11.1.0.
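For step 2 above, a small script can save clicking through the content folder by hand. This is only a rough sketch, not an Esri tool: it assumes the shared content directory is reachable as an ordinary file system path from the machine where you run it, and the function name `find_thumbs_db` and the demo folder layout are made up for illustration. It only lists the files; removing them is left to you.

```python
import os
import tempfile
from pathlib import Path


def find_thumbs_db(content_dir):
    """Return the sorted paths of all Thumbs.db files under content_dir.

    The name is compared case-insensitively, so thumbs.db is reported too.
    """
    matches = []
    for root, _dirs, files in os.walk(content_dir):
        for name in files:
            if name.lower() == "thumbs.db":
                matches.append(Path(root) / name)
    return sorted(matches)


if __name__ == "__main__":
    # Demo on a throwaway folder; replace this with the path to your own
    # portal content directory (hypothetical layout shown here).
    with tempfile.TemporaryDirectory() as demo_dir:
        item_dir = Path(demo_dir) / "items" / "example-item"
        item_dir.mkdir(parents=True)
        (item_dir / "Thumbs.db").write_bytes(b"")
        (item_dir / "data.json").write_text("{}")

        for path in find_thumbs_db(demo_dir):
            print("Remove before upgrading:", path)
```

Running it against the real content directory before starting the setup on the standby gives a list of files to delete, which is the cause described in the knowledge-base article linked in step 2.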