10.8 HA split install

04-27-2020 12:25 AM
GillPaterson
New Contributor III

Hi,

We are planning on installing 10.8 on existing servers, but instead of upgrading we are uninstalling 10.6.1, cleaning up, and then doing a fresh install of 10.8. It has been requested that we keep the current environment live as much as possible by splitting the install across two server groups, which have the following machines:

Server Group 1 (SG1):

- 1 Portal/WebAdaptor

- 2 ArcGIS Server / Relational DataStore

- 1 GeoEvent

- 1 SpatioTemporal

Server Group 2 (SG2):

- 1 Portal/WebAdaptor

- 2 ArcGIS Server / Relational DataStore

- 2 GeoEvent

- 2 SpatioTemporal

The rough plan so far is to:

- remove SG1 from the LB (users can still access services and apps through SG2)

- on SG1, run the uninstalls, clean out the remaining files, and install 10.8

- configure the individual sites and SSL certs

- federate AGS and GeoEvent to Portal

- integrate Windows auth

- publish the services and add items to Portal

- switch the LB so that traffic now goes to the newly upgraded SG1

- test whether SG1 is working; if so,

- start the uninstalls and clean install on SG2.
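The core constraint of the plan above is that some server group should be serving traffic through the LB at every step, and that the group being rebuilt should never be the one receiving traffic. A minimal Python sketch of that check follows; the `Step` type, `check_plan` helper and step labels are hypothetical, and it only models LB traffic, not federation or token validity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of a rolling reinstall (hypothetical model, labels illustrative)."""
    description: str
    live: frozenset        # server groups enabled in the LB during this step
    working_on: frozenset  # server groups being uninstalled, reinstalled or configured


def check_plan(plan):
    """Return a list of problems; an empty list means the traffic invariant holds."""
    problems = []
    for step in plan:
        if not step.live:
            problems.append(f"no group in the LB: {step.description}")
        overlap = step.live & step.working_on
        if overlap:
            groups = ", ".join(sorted(overlap))
            problems.append(f"live group under maintenance ({groups}): {step.description}")
    return problems


SG1, SG2 = "SG1", "SG2"
rough_plan = [
    Step("remove SG1 from the LB", frozenset({SG2}), frozenset()),
    Step("uninstall, clean, install 10.8 on SG1", frozenset({SG2}), frozenset({SG1})),
    Step("configure sites, SSL, federation, Windows auth", frozenset({SG2}), frozenset({SG1})),
    Step("publish services, add Portal items", frozenset({SG2}), frozenset({SG1})),
    Step("switch the LB to SG1", frozenset({SG1}), frozenset()),
    Step("test SG1", frozenset({SG1}), frozenset()),
    Step("uninstall, clean install on SG2", frozenset({SG1}), frozenset({SG2})),
]

print(check_plan(rough_plan))  # []
```

A plan that passes this check can still fail in practice if a configuration step (such as federation) depends on the LB URL while the group is out of the LB, which is exactly the concern raised next.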

I have concerns about whether federation is going to be possible without the LB URL. Has anybody else tried this method successfully?
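Federation takes two URLs: a services URL that clients use (typically the LB or Web Adaptor URL) and an administration URL that Portal uses to manage the site (typically the machine URL on port 6443). The Python sketch below only builds the request, assuming the Portal Administrator `federation/servers/federate` operation with `url`, `adminUrl`, `username`, `password` and `f` parameters; verify these against the REST API reference for your version. All host names, the account name and the `check_federation_urls` helper are hypothetical, and authentication to Portal is deliberately left out.

```python
from urllib.parse import urlencode, urlsplit


def federate_request(portal_admin_root, services_url, admin_url, username, password):
    """Build (endpoint, form body) for a federate call; nothing is sent.

    A real call also needs an authenticated Portal administrator session
    or token, which this sketch omits.
    """
    endpoint = portal_admin_root.rstrip("/") + "/federation/servers/federate"
    body = urlencode({
        "url": services_url,    # what clients use: typically the LB / Web Adaptor URL
        "adminUrl": admin_url,  # what Portal uses to administer the site: machine URL
        "username": username,
        "password": password,
        "f": "json",
    })
    return endpoint, body


def check_federation_urls(services_url, admin_url, lb_host):
    """Hypothetical helper: flag the URL mix-ups discussed in this thread."""
    warnings = []
    if urlsplit(services_url).hostname != lb_host:
        warnings.append("services URL does not go through the LB")
    if urlsplit(admin_url).hostname == lb_host:
        warnings.append("admin URL goes through the LB instead of the machine")
    return warnings


# Hypothetical host names, for illustration only.
LB_HOST = "gis.example.com"
SERVICES_URL = "https://gis.example.com/server"
ADMIN_URL = "https://sg1-ags01.example.com:6443/arcgis"

endpoint, body = federate_request(
    "https://sg1-portal.example.com:7443/arcgis/portaladmin",
    SERVICES_URL, ADMIN_URL, "siteadmin", "not-a-real-password",
)
print(endpoint)
print(check_federation_urls(SERVICES_URL, ADMIN_URL, LB_HOST))  # []
```

Keeping the admin URL on the machine name means Portal can reach the site directly even while the LB is pointed elsewhere; whether the services URL can usefully be the LB URL before the group is back in the LB is the open question in this thread.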

Thanks

JonathanQuinn
Esri Notable Contributor

We recently put out a blog about upgrading while minimizing downtime:

https://www.esri.com/arcgis-blog/products/arcgis-enterprise/administration/minimize-downtime-when-up... 

It takes a similar approach to the one you're outlining. I'm not sure I understand the two server groups, though. Do you have two separate Enterprise sites behind a load balancer, or is it currently HA, as the title of your post implies? You'll split out the machines and then configure them separately?

GillPaterson
New Contributor III

Thanks for the link, Jonathan, I will read through that now. And yes, it is currently HA. We are hoping to split it, update the two halves separately, and join them back together at the end.

GillPaterson
New Contributor III

Having read through the blog and its links, I think the following is a possible plan. (The first run is in the UAT environment, although these servers are in the production domain for internet access. There are no changes or additions to the LB apart from disabling/enabling server rules.)

- Split the current HA env.

- Remove Group 1 from the LB

- Set Group 2 to read-only and leave it live in the LB to limit downtime

- Uninstall, clean, and install 10.8, then configure the Portal site, the AGS site, the data stores registered to the AGS site, the GeoEvent sites, and the SBDS registered to the AGS site.

- Install the Web Adaptors (WA) and register Portal and AGS through machine URLs

- ? Set the Portal web context URL to the LB URL (even though at this point this machine is disabled in the LB?)

- Federate AGS and GeoEvent (Service URL = LB URL, Admin URL = local machine)

- test services and apps through machine URLs (but if federated through the LB URL, will tokens come from the newly upgraded Portal, or from the Portal still live in the LB, which would make them invalid?)

- switch groups in the LB: Group 1 live, Group 2 removed

So my main concerns are: when to set the Portal web context URL; which federation URLs to use; and, if the Services URL is set to the LB URL, whether the services and apps can still be tested before traffic is turned back on through the LB. (Hopefully I have made sense!)
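The web context URL is a system property on both Portal and ArcGIS Server (`WebContextURL`); in the final runbook later in this thread it was set only after the LB switch (step 15). The Python sketch below builds, but does not send, a `system/properties/update` request that sets it. It assumes the update replaces the whole property set, so it merges the new value into the current properties (fetched separately) rather than sending `WebContextURL` alone; if your version merges server-side, the extra keys are harmless. The host names and the existing property values are illustrative only, and authentication is omitted.

```python
import json
from urllib.parse import urlencode


def web_context_update(admin_root, current_properties, lb_url):
    """Build (endpoint, form body) for a system/properties/update call that
    sets WebContextURL while keeping the other existing properties."""
    merged = dict(current_properties)  # copy, so the caller's dict is not mutated
    merged["WebContextURL"] = lb_url
    endpoint = admin_root.rstrip("/") + "/system/properties/update"
    body = urlencode({"properties": json.dumps(merged), "f": "json"})
    return endpoint, body


# Illustrative values: an existing Portal property set and the LB URL for Portal.
existing = {"privatePortalURL": "https://sg1-portal.example.com:7443/arcgis"}
endpoint, body = web_context_update(
    "https://sg1-portal.example.com:7443/arcgis/portaladmin",
    existing,
    "https://gis.example.com/portal",
)
print(endpoint)
```

The same shape would apply to ArcGIS Server under its `/admin` root; reading the current properties first avoids silently dropping settings made earlier in the install.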

GillPaterson
New Contributor III

Also, our non-prod environment is disconnected from the internet. On Portal service startup, around 750 errors are logged saying it can't find styleItem files, e.g. D:\Program Files\ArcGIS\Portal\framework\webapps\rootapp\styleItems\RealisticTransportation\web\Tesla_P7.json

Are these not included in the cabinet install files? Is there somewhere else we can get them?

GillPaterson
New Contributor III

Our final solution looked like this. It differs slightly from the steps in Jon's blog, but it worked well for us.

As above, we split the install between two groups of servers:

1. Disable the Group 1 servers in the LB so that users can't make new connections to them

2. Run TVT on the live services to ensure that the Group 2 servers, still live in the LB, are working properly

3. Uninstall Group 1 and clean up any leftover files. Restart the servers

4. Install new software

5. Set up new config folders on the data share for Portal and ArcGIS Server, using the service account user

6. Set up Portal and ArcGIS server sites (including ArcGIS Server sites for GeoEvent)

7. Configure Relational and Spatiotemporal data stores (update SSL certs)

8. Register other datastores in ArcGIS Server Manager

9. Install and configure Web Adaptors for Portal and ArcGIS Server

10. Import the service configuration and global settings into GeoEvent and publish the output feature services (the data store connection in GeoEvent Manager was created against ArcGIS Server using the machine-name URL and credentials)

11. Publish MXD-based services to ArcGIS Server using a connection created with the machine-name URL and the initial admin user

12. Configure any other ArcGIS Server or Portal settings that don't require an LB URL, e.g. heap sizes for services with a higher load, date settings, and default user roles in Portal.

13. Run TVT on these new services using machine URLs

14. Switch groups in the LB: disable Group 2, enable Group 1 (start of outage)

15. Update the web context URL on the Group 1 Portal and ArcGIS Server

16. Federate the Group 1 ArcGIS Server with Portal and set it as the hosting server. Check that all services have been added as Portal items.

17. Federate GeoEvent and any other servers

18. Apply any other post-install config for Portal, e.g. name and description (we have a script that does this)

19. Enable Windows Auth

20. Update Portal items (we have a script to do this), groups, and item sharing.

21. Update GeoEvent data stores and output connections to use the LB URL (no credentials or tokens required, as the server is federated).

22. Run TVT using the LB and Windows auth (end of outage; for us the outage lasted around 3 hours).

23. Start the uninstall and reinstall of the Group 2 servers

24. Join the Group 2 Portal to the site from Group 1; this causes a ~2-minute outage.

25. Join the Group 2 ArcGIS Server to the site from Group 1

26. Install and configure Web Adaptors for Portal and ArcGIS Server using Group 1 machine names

27. Configure the data stores against the ArcGIS Server machine from Group 1

28. Enable Group 2 servers in LB
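What makes this runbook work is that everything done before the LB switch uses machine URLs, and every step that depends on the LB or its URL is packed into the outage window between steps 14 and 22. The Python sketch below encodes a subset of the steps with a hypothetical `uses_lb` flag (an editorial reading of the post, not an Esri rule) and reports any LB-dependent step that falls outside the window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    summary: str
    uses_lb: bool  # switches LB traffic or relies on the LB URL (editorial reading)


# Subset of the runbook above; step numbers match the post.
RUNBOOK = [
    Step(10, "GeoEvent config and output services via machine URL", False),
    Step(11, "publish MXD services via machine URL", False),
    Step(13, "TVT using machine URLs", False),
    Step(14, "switch groups in the LB (outage starts)", True),
    Step(15, "set web context URL", True),
    Step(16, "federate ArcGIS Server, set hosting server", True),
    Step(17, "federate GeoEvent and other servers", True),
    Step(21, "point GeoEvent data stores and outputs at the LB URL", True),
    Step(22, "TVT through the LB with Windows auth (outage ends)", True),
    Step(24, "join Group 2 Portal to the Group 1 site", False),
    Step(26, "Group 2 Web Adaptors using Group 1 machine names", False),
]


def outage_report(steps, start, end):
    """Return (steps inside [start, end], LB-dependent steps outside it)."""
    inside = [s.number for s in steps if start <= s.number <= end]
    misplaced = [s.number for s in steps if s.uses_lb and not start <= s.number <= end]
    return inside, misplaced


inside, misplaced = outage_report(RUNBOOK, start=14, end=22)
print(inside)     # [14, 15, 16, 17, 21, 22]
print(misplaced)  # []
```

Moving federation ahead of the LB switch (as in the earlier draft plans) would show up in `misplaced`, which is one way to see why federation was deferred until the outage window. The short Portal join outage in step 24 is outside this model.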
