We have designed an ArcGIS Enterprise 10.7.1 high-availability solution with Web Adaptors and a third-party load balancer. All servers are VMs. We need to document the ArcGIS Enterprise 10.7.1 fail-over scenarios and test all of them prior to go-live to confirm that fail-over happens correctly. I have struggled to find documentation covering all fail-over scenarios, but here is my list of fail-over test cases:
ArcGIS Portal Fail-over:
Stop ArcGIS Portal Service on a Primary node.
Shut down the server hosting the Primary ArcGIS Portal.
Take the Primary ArcGIS Portal machine off the network.
ArcGIS Server (2 machine site) Fail-over:
Stop the ArcGIS Server Service on one of the servers.
Shut down one of the servers hosting ArcGIS Server.
Take one of the machines off the network.
Disjoin a machine from a site.
ArcGIS Data Store Fail-over:
Take the machine hosting the Primary Data Store off the network.
Make secondary Data Store primary.
Are the above scenarios correct for fail-over testing? Am I missing any scenarios? There are some scenarios we cannot test, e.g. a disk crash.
Portal failover is triggered when the standby machine can no longer reach the primary, so any of the methods you described will trigger it.
There is no concept of "failover" for Server, since the machines in a site have no primary or standby roles.
In either case above, the machine or service being unavailable will indicate to your load balancer (as long as you have HTTP health checking configured) or to the Web Adaptor that the component is unhealthy, and it will stop sending traffic to it.
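To make the health-checking point concrete, here is a minimal sketch of the kind of HTTP probe a load balancer performs, which you could also run by hand during each test case. It assumes the ArcGIS Server health check lives at /arcgis/rest/info/healthcheck and the Portal one at /arcgis/portaladmin/healthCheck, and that they answer with {"success": true} and {"status": "success"} respectively; please confirm both against the REST API documentation for your version. The hostnames and the function names is_healthy and probe are placeholders made up for this illustration.

```python
import json
import ssl
import urllib.error
import urllib.request

# Placeholder hostnames; the endpoint paths are assumptions to verify for 10.7.1.
SERVER_HEALTH_URL = "https://gisserver1.example.com:6443/arcgis/rest/info/healthcheck?f=json"
PORTAL_HEALTH_URL = "https://portal1.example.com:7443/arcgis/portaladmin/healthCheck?f=json"


def is_healthy(status_code, body):
    """Return True only for HTTP 200 with a JSON body that reports success."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        # Not JSON at all (for example an HTML error page).
        return False
    if not isinstance(payload, dict):
        return False
    # Assumed response shapes: {"success": true} or {"status": "success"}.
    return payload.get("success") is True or payload.get("status") == "success"


def probe(url, timeout=5.0):
    """Fetch a health check URL once; any network error counts as unhealthy."""
    # Certificate verification is turned off on the assumption that test VMs
    # use self-signed certificates. Do not do this against production.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
            return is_healthy(response.status, response.read())
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx HTTP responses.
        return False
```

Calling probe(SERVER_HEALTH_URL) against each machine directly (not through the load balancer) before and after stopping a service should show that machine flip from healthy to unhealthy while the other stays healthy.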
For Data Store, yes, those scenarios will cause a failover.
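If you want to record how long each failover takes for your documentation, one possible approach is to poll the public URL at a fixed interval during the test and then reduce the samples to outage windows. The sketch below only does the reduction step; the function name outage_windows and the (timestamp, healthy) sample format are assumptions made for this example, not anything provided by ArcGIS Enterprise.

```python
def outage_windows(samples):
    """Collapse time-ordered (timestamp_seconds, healthy) samples into outages.

    Each outage is (start, end): start is the first unhealthy sample of a run,
    end is the first healthy sample after it, or None if it never recovered.
    """
    windows = []
    start = None
    for timestamp, healthy in samples:
        if not healthy and start is None:
            # First failed probe of a new outage.
            start = timestamp
        elif healthy and start is not None:
            # First successful probe after an outage closes the window.
            windows.append((start, timestamp))
            start = None
    if start is not None:
        # Still down when sampling stopped.
        windows.append((start, None))
    return windows


# Example: probes every 5 seconds, two failed probes in the middle.
samples = [(0, True), (5, False), (10, False), (15, True), (20, True)]
print(outage_windows(samples))  # [(5, 15)]
```

The measured window is only as precise as the polling interval, so a 5-second interval bounds the real outage to within roughly 5 seconds on either side.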