Alex,
Thanks for the info. I am somewhat relieved that clustering is not being recommended at this time; per the link you provided, it is a bit complicated and has not lived up to the hopes.
However, I still don't know why restarting any of the machines in the cluster would cause the '400 Bad Request' error. I have a rough idea as to why; here is some info:
I have successfully created a cluster of three ArcGIS Server (10.3.1) machines, all of which run GEP (10.3.1 with a patch installed). The public-facing machine is a separate machine running IIS with Application Request Routing (ARR), which acts not only as the standard reverse proxy but also as a reverse proxy for WebSocket; I don't think one has to have nginx running on Windows. On the ArcGIS side, the WebSocketContextURL points to the reverse proxy.
All that works as expected, I think! But whenever any of these ArcGIS Server machines reboots, I see these '400 Bad Request' errors in the GEP log files. The clustering diagnostic utility for RabbitMQ doesn't find any problem, even after reboots. Here is the error:
com.esri.ges.datastore.agsconnection.AbstractStreamServiceClient During initialization, an unexpected error has occurred. Cannot communication with the Stream Service 'issmon1f'. Error: 'Bad response status 400 Bad Request'. Sep 8, 2016, 10:27:56 PM ERROR
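To see whether the error shows up for every stream service or only some of them after a reboot, the log lines could be tallied with something like the rough Python sketch below. It assumes every occurrence has the same message layout as the single line quoted above; the function name `count_stream_errors` is made up for illustration and is not part of GEP or ArcGIS.

```python
import re
from collections import Counter

# Matches the message text quoted above. The layout is an assumption based on
# that one log line; other GEP versions may word it differently.
PATTERN = re.compile(
    r"Stream Service '(?P<service>[^']+)'\. "
    r"Error: 'Bad response status (?P<status>\d{3}) (?P<reason>[^']+)'"
)


def count_stream_errors(lines):
    """Count (service name, HTTP status) pairs found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (match.group("service"), int(match.group("status")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "com.esri.ges.datastore.agsconnection.AbstractStreamServiceClient "
        "During initialization, an unexpected error has occurred. "
        "Cannot communication with the Stream Service 'issmon1f'. "
        "Error: 'Bad response status 400 Bad Request'. "
        "Sep 8, 2016, 10:27:56 PM ERROR",
    ]
    for (service, status), n in count_stream_errors(sample).items():
        print(service, status, n)
```

Passing an open log file instead of the sample list should work the same way, since the function only needs an iterable of lines.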
Anyway, I am about to give up on GEP clustering, but I have spent so much time on it that I am curious to know what is happening. I feel like I was almost there: failover and high throughput in an ArcGIS cluster running GEP.
Thanks.
PS. rsunderman-esristaff