So, I checked with ESRI and they don't really have any official guidance on this, and couldn't even tell me if they have ever adjusted it, so I thought I'd ask the community.
We have a multi-machine ArcGIS Server cluster using a shared config-store setup. In front of it, we have ArcGIS Web Adaptor running on IIS on Windows. The Web Adaptor supposedly handles all incoming calls for a cluster setup and routes each one to whichever machine it decides has capacity or is hosting the service in question. If both machines host the service (because no one set a preference), it picks one, essentially doing the load balancing between the two AGS machines. I know we could put a hardware load balancer of our own in front of the 6080/6443 ports (and get better performance than the 100ms+ that the Web Adaptor sometimes adds), but I'm wondering whether we could instead just increase the maximum worker processes on the Web Adaptor's app pool, letting it run two or more processes to handle incoming requests even more quickly and hand them out to the AGS servers. This assumes no network or CPU/memory resource constraints, and there don't appear to be any: gigabit networking, CPU tops out around 50%, and memory has plenty of room for another w3wp.exe Web Adaptor process.
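For anyone wanting to try it, the setting in question can be changed from the command line rather than clicking through IIS Manager. A sketch only; "ArcGISWebAdaptorAppPool" is an assumed name, so check the actual app pool name for your Web Adaptor site in IIS Manager first:

```shell
:: Set the Web Adaptor's app pool to run 2 worker processes (a "web garden").
:: Assumption: the pool is named ArcGISWebAdaptorAppPool -- verify yours first.
%windir%\system32\inetsrv\appcmd.exe set apppool /apppool.name:"ArcGISWebAdaptorAppPool" /processModel.maxProcesses:2
```

Note that setting `maxProcesses` above 1 is what IIS calls a web garden, which breaks in-process session state for apps that use it; that's exactly why the question of whether the Web Adaptor is truly stateless matters here.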
So, has anyone played with this, and has it improved performance, or just moved the bottleneck elsewhere? Are there concerns with session data, or is that not a problem since arcgis webadaptor is always a one-and-done request handler anyway?
The web adaptor is doing very basic load balancing. It knows of the two (or more) machines in your site and does simple round-robin load balancing between them. It has no knowledge of capacity or load, does no weighted balancing, and can't determine that a request should go to a specific machine based on the requested service. What version are you using? In previous versions, clusters were recommended, so a two-machine site could have two separate clusters with different services on each. The web adaptor would still send requests to both machines, but internally the Server site can determine where that service resides, and that machine will serve the request. That's the only situation where there is internal load balancing/traffic redirection for specific services, though.
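To be concrete about what "simple round-robin" means here: it just cycles through the registered machines in order, with no awareness of load. A minimal sketch (machine names are made up):

```python
import itertools

class RoundRobinRouter:
    """Sketch of round-robin routing as described above: no capacity or load
    awareness, just cycle through the site's registered machines in order."""

    def __init__(self, machines):
        self._cycle = itertools.cycle(machines)

    def next_machine(self):
        """Return the machine that should receive the next request."""
        return next(self._cycle)

# Hypothetical two-machine site: requests alternate gisA, gisB, gisA, ...
router = RoundRobinRouter(["gisA", "gisB"])
```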
Anyway, I've never tried that, and I doubt it's something that has been tested internally, so I'd proceed with caution. Are you seeing performance issues, or are you just wondering about increasing throughput?
A bit of both. We saw a tremendous decrease in response delay for AGS services by adjusting the ArcGIS/admin setting for the app heap from 256 MB to 1 GB; the Java worker process was choked at 256 MB and just stuck in garbage collection all the time. Once that was adjusted, it started consuming up to 400-500 MB (no longer maxing out against the GC limit), and direct arcsoc request response times dropped by more than half under heavy load. So I'm wondering if a similar adjustment on the IIS endpoint would improve responsiveness by spreading the load across multiple worker processes in IIS. I can definitely see a 100ms+ addition to response time when going through the Web Adaptor versus straight to the port for a service request, even when not under load. My thought was that a single process sitting directly in front of the ArcGIS Java app process is another obvious potential chokepoint under heavy load. I don't think it causes any issue for state, since most requests come in and go right back out, but I'm wondering if anyone else has made such an adjustment; the complete lack of guidance on it made me wary of the change. We did have a machine running it with 3 worker processes for quite some time (because someone thought it'd be great to set all of the app pools to 3), so I'm fairly sure it's stable.
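In case anyone wants to reproduce the 100ms+ comparison on their own setup, here's roughly how I'd measure it. The URLs are placeholders, not real endpoints; substitute your own service URL through the Web Adaptor and the same service direct on 6080/6443:

```python
import time
import urllib.request

def median_latency_ms(fetch, samples=20):
    """Call fetch() `samples` times and return the median wall-clock latency in ms."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        fetch()
        times.append((time.perf_counter() - start) * 1000.0)
    times.sort()
    return times[len(times) // 2]

def http_get(url):
    """Return a zero-arg closure that GETs `url` and drains the response body."""
    def fetch():
        with urllib.request.urlopen(url) as resp:
            resp.read()
    return fetch

if __name__ == "__main__":
    # Placeholder hostnames -- replace with your Web Adaptor and server URLs.
    via_wa = median_latency_ms(http_get("https://gisweb.example.com/arcgis/rest/services"))
    direct = median_latency_ms(http_get("http://gisserver1.example.com:6080/arcgis/rest/services"))
    print(f"via web adaptor: {via_wa:.1f} ms, direct: {direct:.1f} ms")
```

Using the median rather than the mean keeps one slow outlier request from skewing the comparison.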
I guess really I'm just wondering if anyone out there ever really performance tested such a change, and what results they saw.