We have reverted to 10.2 and are experiencing normal, stable behavior.
We discovered that at AGS 10.2.2, our NetApp shared storage appliance was keeping files open and never releasing them (probably mostly from the tile cache folders and the config-store). Over the course of about 4-5 days, the NetApp would accumulate around 180,000 open files for each of our two AGS servers. At some point the NetApp reaches its limit on open files per server and starts freaking out. That in turn causes the AGS servers to go wonky and eventually crash.
I'm currently in communication with Esri support on this, so hopefully there will be a solution.
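For anyone who wants to sanity-check how fast a leak like this eats into a storage limit, here is a rough Python sketch. The 180,000 files and the 4-5 day window come from the post above; the per-server limit is a made-up placeholder (the real NetApp limit isn't stated in this thread), the function names are hypothetical, and linear growth is an assumption.

```python
def leak_rate_per_hour(open_files, days):
    """Average number of file handles leaked per hour, assuming linear growth."""
    return open_files / (days * 24.0)


def hours_until_limit(current_open, rate_per_hour, limit):
    """Hours left before the open-file count reaches the limit (0 if already there)."""
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    return max(0.0, (limit - current_open) / rate_per_hour)


# Figures from the post: ~180,000 open files per server after roughly 4.5 days.
rate = leak_rate_per_hour(180_000, 4.5)

# ASSUMPTION: 200,000 is a placeholder, not a documented NetApp limit.
ASSUMED_LIMIT = 200_000
remaining = hours_until_limit(180_000, rate, ASSUMED_LIMIT)
print(f"~{rate:.0f} files/hour, ~{remaining:.1f} hours until the assumed limit")
```

Swap in your own open-file counts from the storage side and whatever limit your storage people quote; the point is only to see how much headroom is left between restarts.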
Very interesting, Royce, thanks for the post. I'll run this by my IT network and storage people too, to determine whether we have something similar happening.
This is a possible deal breaker for us, so the more traction this gets, the better. Thanks for looking into it on your end.
For sure, I'll post as soon as I find out.
We have been working with Esri on this issue for about 3 months now, and I believe we have it at the highest level of support possible. I sent this forum post and an older one of yours to our Esri contact, and a day later they said that this is indeed a 10.2.2 bug, that they are planning a patch for it, and that it will definitely be fixed at 10.3. They also don't believe this bug would occur on Linux servers, which is something we are seriously looking into (our IT people were ecstatic to hear that). For now, though, our plan is to stay at 10.2.2 on Windows with a single-server site and a local configuration store, since this has been stable for about a month.
Thanks for all the information you supplied; it really helped them identify the issue.
Ok, good to hear. We've been back on 10.2 for a month or so with zero issues. Do you have any info on the actual bug so we can track it?
Is this the bug? It's the only one that looked relevant. It was created as a bug back in January with the 10.2.1 release.
http://support.esri.com/en/bugs/nimbus/role/beta10_1/TklNMDk3OTYx