Maybe you can check with your admins what differs between your dev/test environment and your production environment. They may have applied an OS patch to dev/test that was not applied to production (or vice versa).
Do you know if your dev/test servers are absolutely identical to your production servers? I ask because my organization builds its servers at different times, so there can be slight differences between the environments (not an ideal situation for a controlled environment). That could explain why dev/test works as expected while the problem appears only in production.
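If you want to check the patch levels yourself before talking to the admins, one rough approach is to export the list of installed updates from each server and diff them. This is only a sketch, assuming Windows servers where something like `wmic qfe get HotFixID` gives you one update ID per line; the `diff_patches` helper and the KB IDs below are hypothetical placeholders, not real patches.

```python
def diff_patches(dev_test, production):
    """Return update IDs installed in only one of the two environments.

    Both arguments are iterables of update IDs (hypothetical input, e.g. the
    lines of a saved `wmic qfe get HotFixID` export with the header removed).
    """
    # Normalize: strip whitespace, ignore blank lines, compare case-insensitively.
    dev_set = {p.strip().upper() for p in dev_test if p.strip()}
    prod_set = {p.strip().upper() for p in production if p.strip()}
    return {
        "only_in_dev_test": sorted(dev_set - prod_set),
        "only_in_production": sorted(prod_set - dev_set),
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    dev_test = ["KB1000001", "KB1000002", "KB1000003"]
    production = ["KB1000001", "KB1000003"]
    print(diff_patches(dev_test, production))
```

Anything listed under `only_in_dev_test` or `only_in_production` is a candidate difference worth raising with your admins.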
Do you have a heavy load on the production servers when this issue occurs?
On Friday, we found that our AD ArcGIS Server account had been removed from our server admin group. We added it back in and haven't had the "config-store" communication issue since. I will update this thread in a week or so with a "yay or nay" on whether this is the fix.
Did you get this resolved? I have the same issue.
I just got an alert that a patch has been released that addresses the "Too many open files" issue. The issue affects 10.2.2 and 10.3, and it has been resolved at 10.3.1.
http://support.esri.com/en/downloads/patches-servicepacks/view/productid/66/metaid/2197