
Config Store Issues

06-23-2014 05:21 PM
JustinGreco
Frequent Contributor
We recently moved to a multiple-server site because 10.2.2 included bug fixes for slowness in Manager when using a network share for the config store and for slow performance from caches on network shares.  However, ArcGIS Server keeps losing its connection to our config store, which we placed on a network NAS device.  Moving the store local to one of the servers in the site resolved the issue but introduced a single point of failure, which unfortunately did hit us when that server crashed.  One note about the NAS space: it is used organization-wide for storing all data.  We are looking into moving the config store to a less heavily used storage device.  Thought I would see if anyone had any suggestions before we do so.  I am afraid the same thing will happen and we will need to move it back to the server again.
17 Replies
RoyceSimpson
Frequent Contributor
Welcome to my world. 

Evidence is mounting to suggest this is a 10.2.2 issue.  I'm very close to reverting to 10.2.  This is a real bummer since we've just started to do some Collector/offline editing stuff, which requires 10.2.2.

Esri, any word on this being a bug?  Fixed at 10.3?
DavidColey
Honored Contributor
Hello-
Yes, we experienced the same issues with our distributed site when we moved to 10.2.2.  We also moved to .2 for the same reason: the promised fix for the 10.2.1 slowness, which never seemed to materialize.  As such, we moved all server directories (config, data, system, etc.) off of our SAN share and onto a local array of RAIDed disks and then re-established the shares.  This did seem to alleviate some of our issues for a few days, but alas not for long.

So what I have taken to doing as a workaround is to script local fgdb copy procedures from a staging environment to the data-store for heavy I/O traffic layers such as parcels, tax roll info, and spatial views, to relieve some CPU pressure on our SDE.  That has helped considerably, but ideally I shouldn't have to resort to these types of programmatic gymnastics.
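
For anyone wanting to try the same workaround, here is a minimal sketch of what such a copy script might look like. It is not the script described above; it uses only the Python standard library and relies on a file geodatabase being a folder on disk. The function name and the `.incoming`/`.previous` suffixes are made up for illustration, and the swap step will fail on Windows if a running service still holds locks on the live copy.

```python
import shutil
from pathlib import Path


def refresh_fgdb(staging_gdb, target_gdb):
    """Copy a staging file geodatabase folder over the one services read from.

    Hypothetical helper: both arguments are paths to .gdb folders.
    """
    staging_gdb = Path(staging_gdb)
    target_gdb = Path(target_gdb)
    incoming = target_gdb.with_name(target_gdb.name + ".incoming")
    previous = target_gdb.with_name(target_gdb.name + ".previous")

    # Clear leftovers from an earlier, interrupted run.
    for leftover in (incoming, previous):
        if leftover.exists():
            shutil.rmtree(leftover)

    # Do the slow copy next to the target so the live copy is untouched meanwhile.
    shutil.copytree(staging_gdb, incoming)

    # Swap: move the live copy aside, put the new one in place, drop the old one.
    if target_gdb.exists():
        target_gdb.rename(previous)
    incoming.rename(target_gdb)
    if previous.exists():
        shutil.rmtree(previous)
    return target_gdb
```

Copying to a sibling folder first keeps the window in which the target is missing down to two renames, rather than the full duration of the copy.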

More:  because we use a three-server site with two clusters, I typically set my map services with zero minimum pooling instances.  This is a known issue now with Esri as far as service overwrites are concerned, whether the data is sourced from SDE or from a data-store fgdb.  So when I do overwrite, I first dial up the minimum instances to one, then after the overwrite succeeds, I return to zero.  Even so, I have still experienced dirty overwrites (leaving behind service .glock files that can only be removed by stopping the ArcServer.exe processes on each machine) from time to time if I overwrite large services during peak usage hours, so I don't do it.
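
The bump-to-one, overwrite, back-to-zero routine could be wrapped up roughly as below. This is only a sketch under assumptions: `fetch` and `push` are hypothetical callables you would write yourself to read and save the service's JSON definition (for example through the ArcGIS Server Administrator API), and the sketch assumes that definition carries `minInstancesPerNode` and `maxInstancesPerNode` properties. Note that it restores the original minimum even if the overwrite fails, which is a small departure from waiting for success.

```python
import copy
from contextlib import contextmanager


def with_min_instances(service_def, minimum):
    """Return a copy of a service definition with the minimum instances changed."""
    updated = copy.deepcopy(service_def)
    updated["minInstancesPerNode"] = minimum
    # Keep the maximum at least as large as the minimum.
    if updated.get("maxInstancesPerNode", minimum) < minimum:
        updated["maxInstancesPerNode"] = minimum
    return updated


@contextmanager
def temporary_min_instances(fetch, push, minimum=1):
    """Raise the minimum instances for the duration of a block, then restore it.

    fetch() and push(definition) are hypothetical callables supplied by the
    caller; this sketch makes no network calls itself.
    """
    original_min = fetch()["minInstancesPerNode"]
    push(with_min_instances(fetch(), minimum))
    try:
        yield
    finally:
        # Re-read the definition so changes made by the overwrite are kept.
        push(with_min_instances(fetch(), original_min))
```

Usage would look like `with temporary_min_instances(fetch, push): overwrite_service()`, where `overwrite_service` stands in for whatever publishing step you already use.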

Lastly, see my thread Manage Map Cache Tile: Errors at 10.2.2.  We have yet to be able to overwrite an existing cache, regardless of source, without receiving 'error moving bundle' and completely corrupting the cache.  I am in communication with Esri on this one but haven't heard back from them in many weeks.

I hope all of this helps the community-
Thanks
David
RoyceSimpson
Frequent Contributor

We have reverted back to 10.2 and are experiencing normal, stable behavior.

We discovered that at AGS 10.2.2, our NetApp shared storage appliance was keeping files open and never freeing them (probably mostly from the tile cache folders and config-store), so over the course of about 4-5 days, the NetApp would end up with around 180,000 open files for each of our two AGS servers.  At some point the NetApp reaches its limit on open files per server and starts freaking out.  That in turn causes the AGS servers to go wonky and eventually crash.

I'm currently in communication with Esri support on this, so hopefully there will be some solution.

DavidColey
Honored Contributor

Very interesting, Royce, thanks for the post.  I too shall run this by my IT network and storage people to determine if we have something similar happening.

RoyceSimpson
Frequent Contributor

This is a possible deal breaker for us so the more traction this gets, the better.  Thanks for looking into it on your end.

DavidColey
Honored Contributor

For sure, I'll post as soon as I find out.

JustinGreco
Frequent Contributor

We have been working with Esri on this issue for about 3 months now, and I believe we have it at the highest level of support possible.  I sent this forum post and an older one of yours to our contact at Esri, and a day later they said that this is indeed a 10.2.2 bug, that they are planning to create a patch for it, and that it will definitely be fixed at 10.3.  They also don't believe this bug would occur on Linux servers, which is something we are seriously looking into (our IT people were ecstatic to hear that).  But our plan for now is to stay at 10.2.2 on Windows with a single-server site and the configuration store local, since this has been stable for about a month.

Thanks for all the information you supplied; it really helped them identify the issue.

RoyceSimpson
Frequent Contributor

Ok, good to hear.  We've been back on 10.2 for a month or so with zero issues.  Do you have any info on the actual bug so we can track it?

RoyceSimpson
Frequent Contributor

Is this the bug?  It's the only one that looked relevant.  It was logged as a bug back in January with the 10.2.1 release.

http://support.esri.com/en/bugs/nimbus/role/beta10_1/TklNMDk3OTYx
