POST
Hi David, thanks for the response! We are running 1 server per cluster to scale the number of services that can be hosted, rather than the number of requests that need to be fulfilled. When 2 servers are added to 1 cluster, it scales the number of requests a service can handle rather than how many services can be hosted. The 'minInstancesPerNode' and 'maxInstancesPerNode' settings specify how many instances per node the publishers would like available for their services. Assume we set 1 'min instance' and 2 'max instances' on each service (the default), and assume we had 4 servers, each spec'd to host 250 total instance executables...

Scenario 1: Scale the number of services to be hosted. If we have 1 server per cluster and 4 total clusters, then we can host 1,000 total services when idle (4 servers * 250 instances/server), assuming each service only needs 1 instance per node. Obviously this leaves no room for growth in terms of hosting more services or having more instances available for requests.

Scenario 2: Scale the number of requests that need to be fulfilled. If we had a 3-node cluster, then while idle ArcGIS Server will build 1 instance on each node in that cluster for each service (so that 3 executables can handle requests rather than 1 executable in the scenario above). With that scenario, I can only host (at most) 250 total services, and that assumes they never need a second 'instancePerNode' spun up. Obviously this has no room for growth in terms of hosting more services.

Based on our business requirements (for this project), this ArcGIS Server platform needs to provide large amounts of services that receive little use (at this point in time). Isolating the servers into individual clusters meets that need and also provides some isolation from each other. Monitoring usage in the future may change this publishing model (we may build a multi-node cluster and move services that become popular to that cluster). Make sense?

As for the AD roles... we had some pretty serious performance issues at ArcGIS Server 10.1 and have not revisited since. I'm not sure whether those have been worked out in 10.2.2.
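To make the trade-off concrete, here is a minimal sketch of the capacity math under the assumptions above (1 min instance per service, 250 executables per server); the numbers are only illustrative:

```python
# Rough capacity math for the two scenarios described above.
# Assumptions (illustrative only): 250 ArcSOC executables per server,
# 4 servers total, and 'minInstancesPerNode' = 1 for every service.

servers = 4
instances_per_server = 250
min_instances_per_node = 1

# Scenario 1: 1 server per cluster, 4 clusters.
# Each service starts its min instances on exactly one node.
services_scenario_1 = servers * instances_per_server // min_instances_per_node
print(services_scenario_1)  # 1000 services hosted while idle

# Scenario 2: one 3-node cluster.
# Each service starts its min instances on EVERY node in the cluster,
# so a single node's capacity caps the number of services.
services_scenario_2 = instances_per_server // min_instances_per_node
print(services_scenario_2)  # 250 services at most, each with 3 executables
```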
10-14-2014
04:28 PM
POST
I should have also asked about the ArcGIS Server heap size settings. We noticed these settings are available under the following URL: https://servername.domain/agspub/admin/machines/machinename.domain?f=pjson
{
  "machineName": "machine.domain",
  "platform": "Windows Server 2008 R2-amd64-6.1",
  "ports": {
    "JMXPort": 4000,
    "OpenEJBPort": 4001,
    "NamingPort": 4002,
    "DerbyPort": 4003,
    "tcpClusterPort": 4005,
    "HTTP": 6080,
    "HTTPS": 6443
  },
  "ServerStartTime": 1412696078447,
  "webServerMaxHeapSize": -1,
  "appServerMaxHeapSize": 256,
  "socMaxHeapSize": 64,
  "webServerSSLEnabled": true,
  "webServerCertificateAlias": "SelfSignedCertificate",
  "adminURL": "https://machine.domain:6443/arcgis/admin",
  "configuredState": "STARTED",
  "synchronize": false
}
We are specifically interested in the 'appServerMaxHeapSize' and 'socMaxHeapSize' settings. Can anyone provide more insight into them? We briefly bumped those up (doubled them at one point in time), but that did not seem to help the situation. We doubled them again (and even a third time) to see if it helped with stability or performance. The highest we went was appServerMaxHeapSize: 1024 and socMaxHeapSize: 256. That ended up causing a crash on one of the AGS nodes, which logged the following Windows events:
Log Name: System
Source: Microsoft-Windows-Resource-Exhaustion-Detector
Date: 10/7/2014 8:47:13 AM
Event ID: 2004
Task Category: Resource Exhaustion Diagnosis Events
Level: Warning
Keywords: Events related to exhaustion of system commit limit (virtual memory).
User: SYSTEM
Computer: MACHINE.DOMAIN
Description:
Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: javaw.exe (56032) consumed 954195968 bytes, ArcGISServer.exe (41556) consumed 927776768 bytes, and ArcSOC.exe (20524) consumed 407146496 bytes.
...
...
<SystemInfo>
<SystemCommitLimit>89614397440</SystemCommitLimit>
<SystemCommitCharge>89500942336</SystemCommitCharge>
<ProcessCommitCharge>87385600000</ProcessCommitCharge>
<PagedPoolUsage>602058752</PagedPoolUsage>
<PhysicalMemorySize>34359205888</PhysicalMemorySize>
<PhysicalMemoryUsage>25223925760</PhysicalMemoryUsage>
<NonPagedPoolUsage>283160576</NonPagedPoolUsage>
<Processes>282</Processes>
</SystemInfo>
...
We subsequently set those settings back to their defaults. (For context, the event above shows the system commit charge within roughly 100 MB of the ~83 GB system commit limit on a 32 GB RAM machine, so the node had essentially exhausted its virtual memory.)
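For anyone who wants to experiment with these values without clicking through the Admin directory, here is a minimal sketch of how they could be read and changed through the Administrator API. It assumes the machines/<machine>/edit operation accepts the same properties returned by the machine resource shown above (verify the exact parameter list against your Admin API docs); treat it as illustrative, not a recommendation to raise the heaps.

```python
# Minimal sketch: read a machine's heap settings and push an edit through the
# ArcGIS Server Administrator API. The 'edit' parameter list is an assumption.
import requests

ADMIN = "https://machine.domain:6443/arcgis/admin"   # adminURL from the JSON above

# 1. Get a token as the primary site administrator (or an admin user).
token = requests.post(f"{ADMIN}/generateToken",
                      data={"username": "siteadmin", "password": "********",
                            "client": "requestip", "f": "json"},
                      verify=False).json()["token"]   # verify=False for the self-signed cert

# 2. Read the current machine properties.
machine_url = f"{ADMIN}/machines/machine.domain"
props = requests.get(machine_url, params={"f": "json", "token": token},
                     verify=False).json()
print(props["appServerMaxHeapSize"], props["socMaxHeapSize"])

# 3. Edit and post back (hypothetical values; we ultimately reverted to defaults).
requests.post(f"{machine_url}/edit",
              data={"machineName": props["machineName"],
                    "adminURL": props["adminURL"],
                    "webServerMaxHeapSize": -1,
                    "webServerSSLEnabled": "true",
                    "webServerCertificateAlias": props["webServerCertificateAlias"],
                    "appServerMaxHeapSize": 256,   # hypothetical new value
                    "socMaxHeapSize": 64,          # hypothetical new value
                    "f": "json", "token": token},
              verify=False)
```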
10-14-2014
10:13 AM
POST
We have an existing ArcGIS Server (AGS) 10.0 solution that is hosting close to 1,000 mapping services. We have been working on an upgrade of this environment to 10.2.1 for a few months now and are having a hard time getting a stable environment. These services have light use, and our program requirements are to have an environment that can handle large numbers of services with little use. In the AGS 10.0 space we would set all services to 'low' isolation with 8 threads/instance. We also had 90% of our services set to 0 min instances/node to save on memory. Below is a summary of our approaches and where we are today. I'm posting this to the community for information, and I am really interested in feedback and/or recommendations to move this forward for our organization.

Background on deployment:
- Targeted ArcGIS Server 10.2.1
- Config-store/directories are hosted on a clustered file server (active/passive) and presented as a share: \\servername\share
- Web-tier authentication
- 1 web-adaptor with anonymous access
- 1 web-adaptor with authenticated access (Integrated Windows Authentication with Kerberos and/or NTLM providers)
- 1 web-adaptor 'internally' with authenticated access and administrative access enabled (we use this for publishing)
- User-store: Windows Domain
- Role-store: Built-In

We have a few ArcGIS Server deployments that look just like this, and they are all running fairly stable with decent performance.

Approach 1: Try to mirror (as closely as possible) our 10.0 deployment methodology 1:1
- Build 4 AGS 10.2.1 nodes (virtual machines).
- Build 4 individual clusters and add 1 machine to each cluster.
- Deploy 25% of the services to each cluster.

The AGS nodes were initially spec'd with 4 CPU cores and 16GB of RAM. Each ArcSOC.exe seems to consume anywhere from 100-125MB of RAM (sometimes up to 150 or as low as 70). Publishing 10% of the services with 1 min instance (and the other 90% with 0 min instances) would leave around 25 ArcSOC.exe processes on each server when idle. The 16GB of RAM could host 100-125 total instances, leaving some room for services to start up instances when needed and scale slightly when in use. The first problem we ran into was publishing services with 0 instances/node. Esri confirmed 2 'bugs':
#NIM100965 GLOCK files in arcgisserver\config-store\lock folder become frozen when stop/start a service from admin with 0 minimum instances and refreshing the wsdl site
#NIM100306: In ArcGIS Server 10.2.1, service with 'Minimum Instances' parameter set to 0 gets published with errors on a non-Default cluster
So... that required us to publish all of our services with at least 1 min instance per node. At 1,000 services, that means we needed 100-125GB of RAM for all the running ArcSOC.exe processes, without any room for future growth...

Approach 2: Double the RAM on the AGS nodes
We added an additional 16GB of RAM to each AGS node (they now have 32GB of RAM), which should host 200-250 ArcSOC.exe processes (which is tight for hosting all 1,000 services). We published about half of the services (around 500) and started seeing some major stability issues. During our publishing workflow, the clustered file server would crash. This file server hosts the config-store/directories for about 4 different *PRODUCTION* ArcGIS Server sites. It also hosts our Citrix users' workspaces and about 13TB of raster data. During a crash, it would fail over to the passive file server, and after about 5 minutes the secondary file server would crash. This is considered a major outage! On the last crash, some of the config-store was corrupted. While trying to log in to the 'admin' or 'manager' end-points, we received an error that mentioned some sort of parsing issue; I cannot find the exact error message. We had disabled the primary site administrator account, so we went in to re-enable it, but the super.json file was EMPTY! We had our backup team restore the entire config-store from the previous day and copied over the file. I'm not sure what else was corrupted. After restoring that file we were able to log in again with our AD accounts.

The file-server crashes were clearly caused by publishing a large amount of services to this new ArcGIS Server environment. We caused our clustered file servers to crash 3 separate times, all during this publishing workflow. We had no choice but to isolate this config-store/directories to an alternate location. We moved it to a small web-server to see if we could simulate the crashes there and continue moving forward. So far it has not crashed that server.

During boot-ups, with the AGS nodes hosting all the services, the service startup time was consistently between 20 and 25 minutes. We were able to find a startup timeout setting on each service that was set to 300 seconds (5 minutes) by default; we set that to 1800 seconds (30 minutes) to try to get these machines to start up properly (a sketch of scripting that change is included after the NIM explanation below). What was happening is that all the ArcSOC.exe processes would build and build until, at some point, they would all start disappearing. In the meantime, we also reviewed the ArcGIS 10.2.2 Issues Addressed List, which indicated:
NIM099289 Performance degradation in ArcGIS Server when the location of the configuration store is set to a network shared location (UNC).
We asked our Esri contacts for more information regarding this bug fix and basically got this:
…our product lead did provide the following as to what updates we made to address the following areas of concern listed in NIM099289:
1. The Services Directory
2. Server Manger
3. Publishing/restarting services
4. Desktop
5. Diagnostics
ArcGIS Server was slow generating a list of services in multiple places in the software. Before this change, ArcGIS Server would read from disk all services in a folder every time a list of services was needed - this happened in the services directory, the manager, ArcCatalog, etc. This is normally not that bad, but if you have many many services in a folder, and you have a high number of requests, and your UNC/network is not the fastest, then this can become very slow. Instead we remember the services in a folder and only update our memory when they have changed.
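Circling back to the startup timeout change mentioned above, here is a rough sketch of how that per-service setting could be bumped through the Administrator API. The property name maxStartupTime and the shape of the edit call are assumptions based on the service JSON; verify against your own service configuration before running anything like this in bulk.

```python
# Sketch: raise the startup timeout on a single service via the Admin API.
# Assumption: the timeout appears in the service JSON as 'maxStartupTime'
# (seconds) and the service 'edit' operation accepts the full JSON back.
import json
import requests

ADMIN = "https://servername.domain:6443/arcgis/admin"   # hypothetical admin URL
TOKEN = "<admin token from generateToken>"

svc = "SomeFolder/SomeService.MapServer"                 # hypothetical service name

# Read the current service definition.
svc_json = requests.get(f"{ADMIN}/services/{svc}",
                        params={"f": "json", "token": TOKEN},
                        verify=False).json()

svc_json["maxStartupTime"] = 1800   # was 300 seconds by default

# Push the modified definition back with the 'edit' operation.
resp = requests.post(f"{ADMIN}/services/{svc}/edit",
                     data={"service": json.dumps(svc_json),
                           "f": "json", "token": TOKEN},
                     verify=False)
print(resp.json())
```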
Approach 3: Upgrade to 10.2.2 and add 3 more servers
- We added 3 more servers to the 'site' (all 4 CPU, 32GB RAM) and upgraded everything to 10.2.2. We actually re-built all the machines from scratch again.
- We threw away our existing config-store and directories since we knew at least 1 file was corrupt. We essentially started from square 1 again.
- All AGS nodes got a fresh install of 10.2.2 (we confirmed that refreshing folders from the REST page is much faster).
- The config-store is still hosted on the web-server; we mapped it to a DFS location so that we can move it around later.
- Published all 1,000-ish services successfully across 7 separate 'clusters'.
- Changed all isolation back to 'high' for the time being.

This is the closest we have gotten; at least all services are published. Unfortunately it is not very stable. We continually receive a lot of errors. Here is a brief summary:
SEVERE (Source: Server, Code: 8252, Process: 440, Thread: 1)
Instance of the service '<FOLDER>/<SERVICE>.MapServer' crashed. Please see if an error report was generated in 'C:\arcgisserver\logs\SERVERNAME.DOMAINNAME\errorreports'. To send an error report to Esri, compose an e-mail to ArcGISErrorReport@esri.com and attach the error report file.

SEVERE (Source: Admin, Code: 7123, Process: 3720, Thread: 1)
The primary site administrator '<PSA NAME>' exceeded the maximum number of failed login attempts allowed by ArcGIS Server and has been locked out of the system.

SEVERE (Source: Server, Code: 8259, Process: 3136, Thread: 3373)
ServiceCatalog failed to process request. AutomationException: 0xc00cee3a -

SEVERE (Source: Server, Code: 7802, Process: 3568, Thread: 17)
Error while processing catalog request. AutomationException: null

SEVERE (Source: Admin, Code: 6618, Process: 3812, Thread: 56)
Failed to return security configuration. Another administrative operation is currently accessing the store. Please try again later.

SEVERE (Source: Admin, Code: 6617, Process: 3248, Thread: 1)
Failed to compute the privilege for the user 'f7h/12VDDd0QS2ZGGBFLFmTCK1pvuUP1ezvgfUMOPgY='. Another administrative operation is currently accessing the store. Please try again later.

SEVERE (Source: <FOLDER>/<SERVICE>.MapServer, Code: 50000, Process: 49344, Thread: 29764)
Unable to instantiate class for xml schema type: CIMDEGeographicFeatureLayer

SEVERE (Source: <FOLDER>/<SERVICE>.MapServer, Code: 50001, Process: 49344, Thread: 29764)
Invalid xml registry file: c:\program files\arcgis\server\bin\XmlSupport.dat

SEVERE (Source: <FOLDER>/<SERVICE>.MapServer, Code: 50000, Process: 49344, Thread: 29764)
Unable to instantiate class for xml schema type: CIMGISProject

SEVERE (Source: <FOLDER>/<SERVICE>.MapServer, Code: 50001, Process: 49344, Thread: 29764)
Invalid xml registry file: c:\program files\arcgis\server\bin\XmlSupport.dat

SEVERE (Source: <FOLDER>/<SERVICE>.MapServer, Code: 50000, Process: 49344, Thread: 29764)
Unable to instantiate class for xml schema type: CIMDocumentInfo

SEVERE (Source: <FOLDER>/<SERVICE>.MapServer, Code: 50001, Process: 49344, Thread: 29764)
Invalid xml registry file: c:\program files\arcgis\server\bin\XmlSupport.dat

SEVERE (Source: Server, Code: 8003, Process: 30832, Thread: 17)
Failed to initialize server object '<FOLDER>/<SERVICE>': 0x80043007:
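For completeness, here is a minimal sketch of how entries like these could be pulled straight from the Admin API logs/query resource instead of copying them out of Manager; the parameter names are assumptions, so verify them against your Admin API version:

```python
# Sketch: pull recent SEVERE log entries from the ArcGIS Server Admin API.
# Assumptions: the site exposes /arcgis/admin/logs/query and accepts the
# 'level' and 'pageSize' parameters shown here; the URL/token are illustrative.
import requests

ADMIN = "https://servername.domain:6443/arcgis/admin"   # hypothetical admin URL
TOKEN = "<admin token from generateToken>"

resp = requests.post(f"{ADMIN}/logs/query",
                     data={"level": "SEVERE",
                           "pageSize": 100,
                           "f": "json",
                           "token": TOKEN},
                     verify=False)

for msg in resp.json().get("logMessages", []):
    print(msg.get("time"), msg.get("source"), msg.get("code"), msg.get("message"))
```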
Other observations:
- Each AGS node makes 1 connection (session) to the file-server containing the config-store/directories.
- During idle times, only 35-55 files are actually open from that session.
- During boot-ups (and bulk administrative operations), the open files jump consistently between 1,000 and 2,000 per session.
- The 'system' process on the file server spikes, especially during bulk administrative processes.
- The AGS nodes are constantly in communication with the file server (even when the site is idle). CPU/memory and network monitoring on that server looks like this (the AGS nodes look similar). It seems there is a lot of 'chatter' when sitting idle.
- Requests to a service succeed 90% of the time, but 10% of the time we receive HTTP 500 errors:
Error: Error exporting map
Code: 500
Options for the future
We have an existing site with the ArcGIS SOM instance name of 'arcgis'. These 1,000 services have been running in that 10.0 site for the past few years, and users have interacted with them using a URL like:
http://www.example.com/arcgis/rest/services/<FOLDER>/<MapService>/MapServer
We are trying to host all these same services so that users accessing this URL will be un-impacted. If we cannot, we will switch to 1 server in 1 cluster in 1 site (and instead have 7 sites). We would then re-publish all our content to individual sites, but they would have different URLs:
http://www.example.com/arcgis1/rest/services/<FOLDER>/<MapService>/MapServer
http://www.example.com/arcgis2/rest/services/<FOLDER>/<MapService>/MapServer
...
...
http://www.example.com/arcgisN/rest/services/<FOLDER>/<MapService>/MapServer
We would then have an extensive amount of work to either (or both) communicate all the new URLs to our end users (and update all metadata, products, documentation, and content management systems to point to the new URLs) and/or build URL redirect (or URL rewrite) rules for all the legacy services (a rough sketch of that mapping is below). Neither of the two options is ideal, but right now we seem to have exhausted all other options. Hopefully this will help other users while they troubleshoot their ArcGIS Server deployments. Any ideas on our strategy to make this better are greatly appreciated. Thanks!
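Purely for illustration, this is the kind of legacy-to-new URL mapping we would have to maintain if we split into multiple sites; the folder-to-instance assignments are hypothetical:

```python
# Illustrative only: map a legacy /arcgis/... service URL onto the site that
# would host that folder after a split into multiple single-cluster sites.
# The folder-to-instance assignments below are hypothetical.
FOLDER_TO_INSTANCE = {
    "FolderA": "arcgis1",
    "FolderB": "arcgis2",
    # ... one entry per folder ...
}

def rewrite_legacy_url(path: str) -> str:
    """Rewrite '/arcgis/rest/services/<FOLDER>/...' onto the new instance name."""
    parts = path.split("/")
    # ['', 'arcgis', 'rest', 'services', '<FOLDER>', '<MapService>', 'MapServer', ...]
    folder = parts[4]
    parts[1] = FOLDER_TO_INSTANCE.get(folder, "arcgis1")
    return "/".join(parts)

print(rewrite_legacy_url("/arcgis/rest/services/FolderB/SomeMap/MapServer"))
# -> /arcgis2/rest/services/FolderB/SomeMap/MapServer
```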
10-14-2014
09:54 AM
POST
Hello Derek, Thank you for the quick and helpful response! We have a *test* environment that simulates the production topology 1:1. We will take your steps and see if we can succeed there. There were also some registry settings/keys that we found (I removed some content): Do you know what those are for, and whether we need to try to reproduce them manually for the 12th, 13th, etc. web-adaptors?

So far our current 'public facing' topology is supporting 8 AGS 'sites': 4 of them use web-tier auth with 2 web-adaptors each (to support the mix of public and private services); the other 4 are token based (to support only token-authenticated workflows like Collector for ArcGIS and editing on a Portal product) and have 1 web-adaptor each. So we ran into our issue adding our last token-based web-adaptor (the 12th).

As for the future... we have plans within 10-15 days to add an additional 3 web-adaptors (2 web-tier for 1 site and 1 token auth), which would bring us to 15. There are no other immediate plans, but there is a potential to onboard an additional 2-3 similar configurations (each with the 3 web-adaptors). So the total initially planned is 15, with a potential to increase to 21 or even 24 at some point in the future.

This is also important for another one of our projects, which is hosting close to 1,000 services in an ArcGIS 10.0 site. We are still working through an upgrade path for that one to the 10.1+ architecture, and we may be splitting it down into multiple sites. If we do, we will have the same requirements (2 'public' web-adaptors for web-tier auth and 1 'private' web-adaptor on our internal network for admin/publishing). That solution is broken down into smaller 'projects' (called eco-regions) that are put in a specific folder on that ArcGIS Server deployment. There are currently 7 deployed, with a plan for an additional 7 within the next year or two. So if we took the approach of breaking these into separate sites, we would be hosting 14 web-adaptors for that project. Thanks again!
10-02-2014
08:09 AM
POST
We have a fairly large ArcGIS Server (AGS) footprint. We started funneling all traffic through a single location (IIS web-server) for consistency and maintenance, and we currently have 11 web-adaptors successfully hosted on a single IIS server. We are having trouble adding a 12th web-adaptor on that same server. Basically, we do the following when we have a new site that needs to be established:
- Install AGS on a new node and build a 'site'. Depending on the site, the config-store is either local (if single node) or on a file-server (multi-node). We normally run web-tier authentication.
- Install 2 'consumption' web-adaptors on our 'public' web-server with admin functions disabled:
  Web-adaptor 1 = HTTP and HTTPS enabled, anonymous access only
  Web-adaptor 2 = HTTPS ONLY with Integrated Windows Auth (IWA) using NTLM and/or Kerberos
- Install a 3rd 'admin' web-adaptor on a 'private' web-server that our admins/publishers use to configure or publish to the site.

This allows us to host a single point of entry for all of our ArcGIS Server solutions and present them with different 'instance names'. Example:
www.example.com resolves to our IIS 'consumption' web-server
https://privateserver.domain resolves to our IIS 'admin' web-server
http(s)://www.example.com/site1/rest/services = site1 anonymous access
https://www.example.com/site1auth/rest/services = site1 authenticated access (IWA with Kerberos/NTLM)
https://privateserver.domain/site1admin/rest/services = site1 authenticated access (IWA with Kerberos/NTLM) for administrative purposes; not accessible to our 'public' users
http(s)://www.example.com/site2/rest/services = site2 anonymous access
https://www.example.com/site2auth/rest/services = site2 authenticated access (IWA with Kerberos/NTLM)
https://privateserver.domain/site2admin/rest/services = site2 authenticated access (IWA with Kerberos/NTLM) for administrative purposes; not accessible to our 'public' users
etc etc etc

We seem to have run into a limit on the number of web-adaptors that can be hosted on the www.example.com web-server. We have the web-adaptor product installed 11 times. When we launch the install executable for the 12th time, it asks us to repair or modify our previous installation. So it seems there is a limit of 11 web-adaptors that can be hosted on a single IIS web-server. Has anyone run into this same or a similar issue? Our only way around this seems to be to establish a load-balancing mechanism in front of our IIS web-server to send the first 11 web-adaptors to 1 IIS host and the other web-adaptors to another IIS host, based on some fancy routing (or URL rewrite) rules. We are currently running ArcGIS Server 10.2.1 and 10.2.2 in this environment. Thanks for any info!
10-01-2014
12:53 PM
POST
We have public-facing web-services with the 'feature access' capability enabled. Operations include create, update, and delete (and geometry updates). We are also attempting various offline capabilities using the newly offered 'sync' operation. We are unable to get any of this to work using web-tier authentication (NTLM, Kerberos, HTTP Basic, etc.). The public-facing web-services are configured similarly to the 'Multiple firewalls with reverse proxy and Web Adaptor in a perimeter network' scenario in the ArcGIS Server help documentation:
- Web-tier authentication
- User store: Windows domain
- Role store: built-in
- Web-adaptor server sitting in our DMZ
- GIS site sitting in our internal network
- Reverse proxy communication from the DMZ to the internal network
- Web-app firewalls (WAF) in front of and behind the web-adaptor server in the perimeter DMZ environment

On the web-adaptor server we have deployed two web adaptors, supporting a mix of public and private services. One web-adaptor is deployed over both port 80 and 443 but allows strictly anonymous access (for consuming and unauthenticated access). On the second web-adaptor we have anonymous access disabled and Integrated Windows Auth (IWA) enabled, using both Kerberos and NTLM as providers. We have also tested using HTTP Basic. If we add feature service content to an ArcGIS Online map, it tells us that it is an internal-only service and editing is disabled. The services are accessible from the dirty internet. It appears that the arcgis.com map executes a request to the service info page (https://www.myserver.com/arcgisauth/rest/info?f=json) and also tries to proxy the request like so:
Request URL: https://www.arcgis.com/sharing/proxy?https://www.myserver.com/arcgisauth/rest/services/FeatureServices/MyService/FeatureServer/0?f=json
Request Method: POST
Status Code: 504 Gateway Timeout

Both of those fail because the public-facing server returns an HTTP 401 error code with 'www-authenticate' headers listing the options the client has available for authentication. We have tried Kerberos, NTLM, and HTTP Basic. It appears that the arcgis.com map ignores the 'www-authenticate' header and simply disables the editing capabilities rather than attempting to obtain the user's credentials. We can successfully configure the public-facing web-server to use GIS token-based authentication (anonymous enabled and IWA disabled on the web-adaptor server), but that security configuration is really not ideal for our customer base. Is this a known limitation? Is there something we are doing wrong? I would have expected web-tier authenticated services to be editable if users supplied their credentials. Thanks for any help/guidance!
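For anyone who wants to verify the 401/'www-authenticate' behavior outside of arcgis.com, here is a minimal sketch of hitting the layer endpoint with NTLM credentials; the requests_ntlm usage and the account shown are assumptions, purely illustrative:

```python
# Illustrative sketch: request the FeatureServer layer JSON with NTLM
# credentials to see whether web-tier auth succeeds outside of arcgis.com.
# The package usage and account below are assumptions, not our actual config.
import requests
from requests_ntlm import HttpNtlmAuth

url = ("https://www.myserver.com/arcgisauth/rest/services/"
       "FeatureServices/MyService/FeatureServer/0")

# Without credentials we expect the 401 + 'www-authenticate' challenge.
anon = requests.get(url, params={"f": "json"})
print(anon.status_code, anon.headers.get("www-authenticate"))

# With domain credentials the same request should return the layer JSON.
auth = requests.get(url, params={"f": "json"},
                    auth=HttpNtlmAuth("DOMAIN\\someuser", "********"))
print(auth.status_code)
```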
06-20-2014
12:04 PM