Hello,
For several weeks now, map services in our ArcGIS Server 10.4.1 setup have occasionally failed to draw when we browse to their export REST endpoints.
We have a web adaptor and 2 machines on the default cluster, and a SQL Server SDE, all using HTTPS. Currently we only have 41 services running with 1 to 2 instances each, with fewer than 200 layers between them all. Our server machines are VMs with two cores and 16 GB of RAM, of which 11 to 14 GB is in use under our current load. CPU usage is fine, ranging from 4% to 60% on average. Our McAfee enterprise setup has already been configured to exclude ESRI's recommended folders.
Problem:
When browsing to a service's REST endpoint on each machine (https://[machine-1]:6443/arcgis/rest/services/[service]/MapServer/export?bbox=[default-bbox]) we get the expected HTML output, but the image is blank. The other machine in the cluster might disp...
It seems like it's mostly the same few services which fail, but that's probably just because they're the ones we use/see most often in our Geocortex Essentials site (though the number does seem to be slowly increasing). These services are pretty much all Map Services with dynamic layers turned on, nothing special. We do have a few cached maps, and so far (fingers crossed) these have not been problematic at all.
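For anyone wanting to check each machine directly rather than through the web adaptor, here is a minimal sketch that builds the per-machine export URL described above. The machine names, service name, and bbox are placeholders (assumptions, not values from our environment); `bbox`, `format`, and `f` are standard export parameters, but the script only builds the URLs and does not try to detect a blank image.

```python
from urllib.parse import urlencode


def build_export_url(machine, service, bbox, port=6443, image_format="png"):
    """Build the direct export URL for one machine in the cluster."""
    query = urlencode(
        {
            "bbox": ",".join(str(v) for v in bbox),  # xmin,ymin,xmax,ymax
            "format": image_format,
            "f": "image",  # ask for the image itself rather than the HTML page
        },
        safe=",",  # keep the commas in bbox readable
    )
    return (
        f"https://{machine}:{port}/arcgis/rest/services/"
        f"{service}/MapServer/export?{query}"
    )


# Hypothetical machine and service names, for illustration only.
for machine in ["machine-1", "machine-2"]:
    print(build_export_url(machine, "MyService", (0, 0, 100, 100)))
```

Requesting the same URL on each machine makes it easier to see which SOC has gone bad, since one machine may draw correctly while the other returns a blank image.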
Details / Investigation:
Restarting the bad service in Server Manager or ArcCatalog will temporarily restore functionality for a day or so. Restarting the ArcGIS Server Windows service or the server VM itself will solve it for a bit longer. OR, if you wait five minutes, it might return to normal without needing to do anything, but might start acting up so...
We've tried publishing specific services with a registered File Geodatabase on each machine instead of using the SDE, but that fails as well. We even rebuilt the Map Document for the most problematic service from scratch, only importing symbology & label fonts, and it still seems affected.
It seems like when one service starts failing to export (after a server restart for example), the others are not necessarily more likely to start failing as well.
We've already increased our App Heap Size to 512 MB, and our SOC Heap Size to 256 MB, but this hasn't changed anything.
The ArcGIS Server logs don't give any errors, even at DEBUG level. When a service starts failing, the logs show:
methodName | Message |
---|---|
MapServer.ExportMapImage | Begin ExportMapImage |
MapServer.ExportMapImage | Begining of preparation. |
MapServer.ExportMapImage | End of preparation. |
Extent: [extent] | |
Map.Draw | Beginning of group layer draw: Infrastructure (x2) |
Map.Draw | End of group layer draw: Infrastructure (x2) |
/export | REST request successfully processed. Response size is 429 characters. |
MapServer.ExportMapImage | End ExportMapImage |
When it is working, the logs will show the service going through each layer in the group layer Infrastructure, with log records at each step for:
There is no indication as to what changes when a service "recovers" automatically (e.g. service shutting down & restarting, recycling of data connection, etc.).
Has anyone heard of or seen something similar to this? I've been in contact with ESRI support and they haven't come back with anything yet.
Has anyone else experienced this and/or come up with a solution? We're still experiencing failures almost daily.
We've had ESRI Canada involved for several months and have escalated to critical support, yet despite fairly constant communications they haven't been able to come up with a solution or replicate the problem on their end. They don't have any access to view cases from other ESRI partners so they cannot check in on cases from others internationally who've experienced this.
We've also tried changing recycling times to hourly and it hasn't had any really noticeable effect.
We've had this same issue for a long time. At least a couple of years. It's very frustrating. We have almost the exact same setup, too. VMs with SDE and ArcGIS Server. We've tried a lot of what you've tried, but nothing fixes it for more than a day or two, sometimes a little longer. We end up having to either restart the troublesome service or the entire AGS if it's more than a few services misbehaving, which is often the case. Because we have clients that we host services for, it's very important to stay on top of this. Unfortunately, it's also just about impossible. Our administrator is at his wit's end, as you can imagine. Every time I email him to say the service(s) is acting up again, I can practically hear him sigh from across the state.
Do you have dynamicLayers enabled, and do you have some group layers in your source map? I was wondering whether it is this bug: BUG-000122677: When using the Export Map operation, a blank file is.. ?
By 'dynamicLayers' I meant the 'Allow per request modification of layer order and symbology' option on the Capabilities page.
#Issue resolved
Hi Everyone,
Finally, we were able to convince Esri support that this is a bug. @Tanu Hoque, the bug entry you identified is the result of persistence from our team. It is recognised as a 10.7 bug, because that is the only way Esri's process works: it has been lodged against version 10.7 and a patch will be released soon. However, we have demonstrated that the bug exists in earlier versions as well (10.6, 10.5 and 10.4; some other customers have also mentioned the issue for 10.2). I guess customers will have to request a patch for other versions.
Esri confirmed that this bug affects not only rendering but also other operations, such as identify.
Issue replication:
The steps to replicate the issue are as follows:
1. Navigate to the service endpoint. You might want to use a testing/staging service, because this test will cause the SOC to go bad, requiring a restart of the service.
2. Click the 'Export Map' operation at the bottom. You should see the exported image.
3. Navigate back to the service's REST endpoint.
4. Click 'Dynamic Legend'.
5. Copy and paste the following JSON block into the parameter, replacing 1 and 2 with a sub-layer ID and a group layer ID. In other words, set the first "mapLayerId" to the sub-layer ID and the second to its parent layer ID:
   [{"source":{"type":"mapLayer","mapLayerId":1},"id":1},{"source":{"type":"mapLayer","mapLayerId":2},"id":2}]
6. Click 'Get Dynamic Legends'.
7. Once the legend is generated, navigate back to the service's REST endpoint.
8. Click the 'Export Map' operation at the bottom. This should generate a blank image.
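If you need to build the JSON block from step 5 for your own layer IDs, a small sketch like the following may help. The function name is made up for illustration; it simply reproduces the structure of the block above for whatever IDs you pass, in the order you pass them. As with the manual steps, only use the result against a testing/staging service.

```python
import json


def dynamic_layers_param(layer_ids):
    """Return the JSON block from step 5 for the given layer IDs, in order."""
    layers = [
        {"source": {"type": "mapLayer", "mapLayerId": layer_id}, "id": layer_id}
        for layer_id in layer_ids
    ]
    # Compact separators give the same single-line form as the block above.
    return json.dumps(layers, separators=(",", ":"))


# Sub-layer ID first, then its parent group layer ID (IDs here are examples).
print(dynamic_layers_param([1, 2]))
```

The printed string can be pasted into the 'Dynamic Legend' parameter box in place of the hand-edited block.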
Workaround:
Until a patch is released, the workaround is as follows:
- Use ArcGIS Pro 2.4 and connect to AGS (this version has the ability to connect directly to the ArcGIS Server).
- Import your MXD as a Project and perform sanity checks (such as the symbology has not changed, definition queries, scaling, etc.).
- Publish the service to AGS using ArcGIS Pro 2.4.
- Once published, try to follow the steps listed under "Issue replication". You will not encounter the issue.
Hope this will relieve the grief for most of the friends facing this issue.
Regards.
Hi again,
After months of working with ESRI Canada & ESRI Inc on this issue, we had the same diagnosis confirmed for us as well. Though we haven't had an opportunity to test earlier versions, ESRI has done some thorough testing and we have confirmation it affects all versions 10.4.x through 10.7.1, but only if the service was published through ArcMap or ArcPy. ArcGIS Pro-published services seem to be unaffected, but that is not part of my organization's workflow at this time.
Interestingly, my team had independently come to the same root cause as well, with dynamicLegend requests being made out-of-order causing the SOC to become corrupt.
Mukesh Vyas wrote:
5. Copy and paste following JSON block in parameter:
1. [{"source":{"type":"mapLayer","mapLayerId":1},"id":1},{"source":{"type":"mapLayer","mapLayerId":2},"id":2}] //replace 1 and 2 with a sub layer and a group layer id. In other words replace the "mapLayerId" with the sub-layer id first and then the parent layer id.
We found, however, that making the calls in the proper order (e.g. 1 then 2) did not cause the service to fail in 10.4.1. It was only when the group and leaf layers were referenced out of order that the SOC became corrupt.
E.g., if mapLayerId=0 is the root group layer, and it has leaf/child layers with mapLayerId(s)=[1, 2, 3, and 4], we found that referencing [0,1,2,3,4] OR [0,2,4,3,1] OR even [4,3,2,1] in the JSON block was not an issue. But if any leaf layers were referenced before the groups that contain them, regardless of nesting (e.g. [1,0,2,3,4]), and the MXD was published through ArcMap or ArcPy, the service would fail 100% of the time.
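To make the ordering rule concrete, here is a rough sketch of a check for whether a given layer order would hit the failing case. It only encodes the behaviour we observed and is not anything from ESRI. The function name and the `parent_of` mapping (layer ID to parent layer ID, `None` for top-level layers) are assumptions for illustration; you would have to fill the mapping in from your own service's layer hierarchy.

```python
def references_leaf_before_group(order, parent_of):
    """Return True if any layer in `order` appears before one of its ancestors."""
    position = {layer_id: index for index, layer_id in enumerate(order)}
    for layer_id, index in position.items():
        ancestor = parent_of.get(layer_id)
        # Walk up through every containing group, however deeply nested.
        while ancestor is not None:
            if ancestor in position and position[ancestor] > index:
                return True
            ancestor = parent_of.get(ancestor)
    return False


# Layer 0 is the root group; layers 1 to 4 are its leaf layers.
parent_of = {0: None, 1: 0, 2: 0, 3: 0, 4: 0}

for order in ([0, 1, 2, 3, 4], [0, 2, 4, 3, 1], [4, 3, 2, 1], [1, 0, 2, 3, 4]):
    print(order, references_leaf_before_group(order, parent_of))
```

Only the last order, where leaf layer 1 comes before its group 0, is flagged, which matches what we saw on 10.4.1.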
We've had confirmation that BUG-000122677 has been fixed in version 10.8 as well, but unfortunately that version hasn't been released at this time.
It's good to hear @mukeshrvyas that there may be an actual patch in the works. Our organization has been issued a hotfix specific to version 10.4.1 (S-1041-HF-000004950), but personally we haven't been able to get ESRI Support to confirm there are any internal plans to port this hotfix or release an official patch for this DoS vulnerability for versions other than 10.4.1 & 10.8.
You can try contacting ESRI Support & requesting this specific hotfix (S-1041-HF-000004950) if you're using 10.4.1.
I'm having this exact issue as itemized in your "steps to replicate" post. Is the only workaround to use your arcgis pro suggestion or is there any other way?
Apologies for the late response, but if I am correct, this should have been resolved in newer versions of AGS. Otherwise, this is the only workaround that has been found.
We see similar behavior on our system, but in our case we found that invalid tokens attached to the Portal web map item for the REST service were causing the GP Print Service to fail when using secured services. Public services print fine. This was also affecting the display of secured services with other export functionality.
Service information, legend, everything shows up. Health checks fine. No error messages specifically leading to failure diagnosis. But still blank output on the screen randomly and in utilities like the Print service.
Not sure why the invalid tokens are being generated during map publication. We are using MXDs and not Pro.
The fix that Tech Support worked out with us was to create a new Portal web map item and add the service to it as a web item. During that process we manually entered new credentials for authentication and stored the new token with the item. Update the web apps to use the new web maps and print service, and everything else displays fine.
This is a random condition in our ecosystem; not every publication fails. We have verified it with services published from MXDs from 10.4.1 to 10.7.1, most recently this week with a service published from a brand new install of 10.7.1 Desktop that stored an invalid token in Portal.