I have a script that publishes and replaces vector tile layers in Portal. We have many vector tile layers that have to be updated on a regular basis. Every time I run this script, it works for most layers (~30 layers), but 2-3 random layers (different ones every time) fail at the replace stage with the error "Failed to replace service". If I go into Portal manually and try to replace the vector tile layer with the updated layer I just published to Portal, I get an "undefined" warning and the original vector tile layer can no longer be added to a map. The only workaround I have found is to delete the original vector tile pack and publish a new one with the same name. However, this requires me to update the script with the new item id and is not ideal. Any idea why vector tile layers randomly fail to be replaced and become undefined?
Sorry to only give you a "me too" on this one. You described exactly the same problem I have.
The "replace" operation is intermittent and works more than 3/4 of the time for me. "Overwrite" on map image layers fails probably about 50% of the time for me.
I got so tired of fixing maps manually that I am writing a script that fixes them. I am in the testing stage now. It goes through every map (44 right now) in our portal and edits them.
We are also experiencing this issue on several ArcGIS Enterprise sites with 10.6.1, 10.8.1, 10.9.0, 10.9.1 and 11.0. We update several vector tile services every hour using a script. There are periods (e.g. the third and fourth quarters of 2022) where approx. 1 out of 100 replaces fails. Since the beginning of this year, around 1 out of 30 replaces fails, although we didn't change the system.
This is the message on the client side:
Python Exception <Exception>: Replace service error: Failed to replace service 'Hosted/XXX.VectorTileServer' for 'Hosted/XXX_20230103_113507.VectorTileServer'.
This is the message on the server side:
<Msg time="2023-01-04T09:03:05,973" type="SEVERE" code="7287" source="Admin" process="5416" thread="1" methodName="" machine="YYY.LU" user="" elapsed="" requestID="172fb9a4-22dd-4c0a-ad68-9cf8c6bd16bf">Failed to rename service 'Hosted/XXX_20230103_113507.VectorTileServer'.</Msg>
<Msg time="2023-01-04T09:03:05,973" type="SEVERE" code="7399" source="Admin" process="5416" thread="1" methodName="" machine="YYY.LU" user="" elapsed="" requestID="172fb9a4-22dd-4c0a-ad68-9cf8c6bd16bf">Failed to replace service 'Hosted/XXX.VectorTileServer' for 'Hosted/XXX_20230103_113507.VectorTileServer'.</Msg>
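To spot these failures automatically, a script could scan the server log for the two codes shown above. This is only a minimal sketch: the helper names are made up for illustration, and it assumes each log entry sits on one line as a well-formed <Msg> element, as in the excerpt.

```python
import xml.etree.ElementTree as ET

# The two codes that appear in the log excerpt above when a replace fails.
REPLACE_FAILURE_CODES = {"7287", "7399"}


def parse_msg(line):
    """Return (code, type, text) for one <Msg ...>...</Msg> log line."""
    elem = ET.fromstring(line)
    return elem.get("code"), elem.get("type"), (elem.text or "").strip()


def find_replace_failures(lines):
    """Yield the message text of every log line carrying one of the codes above."""
    for line in lines:
        line = line.strip()
        if not line.startswith("<Msg"):
            continue  # assumption: anything else is not a log entry
        code, _level, text = parse_msg(line)
        if code in REPLACE_FAILURE_CODES:
            yield text
```

Calling `list(find_replace_failures(open(path, encoding="utf-8")))` on a log file (path is a placeholder) would return the failure messages, if any.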
When the exception is thrown, two important directories have been deleted and are no longer available to ArcGIS Server:
So essentially the cache (with its root.json) as well as its metadata from the config store cannot be requested when the client makes a request to the service. As the service is removed from the config store, it is also missing in the ArcGIS Server Admin interface making it impossible to replace the service with subsequent script calls.
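Since a broken service can no longer serve its root.json, a script could check for that right after each replace. The following is a rough sketch, not a tested fix: the function names are hypothetical, it assumes the style document is exposed at resources/styles/root.json under the service URL, and it assumes the service can be read without a token (a secured service would need one appended).

```python
import json
import urllib.error
import urllib.request


def root_style_url(service_url):
    """Build the assumed URL of the root style document of a vector tile service."""
    return service_url.rstrip("/") + "/resources/styles/root.json"


def service_is_healthy(service_url, opener=urllib.request.urlopen, timeout=30):
    """Return True if root.json can be fetched, parsed, and is not an error reply."""
    try:
        with opener(root_style_url(service_url), timeout=timeout) as response:
            document = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, ValueError, OSError):
        # Network failure, HTTP error, or a body that is not valid JSON.
        return False
    # Assumption: an error reply is a JSON object with an "error" key.
    return isinstance(document, dict) and "error" not in document
```

The `opener` parameter only exists so the check can be exercised without a live server.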
On the other hand, the other directories belonging to the replace operation still exist on the file system. I guess these are the original folders that have been renamed and still exist physically:
Besides this, the newly published services that should replace the original contents still exist at the moment the replace fails:
All these directories are located on a file cluster. We only have Windows machines and the language is set to Luxembourgish.
The only way to keep the service with its unique identifier is to recover the first two directories mentioned above, for example from a backup. You can also finish what ArcGIS Server has begun and rename the two XXX_20230103_113507 directories. But remember to also rename the XXX_20230103_113507.VectorTileServer.json file and update its content. In the JSON file, you also have to insert the correct Portal item id.
If you are not able to rescue the service manually and finish the replace operation, the service has to be republished and you will receive a new Portal item id, meaning that you also have to adapt all your web maps, scripts, etc.
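The manual rename could be scripted roughly like this. It is only a sketch under assumptions: both function names are hypothetical, the directory and file paths must be supplied by you, and it does not touch the Portal item id, which still has to be corrected by hand in the JSON file. Working on a copy first seems prudent.

```python
import os


def finish_rename(staged_dir, target_dir):
    """Rename a leftover staged directory to the name the service expects."""
    if os.path.exists(target_dir):
        raise FileExistsError(f"{target_dir} already exists; not overwriting it")
    os.rename(staged_dir, target_dir)


def rename_service_json(json_path, staged_name, target_name):
    """Rename the service JSON file and replace the staged name inside it."""
    with open(json_path, encoding="utf-8") as handle:
        content = handle.read()
    new_path = os.path.join(
        os.path.dirname(json_path),
        os.path.basename(json_path).replace(staged_name, target_name),
    )
    # Plain text replacement; the Portal item id is NOT handled here.
    with open(new_path, "w", encoding="utf-8") as handle:
        handle.write(content.replace(staged_name, target_name))
    if new_path != json_path:
        os.remove(json_path)
    return new_path
```

With the names from this post, `staged_name` would be "XXX_20230103_113507" and `target_name` would be "XXX".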
I guess that there are locks keeping ArcGIS Server from renaming the directories. Is this a known issue to Esri?
I recently came across this same issue when updating a vector tile layer (ArcGIS Enterprise 10.9.1). Thankfully I was able to fix the service by renaming the related files and folders in the config-store and arcgiscache folders. I'm used to having to try a few times before the update is successful, but this is the first time that the layer broke during the update process and I had to go into the ArcGIS Server directories to fix it manually.
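The "try a few times" approach can be sketched as a small retry helper. This is an illustration, not a fix: the `retry` function is hypothetical, and as described earlier in this thread, a failed replace can leave the service removed from the config store, in which case further attempts cannot succeed.

```python
import time


def retry(operation, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # wait before the next attempt
    raise last_error
```

Usage would look something like `retry(lambda: gis.content.replace_service(replace_item=item_id, new_item=new_item))`, where `gis`, `item_id` and `new_item` come from your own script.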
I'm also experiencing exactly the same issue. My success rate is very low, though (about 10%).
I tried two options:
I don't know why, but it seems I can't stop the service before replacing it. Maybe because it's a vector tile service?
Any help on this will be greatly appreciated.
My Best,
Meghan Kulkarni
All approaches run into the same problem sooner or later.
We first used ReplaceWebLayer_server from arcpy:
print("%s Publishing Vector Tile Package as Hosted Tile Layer..." % get_now())
out_results, package_item_id, publish_results = arcpy.management.SharePackage(cache_file, self.portal_user, self.portal_password, summary, tags, credits, "MYGROUPS", None, "MYORGANIZATION", "TRUE", self.portal_folder)
print("package_item_id: " + package_item_id)
replacement_web_layer_serviceItemId = json.loads(package_item_id)["publishResult"]["serviceItemId"]
replacement_web_layer_serviceurl = json.loads(package_item_id)["publishResult"]["serviceurl"]
print("%s Replacing Prod Hosted Tile Layer with published Hosted Tile Layer..." % get_now())
archive_layer_name = tile_layer_title + "_" + now + "_archive"
updated_target_layer = arcpy.ReplaceWebLayer_server(target_layer=tile_layer_id, archive_layer_name=archive_layer_name, update_layer=replacement_web_layer_serviceItemId, replace_item_info="REPLACE", create_new_item=False)
print("%s Deleting replacement Vector Tile Package..." % get_now())
gis.content.get(publish_results).delete()
print("%s Deleting replacement Hosted Tile Layer..." % get_now())
gis.content.get(replacement_web_layer_serviceItemId).delete()
We then switched entirely to replace_service from the ArcGIS API for Python, as the code seemed more straightforward:
print("%s Uploading Vector Tile Package..." % get_now())
vtpk_package_item = gis.content.add(
    item_properties={
        "type": "Vector Tile Package",
        "tags": tags,
        "description": summary,
        "licenseInfo": credits,
    },
    data=cache_file,
    folder=self.portal_folder,
    owner=self.portal_user,
)
print("%s Publishing Vector Tile Package as Hosted Tile Layer..." % get_now())
vtpk_layer_item = vtpk_package_item.publish()
print("%s Replacing Prod Hosted Tile Layer with published Hosted Tile Layer..." % get_now())
gis.content.replace_service(replace_item=tile_layer_id, new_item=vtpk_layer_item, replace_metadata=False)
print("%s Deleting replacement Vector Tile Package..." % get_now())
vtpk_package_item.delete()
print("%s Deleting replacement Hosted Tile Layer..." % get_now())
vtpk_layer_item.delete()
You said you tried replaceService from the ArcGIS REST API.
To my knowledge, there is no other option left. In my opinion, it's a server-side bug.
I've got no solution to offer, but just confirming the same issue/bug with Portal 10.9.1.
It's very frustrating, and I've spent hours trying to identify why at least one of my 25 vector tile services fails to update each time I run my update script.
@StefanUseldinger your detailed analysis and last line in a post above "I guess that there are locks keeping ArcGIS Server from renaming the directories. Is this a known issue to Esri?" just turned on a light bulb for me.
If a file/folder lock is the intermittent issue that is causing the replace_service function to fall over, then perhaps adding a delay before and/or after calling the function can give the server enough time to release the lock?
So I tried adding a time.sleep(5) before and after the replace_service call, e.g.
time.sleep(5)
replace = gis.content.replace_service(replace_item=item_id, new_item=archive_id)
time.sleep(5)
I also added these wait states everywhere, without luck.
Where is your Server config store? Is it stored on a VM, a network file share, ...? Do you have multiple Server machines?
AWS server infrastructure, 2x ArcGIS Servers federated.