I have a problem that occurs when trying to publish a hosted feature layer (copy data) to Portal from ArcGIS Pro.
The service definition file uploads to Portal just fine, and for a short time a hosted feature layer with the same name is created in Portal, but then Pro reports "Failed to publish web layer" and the hosted feature layer is removed from Portal.
The Pro log reads:
|...more successful operations above here, cut out...|
|2021-01-28 21:24||Status: InProgress||StatusMessage: Compressing package into SD file|
|2021-01-28 21:24||Status: InProgress||StatusMessage: Staging successful|
|2021-01-28 21:24||Status: InProgress||StatusMessage: Uploading service definition|
|2021-01-28 21:24||Status: InProgress||StatusMessage: Publishing tool initialized|
|2021-01-28 21:24||Status: InProgress||StatusMessage: Publishing web layer (AGO)|
|2021-01-28 21:24||Status: InProgress||StatusMessage: Failed. Failed to execute (Publish Portal Service). Failed.|
|2021-01-28 21:24||Status: InProgress||StatusMessage: Publishing web layer failed (AGO)|
|2021-01-28 21:24||Status: InProgress||StatusMessage: Publishing tool execution failed|
|2021-01-28 21:24||Status: Failed||ErrorMessage: Failed to publish web layer|
The log from Server Manager reads:
|SEVERE||28 jan. 2021 21:24:42||Error executing tool. Publish Portal Service Job ID: j7f2ec07d2c654eeb81499fbc6ce749f3 : Failed. Failed to execute (Publish Portal Service).||System/PublishingTools.GPServer|
|SEVERE||28 jan. 2021 21:24:42||Delegate job failed.||System/PublishingTools.GPServer|
|SEVERE||28 jan. 2021 21:24:42||The containing process for 'System/PublishingToolsEx' job 'j51c7db29ac9d4661bc425aa5d5d3b925' has crashed.||Server|
|SEVERE||28 jan. 2021 21:24:36||Instance of the service 'System/PublishingToolsEx.GPServer' crashed. Please see if an error report was generated in 'C:\arcgisserver\logs\<domain>\errorreports'. To send an error report to Esri, compose an e-mail to ArcGISErrorReport@esri.com and attach the error report file.||Server|
Local server logs:
"Source: .NET Runtime"
Application: ArcSOC.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: exception code c0000005, exception address 00007FFDA8088947
"Source: Application Error"
Faulting application name: ArcSOC.exe, version: 126.96.36.19934, time stamp: 0x5ee81bca
Faulting module name: sdepgsrvr.dll, version: 188.8.131.5234, time stamp: 0x5ee80ef4
Exception code: 0xc0000005
Fault offset: 0x0000000000068947
Faulting process id: 0x858
Faulting application start time: 0x01d6f58f52f28ee3
Faulting application path: C:\Program Files\ArcGIS\Server\framework\runtime\ArcGIS\bin\ArcSOC.exe
Faulting module path: C:\Program Files\ArcGIS\Server\framework\runtime\ArcGIS\bin\sdepgsrvr.dll
Report Id: d5af886f-61a6-11eb-8161-0050568c55d9
Faulting package full name:
Faulting package-relative application ID:
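For anyone reading these two event-log entries, a small cross-check may help. Exception code 0xc0000005 is the Windows status code for an access violation, and subtracting the fault offset from the exception address should give the address where the faulting module was loaded. The sketch below only does that arithmetic on the values quoted above; the lookup table is a tiny, non-exhaustive illustration and the names are made up for this example.

```python
# Values copied from the two event-log entries above.
EXCEPTION_ADDRESS = 0x00007FFDA8088947  # ".NET Runtime" entry
FAULT_OFFSET = 0x0000000000068947       # "Application Error" entry
EXCEPTION_CODE = 0xC0000005             # reported in both entries

# A few well-known Windows status codes (illustrative, not exhaustive).
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def module_base(exception_address: int, fault_offset: int) -> int:
    """Return the load address implied by an absolute address and an offset."""
    return exception_address - fault_offset


base = module_base(EXCEPTION_ADDRESS, FAULT_OFFSET)

print("Exception:", KNOWN_CODES.get(EXCEPTION_CODE, "unknown"))
print("Implied module base:", hex(base))
# Windows loads modules on 64 KB boundaries, so a base that is a multiple
# of 0x10000 suggests the address and the offset describe the same crash.
print("64 KB aligned:", base % 0x10000 == 0)
```

If the implied base is cleanly aligned, both entries most likely describe the same access violation inside sdepgsrvr.dll, which is the module named in the second entry.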
The setup is a two-machine Enterprise site: one server runs ArcGIS Server and Data Store, the other runs Portal. Both run Windows Server 2012 R2.
After reviewing the logs, it seems this error (the ArcSOC crash, that is) has occurred since December 5, 2020, but not at this scale at all, and since January 18, 2021 we have not been able to publish like this through Pro at all.
What we've done:
... and a bunch of other stuff.
I've probably left out lots of important stuff but my brain is aching, and also, sorry for the formatting of some stuff.
Does anyone have ANY idea of what can cause this behaviour? ESRI Sweden "gave up" and suggested we order a new machine with Windows Server 2019 on it and reinstall. But I would like to solve this :)
ANY help or suggestions at all is very welcome at this stage! Thanks!
I thought I should give an update on this, since we've done and discovered some new things that might be of interest to others who find themselves in similar situations. Especially since these things seem to have "cured" the site! :)
First of all, we had no clear way forward; we thought we had done "all" that we could. Therefore we installed Portal, Server & Data Store on two new virtual machines so we could get a completely fresh start. When that was done, the idea was to use a WebGIS DR backup from the old, problematic servers to populate the new site and just go from there. But at this stage we discovered that the scheduled WebGIS DR backups that had been running weekly on the old site were not complete: the backup files did not in fact contain the data from Data Store (0 KB) :grimacing_face:
This caused some panic, as we were now on an unstable system without complete backups. The last full backup turned out to be from November 2020. (Take this as a general word of caution: you don't get an error message if the backups are incomplete, so you'll have to check this yourselves, somehow.) However, this pointed us in the direction that something was wrong within the Data Store. We had some thoughts on this earlier in the troubleshooting, but according to ESRI the Data Store and the PostgreSQL database connected to it "should not be touched", and it's really hard to find any info on it at all.
Since the first priority now was to get a working backup of the Data Store, we and a technician from ESRI focused our efforts on that. We tried both the ArcGIS-tier backupdatastore and the PostgreSQL-tier pg_dump command, but both failed the same way with this error message:
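One way to do the "check this yourselves" part could be a small script that walks a backup folder and flags empty files. This is only a sketch: it assumes the backup contents are available as ordinary files in a folder (for example a staging location or an extracted copy), and the function name and the example path in the comment are hypothetical, not anything WebGIS DR provides.

```python
from pathlib import Path


def find_undersized_files(backup_root, min_bytes=1):
    """Return (path, size) for every file under backup_root smaller than min_bytes.

    With the default min_bytes=1 only zero-byte files are reported.
    """
    hits = []
    for path in sorted(Path(backup_root).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size < min_bytes:
                hits.append((path, size))
    return hits


# Example call (hypothetical folder):
# for path, size in find_undersized_files(r"D:\webgisdr_staging"):
#     print(f"SUSPICIOUS: {path} is only {size} bytes")
```

Run after each scheduled backup, something like this would at least raise a flag when a component such as the Data Store part comes out empty. Raising min_bytes catches implausibly small files as well, at the cost of some false alarms.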
pg_dump: error: query returned 0 rows instead of one: SELECT typlen, typinput, typoutput, typreceive, typsend, typmodin, typmodout, typanalyze, typreceive::pg_catalog.oid AS typreceiveoid, typsend::pg_catalog.oid AS typsendoid, typmodin::pg_catalog.oid AS typmodinoid, typmodout::pg_catalog.oid AS typmodoutoid, typanalyze::pg_catalog.oid AS typanalyzeoid, typcategory, typispreferred, typdelim, typbyval, typalign, typstorage, (typcollation <> 0) AS typcollatable, pg_catalog.pg_get_expr(typdefaultbin, 0) AS typdefaultbin, typdefault FROM pg_catalog.pg_type WHERE oid = '1889862'::pg_catalog.oid
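The error says pg_dump looked up a type OID in pg_type and found no row, which suggests something in the catalog still refers to a type that no longer exists. As a rough sketch of how one might narrow that down, the snippet below pulls the OID out of the message and builds a query against pg_attribute listing columns that use that type. This is an assumption about where the dangling reference lives; nothing in the thread says this query was actually run, and the helper names are invented for the example.

```python
import re

# Tail of the pg_dump error message quoted above.
PG_DUMP_ERROR = (
    "pg_dump: error: query returned 0 rows instead of one: SELECT typlen, ... "
    "FROM pg_catalog.pg_type WHERE oid = '1889862'::pg_catalog.oid"
)


def missing_type_oid(message):
    """Extract the type OID that pg_dump failed to find, or None if absent."""
    match = re.search(r"pg_type WHERE oid = '(\d+)'", message)
    return int(match.group(1)) if match else None


def columns_using_type_sql(oid):
    """Build a read-only query listing table columns declared with this type OID."""
    return (
        "SELECT attrelid::regclass AS table_name, attname AS column_name "
        "FROM pg_catalog.pg_attribute "
        f"WHERE atttypid = {int(oid)} AND NOT attisdropped;"
    )


oid = missing_type_oid(PG_DUMP_ERROR)
print(oid)
print(columns_using_type_sql(oid))
```

The generated statement only reads the catalog, so it should be safe to run on a copy of the database, which is also where we did our own troubleshooting.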
We took a file copy of the failing Data Store to the new system and connected it to the site, to troubleshoot it there instead of live, and after some initial tests we tried the vacuum command. And... this actually seems to have done the trick! Which is also weird in its own way, since according to the logs vacuum is, and has been, performed on the PostgreSQL database on a daily basis already. But doing it manually really seems to have healed the Data Store. We tried to do a backup of the Data Store and that now worked, so we decided to perform the same steps in the production environment late last night (after snapshots & file backups were taken). And so far everything seems to be working great! Backups are taken through WebGIS DR and things are running fine :) A bonus to all of this is that the original problem with errors when publishing hosted feature layers now seems to have vanished as well.
We still don't know what caused the issue in the first place, though, which is a bit of a downer. But one hypothesis is that sometime in November the PostgreSQL database, for whatever reason, crashed and stopped right in the middle of something, which then never really got done and caused some error in the DB. This might also explain the erratic behaviour, where it was sometimes possible to publish hosted layers but most of the time it gave errors: when it worked, the faulty rows/parts of the DB were presumably not involved. But this is just a guess, of course. I don't know very much about PostgreSQL databases after all, and one should not need to know much either, since ESRI claims that you should never have to touch them. But in the end that was what actually seems to have solved our issues.
We are letting the users test all the operations they'd expect to be working today and report if they find any suspect behaviour, but so far (~4 hours into the day) everything seems to be working as it should once again! 8)
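For reference, a manual vacuum can be scripted around PostgreSQL's vacuumdb command-line tool. The sketch below only illustrates the idea and is not necessarily how it was done on this site: the host, port, user and database names are placeholders that have to be replaced with the values for your own Data Store, and since ESRI says the database should not be touched, take snapshots and file backups first and try it on a copy.

```python
import subprocess


def build_vacuum_command(host, port, user, dbname, analyze=True):
    """Build a vacuumdb command line for one database (does not run it)."""
    cmd = [
        "vacuumdb",
        "--host", host,
        "--port", str(port),
        "--username", user,
        "--dbname", dbname,
        "--verbose",
    ]
    if analyze:
        cmd.append("--analyze")  # also refresh planner statistics
    return cmd


def run_vacuum(cmd):
    """Run the command; raises CalledProcessError if vacuumdb fails."""
    return subprocess.run(cmd, check=True)


# Placeholder connection values, for illustration only.
example = build_vacuum_command("localhost", 5432, "some_admin_user", "some_db")
print(" ".join(example))
```

Keeping the command construction separate from the execution makes it easy to print and review the exact command before letting it loose on production.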
Thanks to @HenryLindemann for general ideas and thoughts in the thread.
On what version of Windows is ArcGIS Pro running, and what version of Pro is it?
In your testing, did you try to publish from both a SQL-connected service and an FGDB?
Have you tried updating the .NET and C++ redistributables?
What ArcObjects is specified in the PubEx service?
Thanks for answering @HenryLindemann!
Version 2.7.0 of Pro right now, have also tried 2.6.x before with no success. Running on Windows 10 Enterprise 20H2 version 19042.746. (Have also tried to publish from a Pro 2.7 installation on the server Windows Server 2012 r2).
All the failed tests have been from a file geodatabase feature class (that's what's not working, but I probably forgot to mention that in the main post).
Publishing to Portal from a SQL-connected database generates the same errors.
Uploading an FGDB to Portal and publishing services from there works.
Have not updated .net and c++ redistributables. I've looked suspiciously at .net but have not fiddled with it, yet. Will do now then :) Will also look at the c++, thanks! Any suggestions for versions to look at when updating?
Could this have to do with anything?
I would just install the latest ones from "The latest supported Visual C++ downloads" (microsoft.com).
I think there was a .NET update for Pro at 2.6.
I see there are two versions of sdepgsrvr.dll. Can you have a look and see if yours are the same as what I have below?
I have included the two DLLs in the download; it might just be a corrupt DLL.
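A version number alone will not reveal a corrupt file, so one option is to compare checksums of the installed DLL and a known-good copy. The sketch below uses SHA-256 from Python's standard library; it assumes both copies come from the same build and patch level (otherwise the hashes will differ for legitimate reasons), and the function names are just for this example.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_content(path_a, path_b):
    """True if both files have identical content according to SHA-256."""
    return sha256_of(path_a) == sha256_of(path_b)


# Example call (second path is hypothetical):
# same_file_content(
#     r"C:\Program Files\ArcGIS\Server\framework\runtime\ArcGIS\bin\sdepgsrvr.dll",
#     r"C:\temp\known_good\sdepgsrvr.dll",
# )
```

If the digests match, the installed file is byte-for-byte the same as the reference copy and replacing it is unlikely to change anything.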
Is it possible to just replace the .dll-files? There's nothing "special" written in them as far as you know? The versions were the same as yours btw, but might still be corrupt as you say.
Will give this a shot, along with the .net and c++.
If it is the same version then I don't see a problem; it is just a piece of wrapped code after all. Just turn your server off when you replace it, as there might be some service accessing the DLL.
Ok let me know if any of the suggestions solves it.
Thought I might give an update on the issue.
Installing .NET and C++ and replacing the .dll files unfortunately did not solve the publishing error, so we went back to the snapshots taken before tinkering with this.
The next step is to try an "in-place upgrade" of the OS to Windows Server 2019 on the two machines. We'll do this by cloning the machines and test-running the OS upgrade on the clones. This "in-place upgrade" has been suggested by ESRI, and they say they've done it successfully many times. Our IT provider is not that confident, to say the least, but let's hope that it works out :)
If that doesn't work the next step will probably be to set up two new machines running Windows Server 2019 and then migrate Enterprise from the old ones. Something like this: https://www.esri.com/arcgis-blog/products/arcgis-enterprise/administration/migrate-to-a-new-machine-...
Last step would be to reinstall Enterprise on two new machines and then try to migrate data etc.
But we'll see. I'll post the final "solution" here, once we're there...
Have now performed an in-place upgrade of the OS to Server 2019, which went fine in itself but didn't solve the actual problem. We'll keep running on 2019 since it at least didn't seem to break anything else :)
Other things we've done:
One thing we'll check today or tomorrow is that, when upgrading to 10.8.1, the ESRI technician added a web context URL for some reason. I can't remember why now, so we'll have to check that again. And we'll also check the reverse proxy setups we have, again.
But all input is more than welcome so if you have something that you think might be worth checking, please leave a comment! :)
On your question: I think yes, the PublishingToolsEx tool spins up when publishing a GDB. It is not tied to the ArcGIS for Desktop app; I think PublishingToolsEx is more of a legacy component.
Have you tried to spin up a new VM with an ArcGIS Server that is federated, and publish to that server? If this works, then the corrupted part is definitely with ArcGIS Server; if it fails, then it might be Portal.
So usually when I uninstall and re-install and the error persists the corruption is either in program files or in the content.
My last-resort method is as follows: back up the ArcGIS Server dir and the program files dir, uninstall, and delete the ArcGIS Server program files in C:\Program Files\ArcGIS\Server. This breaks the link to your existing system, but we can recover that.
Rename the arcgisserver dir (the one with the config-store), e.g. to arcgisserver_old.
Reinstall ArcGIS Server to the same dir.
Now we want to rebuild the links to Portal and any data dirs.
Go into the backup and copy any folders that are missing in the new system. Only folders.
Go through the folders one by one.
The folders are:
C:\arcgis\arcgisserver\config-store\security "*special notes see below"
C:\arcgis\arcgisserver\config-store\services "*special notes wait till after first test"
Do not copy over System and Utilities from the backup; these are internal system folders.
C:\arcgis\arcgisserver\directories\arcgiscache "*special notes wait till after first test"
C:\arcgis\arcgisserver\directories\arcgisjobs "*special notes wait till after first test"
C:\arcgis\arcgisserver\directories\arcgisoutput "*special notes wait till after first test"
C:\arcgis\arcgisserver\directories\arcgissystem\arcgisinput "*special notes wait till after first test"
Drill down and copy folders in the above dirs.
Only copy security-config.json (this is the federation) and replace the existing file.
Do not copy over System and Utilities from the backup; these are internal system folders.
NB: Re-apply the service accounts to the ArcGIS Server after the copy process.
If everything went well, you will have an empty ArcGIS Server that is federated to your system with the existing federation. Now do your publishing test on the empty system.
I have seen on upgraded systems that misplaced files can land in the content store and cause funny errors; that is why we do the first test with no content. If this works, copy the content back and test again. If it then fails, that is an indication that something is misplaced in the content, and if it works, great.
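The "copy any folders that are missing" step can be checked with a short script before anything is copied. This is a rough sketch, not part of any Esri tooling: it lists the folders that exist in a backed-up directory but not in the freshly installed one, and skips System and Utilities as advised above. The function name and the example paths in the comment are assumptions for illustration.

```python
from pathlib import Path

# Internal folders that should not be copied back from the backup.
SKIP_FOLDERS = {"system", "utilities"}


def folders_missing_in_new(backup_dir, new_dir, skip=SKIP_FOLDERS):
    """Return names of folders present in backup_dir but absent from new_dir.

    Only folders are considered; names are compared case-insensitively
    because Windows paths are case-insensitive.
    """
    new_path = Path(new_dir)
    existing = set()
    if new_path.is_dir():
        existing = {p.name.lower() for p in new_path.iterdir() if p.is_dir()}
    missing = []
    for entry in Path(backup_dir).iterdir():
        name = entry.name.lower()
        if entry.is_dir() and name not in existing and name not in skip:
            missing.append(entry.name)
    return sorted(missing)


# Example call (paths assume the renamed backup described above):
# folders_missing_in_new(
#     r"C:\arcgis\arcgisserver_old\config-store\services",
#     r"C:\arcgis\arcgisserver\config-store\services",
# )
```

Since it only reports names, it can be run once per directory in the list, and the actual copying can wait until after the first publishing test on the empty system.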
Hope it Helps
Thank you for your answer, once again.
Actually, we've already tried to both use a new machine with a new site and we've also reinstalled ArcGIS Server pretty much to the point of your last resort option, I did it together with a guy from ESRI but we got the same error once more.
The next step for us will be to walk through the system/Windows/network settings on the servers and in the firewalls, reverse proxies, etc., step by step together with our IT provider, to see if something there has changed or is somehow causing the error. I have a list of things that need to be checked, mostly taken from the system requirements and from the installation documents for Enterprise.
Will let you know when we've found something!