GeoprocessingTools.Ex / ArcSOC.exe crashes when uploading a hosted feature layer from ArcGIS Pro to Enterprise 10.8.1

01-28-2021 01:29 PM
MarcusAndersson
New Contributor II

Hi guys,

I have a problem that occurs when I try to publish a hosted feature layer (copy data) to Portal from ArcGIS Pro.

The service definition file uploads to Portal just fine, and for a short time a hosted feature layer with the same name is created in Portal, but then Pro reports "Failed to publish web layer" and the hosted feature layer is removed from Portal.

MarcusAndersson_0-1611865558896.png

The Pro-log reads:

  ...more successful operations above here, cut out...
2021-01-28 21:24 Status: InProgress StatusMessage: Compressing package into SD file
2021-01-28 21:24 Status: InProgress StatusMessage: Staging successful
2021-01-28 21:24 Status: InProgress StatusMessage: Uploading service definition
2021-01-28 21:24 Status: InProgress StatusMessage: Publishing tool initialized
2021-01-28 21:24 Status: InProgress StatusMessage: Publishing web layer (AGO)
2021-01-28 21:24 Status: InProgress StatusMessage: Failed. Failed to execute (Publish Portal Service). Failed.
2021-01-28 21:24 Status: InProgress StatusMessage: Publishing web layer failed (AGO)
2021-01-28 21:24 Status: InProgress StatusMessage: Server Response: {"hasVersionedData":false,"supportsDisconnectedEditing":false,"supportedQueryFormats":"JSON","currentVersion":10.81,"serviceDescription":"","maxRecordCount":2000,"capabilities":"Query","description":"","copyrightText":"","spatialReference":{"wkid":3006,"latestWkid":3006},"fullExtent":{"xmin":438713.39250000007,"ymin":6393012.90599999949,"xmax":527327.68699999992,"ymax":6533343.0839000009,"spatialReference":{"wkid":3006,"latestWkid":3006}},"initialExtent":{"xmin":364851.86917459668,"ymin":6354372.8373888936,"xmax":596411.18957539252,"ymax":6583317.30975505,"spatialReference":{"wkid":3006,"latestWkid":3006}},"units":"esriMeters","allowGeometryUpdates":true,"enableZDefaults":true,"zDefault":0,"syncEnabled":false,"supportsApplyEditsWithGlobalIds":false,"maxViewsCount":20,"allowUpdateWithoutMValues":true,"editorTrackingInfo":{"enableEditorTracking":false,"enableOwnershipAccessControl":false,"allowOthersToUpdate":true,"allowOthersToDelete":false,"allowOthersToQuery":true},"supportsReturnDeleteResults":true,"isLocationTrackingService":false,"hasSyncEnabledViews":false,"hasViews":false,"supportsAppend":true,"supportedAppendFormats":"shapefile,featureCollection","layers":[{"id":0,"name":"test_m2"}],"tables":[],"serviceItemId":"c283eb374db942a49595a6341d997741"}
2021-01-28 21:24 Status: InProgress StatusMessage: Publishing tool execution failed
2021-01-28 21:24 Status: Failed ErrorMessage: Failed to publish web layer

 

The log from Server Manager reads:

SEVERE | 28 Jan 2021 21:24:42 | Error executing tool. Publish Portal Service Job ID: j7f2ec07d2c654eeb81499fbc6ce749f3 : Failed. Failed to execute (Publish Portal Service). | System/PublishingTools.GPServer
SEVERE | 28 Jan 2021 21:24:42 | Delegate job failed. | System/PublishingTools.GPServer
SEVERE | 28 Jan 2021 21:24:42 | The containing process for 'System/PublishingToolsEx' job 'j51c7db29ac9d4661bc425aa5d5d3b925' has crashed. | Server
SEVERE | 28 Jan 2021 21:24:36 | Instance of the service 'System/PublishingToolsEx.GPServer' crashed. Please see if an error report was generated in 'C:\arcgisserver\logs\<domain>\errorreports'. To send an error report to Esri, compose an e-mail to ArcGISErrorReport@esri.com and attach the error report file. | Server

 

Local server logs:
"Source: .NET Runtime"

 

Application: ArcSOC.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: exception code c0000005, exception address 00007FFDA8088947

 

"Source: Application Error"

 

Faulting application name: ArcSOC.exe, version: 12.6.0.24234, time stamp: 0x5ee81bca
Faulting module name: sdepgsrvr.dll, version: 12.6.0.24234, time stamp: 0x5ee80ef4
Exception code: 0xc0000005
Fault offset: 0x0000000000068947
Faulting process id: 0x858
Faulting application start time: 0x01d6f58f52f28ee3
Faulting application path: C:\Program Files\ArcGIS\Server\framework\runtime\ArcGIS\bin\ArcSOC.exe
Faulting module path: C:\Program Files\ArcGIS\Server\framework\runtime\ArcGIS\bin\sdepgsrvr.dll
Report Id: d5af886f-61a6-11eb-8161-0050568c55d9
Faulting package full name: 
Faulting package-relative application ID: 
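
The ".NET Runtime" and "Application Error" entries above appear to describe the same crash, and that can be checked with a little arithmetic: the exception address minus the fault offset should give the address at which the faulting module was loaded. A minimal sketch in Python, using only the two values quoted above (the variable names are illustrative):

```python
# Values copied from the two event log entries quoted above.
exception_address = 0x00007FFDA8088947  # "Exception Info" in the .NET Runtime entry
fault_offset = 0x0000000000068947       # "Fault offset" in the Application Error entry

# If both entries describe the same crash, the difference is the base
# address at which the faulting module (sdepgsrvr.dll) was loaded.
module_base = exception_address - fault_offset

print(hex(module_base))
# Windows reserves address space in 64 KB units, so a plausible module
# base should be a multiple of 0x10000.
print(module_base % 0x10000 == 0)
```

The difference lands on a 64 KB boundary, which is consistent with both entries pointing at the same access violation (0xc0000005) inside sdepgsrvr.dll rather than at two unrelated faults.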

 


The setup is a two-machine Enterprise site: one server runs ArcGIS Server and Data Store, and the other runs Portal. Both run Windows Server 2012 R2.

What works:

  • Publishing a hosted feature service from ArcMap 10.4 works just fine.
  • Publishing a Map Image layer works just fine.
  • Uploading a zipped shapefile (*.zip) through the Portal web interface and publishing it as a hosted service works; an error is shown, but the service itself works fine.
  • Publishing a hosted feature service through the Portal web interface (from the *.sd file previously uploaded from Pro!) works, though the same error is shown in the browser.
    3.PNG

After reviewing the logs, it seems this error (the ArcSOC crash, that is) has occurred since December 5, 2020, but nowhere near this often. Since January 18, 2021 we have not been able to publish this way from Pro at all.

What we've done:

  • Feels like "everything" by now... But I'll recap what I can remember right now (late at night in Sweden)
  • Restarted everything (processes, servers etc) a couple of times
  • Removed/reinstalled Windows Updates from the server machines.
  • Installed the IIS stability patch
  • Removed and reinstalled the "Printing patch"
  • Reviewed and opened ports if needed according to installation documents
  • Checked traffic using Fiddler with no significant results
  • Checked traffic on the reverse proxies with no significant results
  • Physically went to the office to check whether being on that network would make a difference. (We use DirectAccess from home)
  • Checked that we have enough disk space everywhere
  • Recreated the PublishingTools and PublishingToolsEx services

... and a bunch of other stuff.

I've probably left out lots of important stuff but my brain is aching. Also, sorry for the formatting of some of this.

Does anyone have ANY idea what could cause this behaviour? ESRI Sweden "gave up" and suggested we order a new machine with Windows Server 2019 and reinstall. But I would like to solve this :)
ANY help or suggestions at all is very welcome at this stage! Thanks!

/ Marcus


Accepted Solutions
MarcusAndersson
New Contributor II

Update!

I thought I should give an update on this, since we've done and discovered some new things that might be of interest to others who find themselves in a similar situation. Especially since these things seem to have "cured" the site! :)

First of all, we had no real way forward; we thought we had done "all" that we could. Therefore we installed Portal, Server & Data Store on two new virtual machines to get a completely fresh start. Once that was done, the idea was to use a WebGIS DR backup from the old, problematic servers to populate the new site and go from there. But at this stage we discovered that the scheduled WebGIS DR backups that had been running weekly on the old site were not complete: the backup files did not in fact contain the data from Data Store (0 KB).
This caused some panic, as we were now on an unstable system without complete backups. The last full backup turned out to be from November 2020. (Take this as a general word of caution: you don't get an error message if the backups are incomplete, so you'll have to check this yourselves, somehow.) However, this pointed us in the direction that something was wrong within the Data Store. We had had some thoughts on this earlier in the troubleshooting, but according to ESRI the Data Store and the PostgreSQL database connected to it "should not be touched", and it's really hard to find any info on it at all.
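
One way to "check this yourselves" is to scan the backup output for empty files after each scheduled run. The following is only a minimal sketch: it assumes the backup contents are available as ordinary files under one directory (the thread does not describe the layout of the backup files), and the function name and the `min_bytes` threshold are made up for illustration.

```python
from pathlib import Path


def find_suspect_files(backup_dir, min_bytes=1):
    """Return (path, size) for every file under backup_dir smaller than min_bytes."""
    suspects = []
    for path in sorted(Path(backup_dir).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size < min_bytes:
                suspects.append((path, size))
    return suspects
```

Calling `find_suspect_files` on the backup folder and alerting whenever the returned list is non-empty would have flagged the 0 KB Data Store files described above. A non-empty file is of course no proof that a backup can be restored, so this complements a real restore test rather than replacing it.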
Since the first priority now was to get a working backup of the Data Store, we and a technician from ESRI focused our efforts on that. We tried both the ArcGIS-tier backupdatastore command and the PostgreSQL-tier pg_dump command, but both failed the same way with this error message:

pg_dump: error: query returned 0 rows instead of one: SELECT typlen, typinput, typoutput, typreceive, typsend, typmodin, typmodout, typanalyze, typreceive::pg_catalog.oid AS typreceiveoid, typsend::pg_catalog.oid AS typsendoid, typmodin::pg_catalog.oid AS typmodinoid, typmodout::pg_catalog.oid AS typmodoutoid, typanalyze::pg_catalog.oid AS typanalyzeoid, typcategory, typispreferred, typdelim, typbyval, typalign, typstorage, (typcollation <> 0) AS typcollatable, pg_catalog.pg_get_expr(typdefaultbin, 0) AS typdefaultbin, typdefault FROM pg_catalog.pg_type WHERE oid = '1889862'::pg_catalog.oid
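
One reading of this message is that pg_dump asked the pg_catalog.pg_type system catalog for the type with OID 1889862 and got no row back, meaning something in the database still refers to a type whose catalog entry is missing or not visible. The sketch below pulls the catalog name and OID out of the message text; the regular expression and function name are illustrative, and the follow-up query it prints is a suggestion that does not appear anywhere in the thread.

```python
import re

# The pg_dump error quoted above (the long column list is shortened here
# to keep the example readable).
message = (
    "pg_dump: error: query returned 0 rows instead of one: "
    "SELECT typlen, typinput FROM pg_catalog.pg_type "
    "WHERE oid = '1889862'::pg_catalog.oid"
)

PATTERN = re.compile(
    r"query returned (\d+) rows instead of one: .*"
    r"FROM pg_catalog\.(\w+)\s+WHERE oid = '(\d+)'::pg_catalog\.oid"
)


def parse_missing_catalog_row(text):
    """Return the catalog table and OID that pg_dump could not find, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    rows, catalog, oid = match.groups()
    return {"rows_returned": int(rows), "catalog": catalog, "oid": int(oid)}


info = parse_missing_catalog_row(message)
print(info)

# A read-only follow-up query one might run in psql (not shown in the thread):
print(f"SELECT oid, typname FROM pg_catalog.pg_type WHERE oid = {info['oid']};")
```

If that query also returns no rows, it would support the idea that the problem sits in the database catalogs rather than in the backup tooling, which fits the observation that both backupdatastore and pg_dump failed in the same way.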

We took a file copy of the failing Data Store to the new system and connected it to the site, to troubleshoot it there instead of live, and after some initial tests we tried the vacuum command. And... this actually seems to have done the trick! Which is also weird in its own way, since according to the logs vacuum is, and has been, performed on the PostgreSQL database daily. But doing it manually really seems to have healed the Data Store. We tried a backup of the Data Store and that now worked, so we decided to perform the same steps in the production environment late last night (after snapshots & file backups were taken). And so far everything seems to be working great! Backups are taken through WebGIS DR and things are running fine :) A bonus to all of this is that the original problem with errors when publishing hosted feature layers now seems to have vanished as well.
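
The thread does not say exactly which form of vacuum was run or with which options. As a rough sketch only, the two steps (vacuum, then confirm that a dump can read the catalogs again) could be expressed with the standard PostgreSQL client tools `vacuumdb` and `pg_dump`. The host, port, user and database name below are hypothetical placeholders, and the script only prints the commands instead of running them.

```python
import shlex


def vacuum_command(host, port, user):
    """Build a vacuumdb call that vacuums and analyzes every database."""
    return ["vacuumdb", "--host", host, "--port", str(port),
            "--username", user, "--all", "--analyze", "--verbose"]


def schema_dump_command(host, port, user, dbname, outfile):
    """Build a schema-only pg_dump call, used here purely as a readability check."""
    return ["pg_dump", "--host", host, "--port", str(port),
            "--username", user, "--schema-only", "--file", outfile, dbname]


# Hypothetical connection details; replace with the values for your own site.
HOST, PORT, USER, DBNAME = "localhost", 9876, "dbadmin", "db_example"

for cmd in (vacuum_command(HOST, PORT, USER),
            schema_dump_command(HOST, PORT, USER, DBNAME, "schema_check.sql")):
    # Printed rather than executed: review before running anything.
    print(shlex.join(cmd))
```

A schema-only dump is used as the check because the original failure happened while pg_dump was reading type definitions, so a dump that completes suggests the catalog problem is gone. As in the post, anything like this belongs on a copy of the Data Store first, with snapshots and file backups in place.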

We still don't know what caused the issue in the first place, which is a bit of a downer. One hypothesis is that sometime in November the PostgreSQL database, for whatever reason, crashed in the middle of an operation that then never completed, leaving an error in the DB. This might also explain the erratic behaviour where publishing hosted layers sometimes worked but mostly gave errors: when it worked, the faulty parts of the DB were not involved. But this is just a guess, of course. I don't know very much about PostgreSQL databases after all, and one should not need to, since ESRI claims that you should never have to touch them. But in the end that is what actually seems to have solved our issues.

We are letting the users test all the operations they'd expect to be working today and report any suspect behaviour, but so far (~4 hours into the day) everything seems to be working as it should once again! 8)

Thanks to @HenryLindemann for general ideas and thoughts in the thread.

BR,
Marcus


10 Replies
HenryLindemann
Esri Contributor

@MarcusAndersson,

What version of ArcGIS Pro are you running, and on what version of Windows?

In your testing, did you try to publish from both a SQL-connected service and an FGDB?

Have you tried updating the .NET and C++ redistributables?

Which ArcObjects version is specified in the PubEx service?

HenryLindemann_0-1611903603320.png

Regards

Henry

MarcusAndersson
New Contributor II

Hi!

Thanks for answering @HenryLindemann!

MarcusAndersson_0-1611910881788.png

Pro is version 2.7.0 right now; I have also tried 2.6.x before with no success. It runs on Windows 10 Enterprise 20H2, version 19042.746. (I have also tried to publish from a Pro 2.7 installation on the server, Windows Server 2012 R2.)

All the failed tests have used a file geodatabase feature class (that's what's not working, but I probably forgot to mention that in the main post).
Publishing to Portal from a SQL-connected database generates the same errors.
Uploading an FGDB to Portal and publishing services from there works.

I have not updated the .NET and C++ redistributables. I've looked suspiciously at .NET but have not fiddled with it yet. Will do now then :) Will also look at C++, thanks! Any suggestions for versions to look at when updating?

MarcusAndersson_0-1611907370673.png

ArcObjects11.

Thanks again!
Best regards,
Marcus

--------------------------------------------------------
Added:
Could this have anything to do with it?

MarcusAndersson_0-1611908425019.png

BR,
Marcus

HenryLindemann
Esri Contributor

Hi @MarcusAndersson,

I would just install the latest supported Visual C++ redistributables (see "The latest supported Visual C++ downloads" on microsoft.com) and the latest .NET Framework:

https://dotnet.microsoft.com/download/dotnet-framework/thank-you/net48-web-installer

I know there was a .NET update for Pro at 2.6, I think.

I see there are two versions of sdepgsrvr.dll. Can you check whether yours are the same as what I have below?

HenryLindemann_0-1611912313663.png

I have included the two DLLs in the download; it might just be a corrupt DLL.

MarcusAndersson
New Contributor II

Great, thanks!

Is it possible to just replace the .dll files? There's nothing "special" written in them as far as you know? The versions were the same as yours btw, but they might still be corrupt as you say.

Will give this a shot, along with the .NET and C++ updates.

Thanks,

/ Marcus

HenryLindemann
Esri Contributor

If it is the same version then I don't see a problem; it is just a piece of wrapped code after all. Just turn your server off when you replace it, as some service might be accessing the DLL.

OK, let me know if any of the suggestions solve it.

Henry

MarcusAndersson
New Contributor II

Hi again,

Thought I might give an update on the issue.
Installing .NET and C++ and replacing the .dll files unfortunately did not solve the publishing error, so we went back to the snapshots taken before tinkering with this.

The next step is to try an "in-place upgrade" of the OS to Windows Server 2019 on the two machines. We'll do this by cloning the machines and testing the OS upgrade on the clones. This "in-place upgrade" was suggested by ESRI, who say they've done it successfully many times. Our IT provider is not that confident, to say the least, but let's hope that it works out :)

If that doesn't work the next step will probably be to set up two new machines running Windows Server 2019 and then migrate Enterprise from the old ones. Something like this: https://www.esri.com/arcgis-blog/products/arcgis-enterprise/administration/migrate-to-a-new-machine-...

Last step would be to reinstall Enterprise on two new machines and then try to migrate data etc.
But we'll see. I'll post the final "solution" here, once we're there...

MarcusAndersson
New Contributor II

Update:

Have now performed an in-place upgrade of the OS to Server 2019, which went fine in itself but didn't solve the actual problem. We'll keep running on 2019 since it at least didn't seem to break anything else :)

Other things we've done:

  • Created a new site from scratch
  • Uninstalled and reinstalled ArcGIS Server
  • Checked permissions/rights (created a new built-in user, not connected to the AD). I do feel that this path might be worth investigating a bit further, though. Does anyone ( @HenryLindemann ?) know if the AppPool or "anyone else" is involved when PublishingToolsEx runs from Pro?
  • Lots of other things but these are the major ones I think..

One thing we'll check today or tomorrow: when upgrading to 10.8.1, the ESRI technician added a web context URL for some reason. I can't remember why now, so we'll have to check that again. We'll also check our reverse proxy setups, again.

But all input is more than welcome so if you have something that you think might be worth checking, please leave a comment! :)

/ Marcus

HenryLindemann
Esri Contributor

Morning @MarcusAndersson,

On your question: I think yes. The PublishingToolsEx tool spins up when publishing a GDB. It is not tied to the ArcGIS for Desktop app; I think PublishingToolsEx is more of a legacy component.

Have you tried spinning up a new VM with a federated ArcGIS Server and publishing to that server? If this works, then the corrupted part is definitely in ArcGIS Server; if it fails, then it might be Portal.

Usually, when I uninstall and reinstall and the error persists, the corruption is either in Program Files or in the content.

My last-resort method is as follows: back up the ArcGIS Server directories and the Program Files directory, uninstall, and delete the ArcGIS Server folder under Program Files (C:\Program Files\ArcGIS\Server). This breaks the link to your existing system, but we can recover that.

1.

Rename the arcgisserver dir, e.g. to arcgisserver_old (the one with the config-store).

Re-install ArcGIS Server to the same dir.

2.

Now we want to rebuild the links to Portal and any data dirs.

Go into the backup and copy any folders that are missing in the new system (folders only).

Go through the folders one by one.

The folders are:

config-store

C:\arcgis\arcgisserver\config-store\data\{sub-dirs}\

C:\arcgis\arcgisserver\config-store\security "*special notes: see below"

C:\arcgis\arcgisserver\config-store\services "*special notes: wait until after the first test"

 

directories

Do not copy over System and Utilities from the backup; these are internal system folders.

C:\arcgis\arcgisserver\directories\arcgiscache "*special notes: wait until after the first test"

C:\arcgis\arcgisserver\directories\arcgisjobs "*special notes: wait until after the first test"

C:\arcgis\arcgisserver\directories\arcgisoutput "*special notes: wait until after the first test"

C:\arcgis\arcgisserver\directories\arcgissystem\arcgisinput "*special notes: wait until after the first test"

 

config-store\data

HenryLindemann_0-1613110608566.png

Drill down and copy the folders in the above dirs.

HenryLindemann_1-1613110716796.png

config-store\security

Only copy security-config.json (this is the federation) and replace the existing file.

HenryLindemann_2-1613111143282.png

 

config-store\services

Do not copy over System and Utilities from the backup; these are internal system folders.

NB: Re-apply the service accounts to the ArcGIS Server after the copy process.

First Test

If everything went well, you will have an empty ArcGIS Server that is federated to your system with the existing federation. Now do your publishing test on the empty system.

I have seen on upgraded systems that misplaced files can land in the content store and cause funny errors, which is why we do the first test with no content. If this works, copy the content back and test again. If it then fails, that is an indication that something is misplaced in the content; if it works, great.
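
The folder-copy part of step 2 above (copy only folders, only those missing in the new install, and never System or Utilities) could be scripted roughly as follows. This is a minimal sketch and not an Esri tool: the function name is made up, it handles one directory level at a time, and it deliberately does not cover the special case of security-config.json, which is a single file that replaces an existing one.

```python
import shutil
from pathlib import Path

# Internal folders that should not be copied back from the backup.
SKIP = {"system", "utilities"}


def copy_missing_folders(backup_dir, target_dir, skip=SKIP):
    """Copy folders that exist in backup_dir but not in target_dir.

    Loose files are ignored ("folders only") and folders named in skip
    are never copied. Returns the names of the folders that were copied.
    """
    backup_dir, target_dir = Path(backup_dir), Path(target_dir)
    copied = []
    for entry in sorted(backup_dir.iterdir()):
        if not entry.is_dir():
            continue  # loose files are left alone
        if entry.name.lower() in skip:
            continue  # internal system folders
        destination = target_dir / entry.name
        if destination.exists():
            continue  # already present in the fresh install
        shutil.copytree(entry, destination)
        copied.append(entry.name)
    return copied
```

It would be called once per directory pair, for example with the services folder under arcgisserver_old as `backup_dir` and the matching folder in the fresh install as `target_dir`. Because existing folders are never overwritten, running it a second time copies nothing, which makes it reasonably safe to run before and after the first test.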

 

Hope it helps

Henry

 

 

 

MarcusAndersson
New Contributor II

Hi @HenryLindemann!

Thank you for your answer, once again.

Actually, we've already tried using a new machine with a new site, and we've also reinstalled ArcGIS Server pretty much to the point of your last-resort option. I did it together with a guy from ESRI, but we got the same error once more.

The next step for us will be to walk through the system, Windows and network settings on the servers and in the firewalls, reverse proxies etc., step by step together with our IT provider, to see if something there has changed or is somehow causing the error. I have a list of things that need to be checked, mostly taken from the system requirements and from the Enterprise installation documents.

Will let you know when we've found something!

Best regards,
Marcus
