Replica Generation Failing During Download

MoreHavoc · ‎11-12-2018

I have an application that generates replicas, allows editing and syncs those edits back to the server. The quantity of data is relatively large in that the initial replica generation is around 1.2 GB. ArcGIS Server (10.6.1) takes about 15 minutes to generate the replica according to the GP logs. After that, it takes more than an hour to download the file. The download time is due to the slow connection (mostly caused by the remote nature of the app).

The problem occurs about an hour after generation: it appears that ArcGIS Server is deleting the ".geodatabase" file while it is still being downloaded. This causes a "Job error 22 User defined failure." to appear in the Runtime logs. I know this because immediately after the error, the URL that was being accessed to download the replica starts returning a 404, and if I browse to the folder on the server, the file is gone.

Does anyone know how to adjust this setting so that the replica can finish downloading?

For reference, this is a UWP app using Runtime 100.4.

PreetiMaske · ‎11-14-2018

From the description of the issue, it sounds like a server configuration setting is causing the geodatabase file to be deleted.

I would recommend configuring the maximum file age for the server directory. Please refer to the link below:

Edit a server directory in Manager—ArcGIS Server Administration (Windows) | ArcGIS Enterprise
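
As a rough way to pick a value for that setting, the arithmetic can be sketched as follows. This is only an estimate under stated assumptions: the cleanup clock is assumed to start when the file is written, the throughput figure (300 KB/s) is a made-up example and not a measurement from this thread, and the function name and safety factor are hypothetical.

```python
import math


def min_file_age_minutes(size_bytes: int,
                         bytes_per_second: float,
                         safety_factor: float = 1.5) -> int:
    """Return a maximum file age (minutes) that outlasts the expected download.

    safety_factor pads the estimate to allow for stalls and retries.
    """
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    download_seconds = size_bytes / bytes_per_second
    return math.ceil(download_seconds * safety_factor / 60)


if __name__ == "__main__":
    # 1.2 GB replica over a link that sustains about 300 KB/s (assumed figure).
    print(min_file_age_minutes(1_200_000_000, 300_000))
```

With those example numbers the download alone takes a little over an hour, so the maximum file age needs to be well above 60 minutes.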

MoreHavoc · ‎11-14-2018

Thanks Preeti, that was the setting I was looking for! This seems to have mitigated the problem for now. The strange thing is that it looks like the default is 10 minutes, but the file was staying much longer than that. And it still strikes me as strange that Server would delete a file that was still being accessed by the client.

Thanks for the help!

JoeHershman · ‎11-15-2018

I use direct REST calls when generating and downloading replicas because of the download time you are experiencing. I am not sure why, but the download through the API is absurdly slow compared to the time it actually takes to download the file through normal means.

Improve replica transport performance?

You can test this by using the Web API. Generate the replica, which will give you the link to the file on the server, and then manually download the file through the browser. That download time is what you would see using the REST API directly, and in my experience it is orders of magnitude faster.
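
For illustration, the download half of that approach might look like the sketch below. It is written in Python rather than the .NET used by the app, purely to show the idea. The function name and the ".part" temporary file are hypothetical choices, and it assumes the result URL from the replica job is already known and needs no token.

```python
import shutil
import urllib.request
from pathlib import Path


def download_replica(result_url: str, dest: Path,
                     chunk_size: int = 1024 * 1024) -> int:
    """Stream result_url to dest in chunks and return the bytes written."""
    # Write to a temporary name so a failed download never looks complete.
    partial = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(result_url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    # Only expose the final file name once every byte has arrived.
    partial.replace(dest)
    return dest.stat().st_size
```

Streaming in chunks keeps memory use flat for a 1.2 GB file, and the rename at the end means the app never opens a half-downloaded geodatabase.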

Thanks,
-Joe

MoreHavoc · ‎11-15-2018

I have been considering doing that as well. I have already had to manually download replicas for users in the field. I normally connect to the tablet in the field via some support software and try to download the replica. After it starts, I kill the app, download the replica in a browser, copy it to the correct location, and start the app up again. That process seems to cut the download time roughly in half for me (and that's with the remote session taking up bandwidth as well; it would probably be better if I wasn't connected).

I'm glad to hear that you have had success with this; I think I will change over to calling the endpoint myself as well. I'm also toying with compressing the geodatabase file before transfer. Have you experimented with that at all? I see about a 20% size decrease for our production replicas when I zip them.
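
A quick way to measure that saving on a given replica is sketched below, using Python's standard zipfile module. The function name is hypothetical, and the ratio will depend entirely on the data in the replica.

```python
import zipfile
from pathlib import Path


def zip_replica(source: Path, archive: Path) -> float:
    """Zip source into archive and return the fraction of bytes saved.

    A return value of 0.2 corresponds to a 20% size decrease.
    """
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store only the file name so the archive has no folder structure.
        zf.write(source, arcname=source.name)
    original = source.stat().st_size
    if original == 0:
        return 0.0
    return 1 - archive.stat().st_size / original
```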

JoeHershman · ‎11-15-2018

We've done it all

If you are using archive-enabled databases, the best approach performance-wise follows the idea of zipping. With replicas generated off archive-enabled databases, you can generate a single copy of the database and then run GeodatabaseSyncTask.RegisterGeodatabaseAsync from the client.

Generate the single copy of the .geodatabase, zip it, and store it on the server. Then: Download > Unzip > Register. We have done this both by storing the zips in Portal and by storing them on a server and using HttpClient to download. The Portal approach is nice because you can use the API for downloading and do not need your own web server configuration (we have a custom REST service as an endpoint for that approach). If you are using a versioned database this won't work, but I would strongly recommend against a versioned database because it does not scale well at all.
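
The unzip step of Download > Unzip > Register could be sketched like this. It is shown in Python only to illustrate the flow: the function name is hypothetical, it assumes the archive holds exactly one .geodatabase file, and the register step is left as a comment because it belongs to the Runtime API on the client.

```python
import shutil
import zipfile
from pathlib import Path


def extract_geodatabase(archive: Path, dest_dir: Path) -> Path:
    """Extract the single .geodatabase member of archive into dest_dir."""
    with zipfile.ZipFile(archive) as zf:
        members = [m for m in zf.namelist()
                   if m.lower().endswith(".geodatabase")]
        if len(members) != 1:
            raise ValueError(
                f"expected exactly one .geodatabase, found {len(members)}")
        # Keep only the file name so a folder prefix inside the zip
        # cannot place the file outside dest_dir.
        target = dest_dir / Path(members[0]).name
        dest_dir.mkdir(parents=True, exist_ok=True)
        with zf.open(members[0]) as src, open(target, "wb") as out:
            shutil.copyfileobj(src, out)
    # Next step (not shown): hand `target` to the client-side register
    # call named in this thread before the first sync.
    return target
```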

We have also worked on breaking out replicas into more manageable sizes, so that each contains only a small number of layers. An advantage of this is that if data does need to be replaced on the client, it is easier to manage. The current sync tools do not allow for any schema changes. If for any reason something changes on the database side (for example, adding a new field), you need to push an entire new replica to the field. Breaking the data into multiple databases also gives you a way to manage syncing when some data does not change frequently.

Enjoy!

Thanks,
-Joe

MoreHavoc · ‎11-15-2018

Ha!

Seems like we are on similar paths. Schema changes are actually what got me into my current situation... all because we needed to add a value to a coded value domain, which is a schema change (but that, I think, is for a different thread). I ended up migrating all of the domains into a separate table (instead of using Esri domains) so that I could update them with just a sync instead of a new replica.
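
The lookup-table idea can be illustrated with plain SQLite. This is a sketch of the pattern only, not the actual schema: the table and column names (status_lookup, inspections) are hypothetical, and in a real replica the lookup would be a table in the feature service that syncs like any other.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a lookup table in place of a coded value domain."""
    conn.executescript(
        """
        CREATE TABLE status_lookup (
            code  INTEGER PRIMARY KEY,
            label TEXT NOT NULL
        );
        CREATE TABLE inspections (
            id          INTEGER PRIMARY KEY,
            status_code INTEGER NOT NULL REFERENCES status_lookup(code)
        );
        INSERT INTO status_lookup (code, label) VALUES (1, 'Open'), (2, 'Closed');
        """
    )


def add_status(conn: sqlite3.Connection, code: int, label: str) -> None:
    """Adding a value is a row insert (an edit), not a schema change."""
    conn.execute(
        "INSERT INTO status_lookup (code, label) VALUES (?, ?)", (code, label))


def inspections_with_labels(conn: sqlite3.Connection) -> list:
    """Resolve codes to labels with a join instead of a domain."""
    return conn.execute(
        "SELECT i.id, s.label FROM inspections AS i "
        "JOIN status_lookup AS s ON s.code = i.status_code ORDER BY i.id"
    ).fetchall()
```

The trade-off is that the client has to do the code-to-label lookup itself, since the table is no longer a domain the API understands.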

I like the idea of breaking it into multiple databases. We do have reference data that we are downloading that doesn't update as often, so putting that in a separate database makes a lot of sense. I have actually been considering using one replica "per layer", where a layer is the main table and its immediate related tables. Although this would help a lot with downloading replicas when there are schema changes, it makes querying related records a bit harder.

Have you had any issues with many-to-many relationships in replicas? I ended up deploying that as two one-to-many relationships.

I have also been considering taking a very different route for syncing. In another application we rolled our own sync engine, and while it was not as comprehensive as Esri's, we were able to handle schema changes and download only those deltas. We were also able to handle partial uploads, so if part of a sync worked, we knew what worked and what didn't, which helped a lot with spotty connectivity. For this application, I have been considering switching to our own sync engine so that we could do peer-to-peer syncing in the field. For us, the most important thing is that the team in the field can get each other's data, and right now that takes a minimum of 2 syncs per person each evening over what is normally a 2G connection, and it ends up being more than that when people don't pay attention or coordinate.
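
The partial-upload idea can be sketched as below. This is a generic illustration, not the engine described above: the function name, the batch size, and the send callback are all hypothetical, and a dropped connection is assumed to surface as an OSError.

```python
from typing import Callable, List, Sequence, Tuple


def upload_in_batches(edits: Sequence[dict],
                      send: Callable[[List[dict]], None],
                      batch_size: int = 50) -> Tuple[List[dict], List[dict]]:
    """Send edits in batches and return (uploaded, pending).

    If a batch fails, everything from that batch onward is returned as
    pending so a later retry can resume where this attempt stopped.
    """
    uploaded: List[dict] = []
    pending: List[dict] = []
    for start in range(0, len(edits), batch_size):
        batch = list(edits[start:start + batch_size])
        try:
            send(batch)
        except OSError:
            # Connection dropped: keep this batch and the rest for later.
            pending = list(edits[start:])
            break
        uploaded.extend(batch)
    return uploaded, pending
```

Smaller batches lose less work when a 2G link drops, at the cost of more round trips.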

I have another project coming up that is only uploading data and I think we are going to just store that locally and push it as edits to the feature service instead of using a replica.

JoeHershman · ‎11-15-2018

Yes, don't get me going on the schema change thing. That is another area that the old Windows for Mobile API handled better. We have gone the route of fully breaking out our data into single-layer replicas. We also present certain layers with multiple symbologies, so we are required to separate these out.

We don't use related data in the field so haven't had to address issues with that.

Something else we have done with syncing is to do delta-only syncs for data that only needs to be pushed on a nightly basis. The delta files are generated by a Python script using the REST API and then zipped. A REST endpoint is then provided for the clients to ping, download from, and apply the update using GeodatabaseSyncTask.ImportGeodatabaseDeltaAsync. If a client is not connected, it will try again the next time the application is opened.
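
The request such a script sends might be built roughly as follows. This is a sketch, not the actual script: the function name is hypothetical, the parameter names are given as I understand them from the ArcGIS REST API synchronizeReplica operation and should be checked against your server version, replicaServerGen assumes a per-replica sync model, and token handling and job polling are left out.

```python
from typing import Tuple
from urllib.parse import urlencode


def build_delta_request(feature_service_url: str,
                        replica_id: str,
                        server_gen: int) -> Tuple[str, bytes]:
    """Return (url, form_body) for a download-only delta request."""
    params = {
        "f": "json",
        "replicaID": replica_id,
        # Generation number recorded after the previous sync (assumed
        # to be tracked by the script itself).
        "replicaServerGen": server_gen,
        "syncDirection": "download",
        # Ask for the delta as a SQLite file rather than JSON.
        "dataFormat": "sqlite",
        "transportType": "esriTransportTypeUrl",
        "async": "true",
    }
    url = feature_service_url.rstrip("/") + "/synchronizeReplica"
    return url, urlencode(params).encode("ascii")
```

The returned pair can be posted with urllib.request or any HTTP client; the resulting delta file is what gets zipped and published for the clients.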

If I could figure out a way to find the edits in the .geodatabase file, I would have gone the custom sync engine route long ago. But I have searched through those databases and just cannot figure out how to query them to find the edits. I know there has to be a way, but no luck so far. All of that is done in the core C++ libraries, so I cannot use reflection to look at the code. We do continuous background syncs to keep the field up to date. If a device is not connected, it just skips that cycle and tries again on the next one. If users need data faster than the timer provides, they can do a manual sync (which means multiple people need to coordinate at times). For certain mission-critical types of data, the field crew is trained to sync immediately after adding the features. In the old API we could run the background sync on a 5-minute interval, and the entire field crew of 600+ was current in near real time and almost never needed to do a manual sync. That is no longer possible with the new sync model: it is too much of a server hit to sync that often.

I have been frustrated with the new replica sync model since day one. Honestly, I think it is a big step backwards from how the old API worked, and the performance stinks.

Thanks,
-Joe