We have a utility that generates the initial replica for a pre-planned workflow. With large datasets (almost 1 GB in size) the initial download is pretty slow, even when generating on the server that hosts the services. We do see some improvement on the server, cutting the time down from 29 minutes to 24 minutes. However, the majority of this time is transport between AGS and the download location. Monitoring the folder on the server showed it took 9 minutes to create the replica. That would mean it took 15 minutes to transport the file when running on the same server and drive where the replica is generated. This seems to me like a long time to basically copy the file from the server location to the download location.
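To make the arithmetic above explicit, here is a small Python sketch of the time breakdown and the effective transfer rate it implies. The 24-minute and 9-minute figures come from the post; the 1024 MB size is an assumption standing in for "almost 1 GB", and the function names are only illustrative.

```python
# Figures quoted in the post; SIZE_MB is an assumption for "almost 1 GB".
TOTAL_MIN = 24      # end-to-end time when the utility runs on the server
GENERATE_MIN = 9    # time until the replica file appeared in the server folder
SIZE_MB = 1024      # assumed replica size


def transport_minutes(total_min, generate_min):
    """Time left for transport once replica generation is subtracted."""
    return total_min - generate_min


def throughput_mb_per_s(size_mb, minutes):
    """Average transfer rate in MB/s for a given size and duration."""
    return size_mb / (minutes * 60)


transfer = transport_minutes(TOTAL_MIN, GENERATE_MIN)
rate = throughput_mb_per_s(SIZE_MB, transfer)
print(f"transport: {transfer} min")
print(f"effective rate: {rate:.2f} MB/s")
```

Under these assumptions the transport step averages a little over 1 MB/s, which is very low for a copy that never leaves the machine.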
I realize that even on the same server it is still transporting over https, but is there any way to improve this transport?
Transferring 1 GB of data is going to depend on your network connection. It might be worth considering pre-generating a replica for all apps, then pre-deploying and registering it instead.
However, the intended approach is to create a replica that represents the subset of data you'll be working with (i.e. if you're going to a certain area that day to work, you'll only create a replica of just that area; it's somewhat unlikely you'll need all 1 GB of data that day 🙂). Working with a subset of data should significantly reduce the time to download.
The amount of data required in a day depends on many things. In a deployment where work cannot be broken out into AOIs, access to all the data is required. A blanket statement that this data is not required is inaccurate for some deployments (more often than folks on the development team probably realize). When data is tied to regulatory concerns, this is generally the norm. Yes, 1 GB is larger than would generally be expected, and we will likely only have one that large. There are also a number in the 500 MB range.
As I mentioned in my initial post, we see about 15 minutes of transport time when running the application on the same server where the server-side replica is created. After the initial deployment we will do this rarely; I just find it curious that it takes as long as it does, and wondered if any setting would improve the copy operation. If I run createReplica from the REST API and then download from the returned URL, it takes about 40 seconds. So an operation that takes 40 seconds through a browser appears to take 15 minutes using the API.
What tools are you using to replicate your data? I see you mention AGS and services... are you replicating from a geodata service? If so, you could try using the workflow documented here...
It talks about using a map service (with the same name as your geodata service and in the same AGS folder) to create the replica. With this method, you have all the data you want replicated in the map service, add the map service to ArcMap, and use the Create Replica tool on the Distributed Geodatabase toolbar. Using this toolbar, instead of the Create Replica From Server geoprocessing tool, allows you to use the option to 'register existing data only'. That means that before you create the replica you manually copy and paste the data you wish to include in the replica into the child, so that the data is already present in both the parent and child geodatabases. When you then run the Create Replica wizard, it simply registers the existing data rather than copying the data itself.
The replicas are created as offline databases to be synced, not as distributed data. Replicas in this case are created from the REST API on a feature service. This can also be done in Runtime, which basically wraps the REST calls in the API. What happens when you create a replica in this manner is that AGS creates the file on the server, the REST API then returns the URL of the replica, and it can be downloaded. If I do this through the REST API myself, then after receiving the URL in the response I can download the database in about a minute. However, using the Runtime API, that minute of data transfer from server to client takes about 15 minutes for the identical database on the same machine. So for some reason it seems that the transport time through the Runtime API is much slower than transferring that exact same data over https in a browser.
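For anyone wanting to reproduce the direct REST path described above, here is a rough Python sketch of the "wait until the server has finished, then take the returned URL" step. It assumes an asynchronous createReplica job whose status resource returns JSON with a `status` field and, once finished, a `resultUrl` field; those field names and the status values are my reading of the ArcGIS REST API and should be checked against the documentation for your server version. `fetch_json` and `wait_for_replica` are hypothetical helper names, and token handling is left out.

```python
import json
import time
import urllib.request


def fetch_json(url):
    """GET a REST resource with f=json appended and decode the response."""
    sep = "&" if "?" in url else "?"
    with urllib.request.urlopen(f"{url}{sep}f=json") as resp:
        return json.load(resp)


def wait_for_replica(status_url, fetch=fetch_json, interval_s=15,
                     max_polls=240, sleep=time.sleep):
    """Poll the job status until the replica is ready; return its URL.

    Assumes the status JSON carries "status" and "resultUrl" (unverified
    field names, see the note above).
    """
    for _ in range(max_polls):
        job = fetch(status_url)
        status = job.get("status")
        if status == "Completed":
            return job["resultUrl"]
        if status in ("Failed", "CompletedWithErrors"):
            raise RuntimeError(f"replica job ended with status {status}")
        sleep(interval_s)  # server is still generating the file
    raise TimeoutError("replica was not ready within the polling limit")
```

The `fetch` and `sleep` parameters are injectable so the polling logic can be exercised without a server. Logging a timestamp when `wait_for_replica` returns gives a clean split between generation time and download time.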
I have written a tool that makes the direct REST API calls to create and then download the replica. The difference in download performance is significant. This is with a replica that is 166,640 KB.
Because I can add logging to the custom tool, I can see that it takes ~2:15 to generate the replica on the server (I check every 15 seconds). Based on that, the API takes 2:29 longer to download the replica than a direct call using HttpsClient:GetStreamSync with the result URL. That is 38x longer on a file of around 165 MB.
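The direct download itself is nothing more than a streamed copy from the result URL to a local file. My tool is .NET, but the same idea in Python looks roughly like the sketch below, which also times the copy so it can be compared with the Runtime API. The 1 MB chunk size and the function names `timed_copy` and `download` are arbitrary choices for illustration, not anything taken from the Runtime or the REST API.

```python
import time
import urllib.request


def timed_copy(src, dst, chunk_size=1024 * 1024):
    """Copy a readable binary stream to a writable one in fixed-size chunks.

    Returns (bytes_copied, elapsed_seconds).
    """
    start = time.perf_counter()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total, time.perf_counter() - start


def download(url, path):
    """Stream the replica at `url` to `path`; returns (bytes, seconds)."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        return timed_copy(resp, out)
```

Calling `download(result_url, "replica.geodatabase")` with the URL returned by the server gives a byte count and elapsed time that can be logged next to the generation time.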
On the network we are deploying to, the difference is even more significant than what we saw in our testing environment.
This is using .NET for WPF 100.2.