PrintTask is Slow - Best Practices?

NathanaelVaughan · ‎07-02-2014

I've setup a web application primarilly based upon v. 3.5 of the API and Server 10.1 (we attempted an upgrade to SP1, but uninstalled after spending a lot of time on the phone with ESRI support) and have an out-of-the-box print service running on the server.

Now that people are using the application, the print task can be quite slow. With a single service, export times vary from 5 seconds to over a minute. With two services (cached), times can be over a minute.

Print task settings are vanilla, although I did change it to run asynchronously (not that I've seen a performance difference). Watching the CPU usage on the server, it appears as though the services are slow to load (there is a distinct CPU spike when exporting the final image).

The page is hosted privately, otherwise I would share a URL. If you have an inkling as to what could help, send me a message and I'll get you access.

If you have an idea of ways to improve either the print task or my services, I would welcome it. I am contemplating writing my own print task script soon so that I can trouble-shoot more effectively.

JeffPace · ‎07-03-2014

Print, in general, is a very slow operation because it has to be done serverside. I customized my own printing, and it is still quite slow.

NathanaelVaughan · ‎07-03-2014

Print, in general, is a very slow operation because it has to be done serverside. I customized my own printing, and it is still quite slow.

Do you have any thoughts on improving consistency? What I am struggling with is the huge amount of variability in times for the same screen capture.

If I could identify a process, maybe I could prioritize it in windows? I'm toying with the CSS concept mentioned in another post and simply printing using the browser function for an assumed paper size.

There's a 3rd party software called "Geocortex" that has a REST print server that is much more efficient, but I have no idea what is behind it.

JeffPace · ‎07-08-2014

Testing out the new forums. So far - thumbs down.

To your question. The variability is not uncommon, but I am afraid I dont have a cause. Basically the server is rerequesting all the tiles, and cant take advantage of any client side caching.

Sometimes the server caching speeds it up, sometimes not. It all depends on the backend of the engine. And that is very black-boxy.

Remember that the print optimization would have to occur on both the server receiving the request (the application host) and the data arcgis server, not on the client.

NathanaelVaughan · ‎07-11-2014

I've gone "medieval" on the problem in a sense.

I went ahead and wrote my own custom (based on the ESRI example) print service and added logging at each major step. From there, it was apparent that my biggest bottleneck was in the PDF (or whatever other format) export. Times up to that point were on the order of 1 or 2 seconds and I would routinely see times of 20 seconds in the export. I could handle 20 seconds for an 8.5x11 plot.

Upon publishing to the ArcGIS server instance, my plot times went up to roughly 2 minutes. With the log file, I was able to get the JSON export to see if I could isolate the lag to the server. When the script is run from a workstation (and not high powered workstation), performance is much improved.

So I moved on to inspecting the server conifguration - which is something of a black box. Looking at performance monitoring, there are two distinct "spikes" in activity for each print job with a fairly consistent lag between. The first spike appears to be associated with processing the webmap data, while the second is associated with the export process. RAM appears to be spiking (I can't tell if file paging is being invoked, which would be a performance killer). I'm going to try to request some additional RAM on the server and see what that does for performance.

CPU performance on the server peaks around 50% (2 CPUs), but the RAM usage is definitely a concern.

Thanks for the help and tips - hopefully all of this is useful for others. I will update if I can get some more RAM allocated.

EDIT: I should probably have mentioned that there are two ArcGIS server nodes in play here - one is outside of my control and is used for a couple of serivces. Everything else (print service, dynamic and cached map services) are run off of one server. Baseline load is pretty low.

NathanaelVaughan · ‎10-16-2014

It's been awhile, but based upon feedback from others with similar configurations, I opted to abandon the ESRI pdf export function and write something from scratch.

I opted to download individual export images from each service requested, manually apply transparency to each individual image and then overlay them into a single, composite image (this was necessary as the PDF library I used did not support real image transparency).

With that image, I created a PDF canvas using ReportLabs and decorated the remainder of the page with elements drawn via code, such as a scale bar and over view map. Ultimately, I modified the routine to read an MXD print layout similar to the original ESRI function. Most of the formatting options available for objects in the print layout are not exposed via the python API, so I had to add some functions to parse object names to look for formatting tags which the user must insert. Since the code uses the ESRI export function for image exporting, this at least allowed for a single MXD to be used to format both output types.

I also added a GDAL/OGR function to clip any polygon graphics to the view extent. There are other kinds of graphics out there, but not currently in the client webviewer. Without this function, some graphics would be drawn outside the extents of the map window.

The final element added was a download time limit in the image download routine. One server in particular tends to "hang up" and was slowing down the pdf generation due to long export generation/transfer times. I addressed this by reducing the sampling density (dpi) for the image export and adding a timeout function in the parallel downloads using Greenlet/eventlet. When nothing is returned by the end of the timeout period, the script continues without the delayed image.

PDF generation times on the server went from 1.5-2 minutes to about 20-30 seconds. The script is around 900 lines, so I don't recommend undertaking something like this lightly (and my implementation is specific to the target application).