How to improve poor performance of Table to Excel on large datasets?

03-10-2022 12:40 PM
aaronkrusniak
New Contributor II

Hello,

I'm working on an analysis with a large point dataset, about 10.03 million records.  The process itself is very simple: I run a spatial join on the points, and then need to export them for further analysis (via the 'Table to Excel' tool).  The issue is that Table to Excel is having abysmal performance with this much data— clocking in at just over 3.5 hours to output the final .xlsx file.

I would expect any tool that runs over this much data to take some time, but this much feels wrong (and moreover, is untenable for my work).  To give some perspective, running the spatial join over the full set of these ~10 million points only takes about 45 minutes.  So, this brings me to three questions:

1. Are large data writing operations known to be slow by other users?  And if so, is there anything I can/should be doing to speed up the processing time (short of increasing my hardware specs, which are already of decent quality)?

2. If there's nothing I can do inside of ArcGIS Pro, what might some good alternatives be?  Since the rest of my analysis will be done using an R script, I'm considering trying to use the R-ArcGIS bridge and/or geospatial packages like 'sf' to do my spatial join, saving me the pain of exporting from Arc and importing into R— but I've never tried this before.  Any suggestions?

3. From a software perspective, I'm curious what might be causing such slow speeds for what should be (I think) a relatively trivial data writing operation.  I know from experience that similar operations might take only a few minutes with something like R/tidyverse or Python/pandas.  Would the poor performance stem from extra processing being necessary to translate from a GDB environment to Excel?  Inefficient implementation of the write tool under the hood?  Lousy structure of my underlying data?  Something else?  I imagine this is a somewhat common operation among users, so it would be neat to find ways to increase performance.  Maybe it would be faster to write to a .csv than to a .xlsx spec?  I don't know if this is currently possible in ArcGIS Pro, though.

Thanks in advance for any help!

Cheers,

Aaron

1 Solution

Accepted Solutions
DanPatterson
MVP Esteemed Contributor

Table To Table (Conversion)—ArcGIS Pro | Documentation

handles output to CSV, which would remove the Excel formatting steps.

BUT if you have null values in the table, you had better replace them with an appropriate null value for the data type (e.g., -9999 for integers, "No_data_here" for text, etc.)
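The null-replacement step above can be sketched as a small pre-export helper. This is just an illustration, assuming each record is a plain dict and using the sentinel values suggested above; the field names and type labels are hypothetical:

```python
# Sentinel values per field type, following the suggestion above.
SENTINELS = {"integer": -9999, "text": "No_data_here"}

def replace_nulls(record, field_types):
    """Return a copy of record with None swapped for a type-appropriate
    sentinel, so the CSV export never emits ambiguous empty cells.

    record      -- dict of field name -> value (value may be None)
    field_types -- dict of field name -> "integer" or "text"
    """
    return {
        name: SENTINELS[field_types[name]] if value is None else value
        for name, value in record.items()
    }
```

For example, `replace_nulls({"POP": None, "CITY": "Ames"}, {"POP": "integer", "CITY": "text"})` returns `{"POP": -9999, "CITY": "Ames"}`.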


... sort of retired...


7 Replies
Robert_LeClair
Esri Frequent Contributor

Aaron - I searched our internal resources for what could cause the slowness.  In one incident, the user had taken a license offline; the customer compared geoprocessing speed with the license online versus offline and apparently found a time difference.  Not sure if this applies in your case or not.  Another thing to try is the Diagnostic Monitor for ArcGIS Pro to see which processes are slow, and use that information to troubleshoot the issue.

RhettZufelt
MVP Frequent Contributor

Could it have something to do with trying to insert nearly 10 times the amount of data that an Excel spreadsheet can hold?

Total number of rows and columns on a worksheet

1,048,576 rows by 16,384 columns
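For scale, the ~10.03 million records from the original question overflow that per-worksheet limit by almost a factor of ten, which is where the "nearly 10 times" figure comes from:

```python
EXCEL_MAX_ROWS = 1_048_576  # per-worksheet row limit in .xlsx
records = 10_030_000        # approximate record count from the question

ratio = records / EXCEL_MAX_ROWS               # how far over the limit we are
sheets_needed = -(-records // EXCEL_MAX_ROWS)  # ceiling division

print(round(ratio, 2))  # 9.57
print(sheets_needed)    # 10 worksheets to hold every row
```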

 

R_

AbhishekSubedi
New Contributor II

Hi. I am actually encountering this Excel row limit. I have a table with 5 million rows. How can I export it to 5 Excel sheets with 1 million rows each? I am a beginner in ArcGIS Pro and any help would be greatly appreciated. Thank you.

aaronkrusniak
New Contributor II

Hey Abhishek,

I think Dan has the best solution here, which is to use the Table to Table tool to output your table as a .csv file.  I've found that it's generally easier and faster than trying to export as .xlsx.  An easy shortcut for this is to right-click your layer or standalone table > Data > Export Table.  Set the output location as desired; then, to make the output a .csv, name the file with a .csv extension (e.g., set Output Name to something like my_table.csv).

Rhett also has a great point (that I totally forgot to consider when I was running into this problem!) that more than ~1 million rows is too much for Excel, so your .csv will still be too big to open in Excel.  If you wanted to make 5 separate, smaller .csv files, one option might be to use a SQL expression in the Export Table/Table to Table tool to select rows with IDs of 1 to 1,000,000, then 1,000,001 to 2,000,000, and so on.  Maybe not the most elegant, but I think it'd work.

AbhishekSubedi
New Contributor II

You're a lifesaver. Thanks!
