Select to view content in your preferred language

arcpy.conversion.ExcelToTable deletes existing table rather than overwriting it

376
4
04-01-2024 02:09 PM
twhammond
New Contributor II

Hello all,

I have a fairly large project going on (at least by my standards). I'll try to summarize it very briefly so my issue has some context.

I have a script that does the following:

  • refreshes an excel file to get current data
  • creates a pandas dataframe from that excel file
  • manipulates the pandas dataframe
  • turns it back into an excel file
  • converts that final excel file into a table using arcpy.conversion.ExcelToTable
    • The table already exists in the geodatabase and is intended to be overwritten with new data.
  • afterwards it creates some relationship classes, but the problem seems to lie in the step above

This script works fine and I have a working copy that does all of the above.  However, when I refactored in order to add timeout functionality (using the python multiprocessing library), ExcelToTable stopped working for me.  I essentially gave the entire original script to a child process that would terminate if it failed to complete in a certain period of time. 

I have isolated the issue to the ExcelToTable step. 

Here are some screenshots:

After line 452 runs, the excel file is set up correctly with all the data from the dataframe.

twhammond_1-1712004447115.png

Below is a screenshot of what is inside my geodatabase all the way until I run line 453 (the ExcelToTable line).  Keep your eye on "WRCPTTable".

twhammond_0-1712003883323.png

After running that line,  here is my geodatabase:

twhammond_2-1712004907043.png

 

Anything related to the WRCPTTable is deleted.  Does anyone have any idea why this could be happening?  It does not throw an error until I try to create a relationship class and then I get

"arcgisscripting.ExecuteError: Failed to execute. Parameters are not valid.
ERROR 000732: Destination Table: Dataset C:\Users\tommy\Documents\parcels_2024.gdb\\\WRCPTTable does not exist or is not supported
Failed to execute (CreateRelationshipClass)."

which makes sense as the WRCPTTable no longer exists by that point.

I am at a loss.  I have set arcpy.env.overwriteOutput = True, the path to my gdb is correct and still works using the initial version of my script.

Any ideas would be much appreciated.  If any other details are needed, please let me know.

Thank you

 

 

0 Kudos
4 Replies
MErikReedAugusta
Occasional Contributor III

Without seeing the rest of the code, it could be hard to say what interaction is actually causing this.  The proximate cause is clearly ExcelToTable, based on your testing, but that doesn't mean that the ultimate cause is that function.

Though if this is 450+ lines of code, posting the full code may bury us in too much data and obfuscate the problem, anyway.

So instead, some general thoughts to chase down first:

  1. I know there's a Geoprocessing Environment setting that either allows or disallows overwriting an existing dataset.  Might also be something in the program settings along the same lines?  I'm 99% sure I have both set to disallow, for simplicity's sake.  You should probably check which setting you have and consider if it's the right one for you in this case.
  2. Personally, if I know I'm going to be overwriting something, I'll generally check for it, delete it, and then create it again.  I'm just paranoid like that, and it generally works out.
if arcpy.Exists(tablePath):
  arcpy.management.Delete(tablePath)

arcpy.conversion.ExcelToTable(excelFile, tablePath)

 

0 Kudos
twhammond
New Contributor II

I appreciate the response.  The detection and deletion work when the table is in the geodatabase (and it is a simple enough process that I may adopt this method myself moving forward).  Unfortunately, even after the table's deletion, ExcelToTable fails to create the table.

This leads me to believe that this is not an overwriting issue, but maybe a problem with the way the ExcelToTable conversion tool works when called by the subprocess I created.

Approaching the question is quite difficult as this code is sitting around 1100 lines now.  I tried to walk the line between creating a short enough post that people will understand and willingly read, and adding enough detail to prompt a guess as to what is causing this problem.  I will see what additional details and code snippets I can add without drowning anyone in the 1100 line mess I have made.

Again, I appreciate your guess.

0 Kudos
DonMorrison1
Occasional Contributor III

I would first pare the subprocess code down to a single line if possible (ExcelToTable) and try to convert a trivial .xls file to see if that works as expected.  If it fails then you are much more likely to get help from this forum or ESRI.  If it doesn't fail then incrementally add in the complexities of your original script until it fails. 

0 Kudos
twhammond
New Contributor II

Thanks everyone for your input.  I thought it would be a good idea to post an update.  I never ended up getting around to whittling this problem down to it's core to present here.  I never got around to directly solving the problem either.  I used a debugger to walk line by line through my script and all of the arcpy source code that was getting referenced when ExcelToTable was called, but after a couple days of that I decided to just take the easy way out and find a workaround.  

The workflow before:

mapProcess1.png

The workflow after:

mapProcess2.png

 

Essentially what I did was exclude the problem child, which was my ExcelToTable conversion, and just had that command run by the main process.  It only caused issues when called by a child process, so this ended up working.  I didn't get the satisfaction of directly solving the problem or getting timeout functionality for the arcpy commands, but the only portions I truly needed a timeout for were the excel refresh and some arcGIS Online operations (aka 'etc.').  It is still a mystery to me why the child process could not call ExcelToTable. Maybe I'll get around to doing more in-depth testing on that someday.  For now, this suits my purposes just fine.

*Just for clarification, these timeouts do not actually wait five minutes and then join the process. They actually attempt to join the process right away, and if that takes longer than five minutes it raises an error for me to handle (by forcing the child process to terminate, and then joining again for the program to proceed).

0 Kudos