Hello,
I am importing a CSV from an AWS S3 bucket into a pandas DataFrame and using pandas to clean up the values and column names, mainly by converting all strings to upper case. The column names and values in the CSV are lower case, which is not what we want. The code below does a nice job of making everything upper case:
df.columns = df.columns.str.upper()
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].str.upper()
cleaned_up_values_df = df.set_index('OBJECTID')
However, the problem occurs when I use df.spatial.to_table to export the DataFrame to a memory table compatible with arcpy: all of the column names and values revert to the original lower case formatting. This is the code I use:
cleaned_up_values_df.spatial.to_table(r'memory/out_table')
Am I missing a parameter with pandas? I'd prefer not to use arcpy to change the column names again, since pandas makes it so easy. TIA.
Edit: I should also mention this script will run on a nightly schedule and will need to be able to overwrite the memory table each time.
BTW, including the sanitize_columns='False' parameter did not fix the problem.
cleaned_up_df.spatial.to_table(location=r'memory/ewt', sanitize_columns='False')
Just checking: in practice, are you using
sanitize_columns='False'
because that is a string, and it should be a bool, like
sanitize_columns=False
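In other words, the call from the earlier snippet would become something like this (untested here, using the same variable and location from the post above):
cleaned_up_df.spatial.to_table(location=r'memory/ewt', sanitize_columns=False)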
This was it, thank you! Of course it was something small, lol. I appreciate your help!
This rebuilds the schema, ignoring the modified column names:
cleaned_up_values_df.spatial.to_table(r'memory/out_table')
Try something like this (untested):
import arcpy
import numpy as np
# Reset index if needed
df_export = cleaned_up_values_df.reset_index()
# Convert to NumPy structured array with explicit uppercase field names
dtype = [(str(col.upper()), 'U255') for col in df_export.columns] # 'U255' = Unicode string (max 255 chars)
records = np.array([tuple(row) for row in df_export.values], dtype=dtype)
# Export to in-memory table
arcpy.da.NumPyArrayToTable(records, r"memory\out_table")
Thanks for this! This does work to maintain the modified column names; the only caveat is that I cannot overwrite the in-memory table when running the script again. This script is going to run on a nightly schedule and I'll need to overwrite the memory table every time. Sorry, I should have mentioned that in my original post.
I would add a check to delete the existing in-memory table.
Try this (untested):
import arcpy
import numpy as np

# Reset index if needed
df_export = cleaned_up_values_df.reset_index()

# Convert to NumPy structured array with explicit uppercase field names
dtype = [(str(col.upper()), 'U255') for col in df_export.columns]  # 'U255' = Unicode string (max 255 chars)
records = np.array([tuple(row) for row in df_export.values], dtype=dtype)

# Define the in-memory table path
in_memory_table = r"memory\out_table"

# Check if the table exists and delete it if it does
if arcpy.Exists(in_memory_table):
    arcpy.management.Delete(in_memory_table)

# Export to in-memory table
arcpy.da.NumPyArrayToTable(records, in_memory_table)

# Verify the table was created
if arcpy.Exists(in_memory_table):
    print("Successfully created in-memory table")
else:
    print("Failed to create in-memory table")