replace commas from .csv

437
6
04-29-2021 12:11 PM
JustinWolff
Occasional Contributor

I have a .csv that was created using input_xls.to_csv, and everything delimits correctly.  However, within some of the string records are commas (e.g. a business name like 'Widgets, Inc.').  The result is incorrect delimiting when importing the .csv into a file geodatabase using arcpy.conversion.TableToTable -  values end up shifted over into other columns.  I want to replace the commas with a space.

I cannot figure out how to use either something like

.apply(lambda x: x.replace(',', ' '))

or

 [val if val else " " for val in row] + ([" "] * (len(first_row) - len(row))) (found here: https://community.esri.com/t5/python-questions/find-and-replace-blanks-in-csv-with-python/m-p/673632

Any help to nudge me in the right direction would be appreciated.  Thanks.

Tags (2)
0 Kudos
6 Replies
mattkramer
New Contributor III

What you can do is read your excel file into an array, where when appending data into the array, you call the replace command to get rid of the comma.

 

inRow = [str(row[0]),str(row[0]),str(row[2]),str(row[3]),str(row[4]),str(row[5]),str(row[6]),str(row[7]),str(row[8]),str(row[9]),str(row[10])]
index = 0
for in inRow:
     if "," in i:
          i = i.replace(",","")
          inRow[index] = i
     index += 1
points.append(inRow)
And then use that array to create the csv.
 
        AllProjects_CSV.truncate()
        writer = csv.writer(AllProjects_CSV,delimiter=',')
        writer.writerow(fields)
        writer.writerows(points)
My example is pulling data from a SDE database but I am pretty sure you can use something like openpyxl to read through an excel sheet.
0 Kudos
DanPatterson
MVP Esteemed Contributor

You might want to try bringing in the csv back into Excel, then  try the Excel to Table tool to see if it handles the embedded commas better when bringing back into a file geodatabase


... sort of retired...
0 Kudos
JoshuaBixby
MVP Esteemed Contributor

What is your input_xls?  Is it a pandas DataFrame?

0 Kudos
JustinWolff
Occasional Contributor

Yes, I'm using pandas .read_excel and .to_csv

I was out Friday and will come back to this later today, but plan to look further at what Matt Kramer posted above.  Thanks

0 Kudos
Tomasz_Tarchalski
New Contributor III

Personally I cannot recreate your probleme, but you have two options.

1. change regional settings of your machine to use different seperators (semicolon for example) 

2. in the same folder as your data add file "schema.ini", where you  specify how data is seperated, additionnaly you can change the name of columns and its types. 

Tomasz_Tarchalski_2-1619767704743.png

 

Tomasz_Tarchalski_1-1619767661403.png

 

 

0 Kudos
JustinWolff
Occasional Contributor

I never followed-up with the solution I found using csv and pandas:

    with open(cama_csv_file, 'r'as f1:
        with open(cama_csv_file_final, 'w+'as f2:
            df = pd.read_csv(f1)
            df.replace(',',' ', regex=True, inplace=True)
            df.to_csv(f2, index=False, line_terminator='\r')
    f2.close()
    f1.close()

 

Creating a new .csv is not an issue, because I delete all of them at the end of the process.

0 Kudos