Split comma separated values to multiple rows

Ed_ · ‎03-15-2024

Based on the sample data below I am trying to split each comma separated value in `MUNICIPALI` column to a new as shown in desired output. However, I am getting an error that `Projects_City` cannot open, it does not exist, but then I thought line 6 and onwards will create it. How can I fix this issue? On a side note, the comma separated values are a lot more than the sample data. For, example, one of them rows have 18 or something comma separated values so, will line 10 work for that?

Code:

import arcpy
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
out_df = r"O:\MyProject.gdb\Projects_City_Updated" # Doesn't exist, expect it to be created in line 6
fldlst = ["OBJECTID", "PRJID", "MUNICIPALI"]
with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for attrbPart in row[3].split(","):
                insertCursor.insertRow(( row[0], row[1], row[2], attrbPart))

Error:

Traceback (most recent call last):
  File "o:\path\SplitCommas2Row.py", line 6, in <module>
    with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
RuntimeError: cannot open 'O:\MyProject.gdb\Projects_City_Updated'

Update

After creating the table, please see updated code below, I am getting the following error, how can I fix it?

Code:

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI", "ZIPCODE"]

with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for attrbPart in row[3].split(","):
                insertCursor.insertRow(( row[0], row[1], row[2], attrbPart))

Error:

Traceback (most recent call last):
  File "o:\SplitCommas2Row.py", line 22, in <module>
    for attrbPart in row[3].split(","):
IndexError: tuple index out of range

Sample Data:

OBJECTID	PRJTID	MUNICIPALI	ZIPCODE
1	1	A	1
2	2	A, B	1,2
3	3	A, B, C	1,2,3
4	4	A, B, C, D	1,2,3,4
5	5	A	1
6	6	C	3

Desired output:

OBJECTID	PRJTID	MUNICIPALI	ZIPCODE
1	1	A	1
2	2	A	1
3	2	B	2
4	3	A	1
5	3	B	2
6	3	C	3
7	4	A	1
8	4	B	2
9	4	C	3
10	4	D	4
11	5	A	1
12	6	C	3

Question | Analyze | Visualize

Ed_ · ‎03-19-2024

Hi @JoshuaBixby hope all is well, so the code works fine however there's a slight issue and that's there are municipalities in the data sets that are made of two or more words so they have space(s) in their names. And so the code for some reason removes the spaces from the names. How can I fix that? Moreover, there are some municipalities whose names are composed of 4 words such as `Lauderdale By The Sea`

Input data sample row:

Output (from script) data sample row:

Question | Analyze | Visualize

JoshuaBixby · ‎03-19-2024

I updated the code in my original post, try it out and let me know.

Bud · ‎06-12-2024

Possibly related:

Split comma separated values to multiple rows

Bud · ‎06-12-2024

Another option is to use Excel Power Query in the spreadsheet to pre-process the data. The Power Query query can be refreshed in Excel with the click of a button.