Split comma separated values to multiple rows

Ed_ · ‎03-15-2024

Based on the sample data below I am trying to split each comma separated value in `MUNICIPALI` column to a new as shown in desired output. However, I am getting an error that `Projects_City` cannot open, it does not exist, but then I thought line 6 and onwards will create it. How can I fix this issue? On a side note, the comma separated values are a lot more than the sample data. For, example, one of them rows have 18 or something comma separated values so, will line 10 work for that?

Code:

import arcpy
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
out_df = r"O:\MyProject.gdb\Projects_City_Updated" # Doesn't exist, expect it to be created in line 6
fldlst = ["OBJECTID", "PRJID", "MUNICIPALI"]
with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for attrbPart in row[3].split(","):
                insertCursor.insertRow(( row[0], row[1], row[2], attrbPart))

Error:

Traceback (most recent call last):
  File "o:\path\SplitCommas2Row.py", line 6, in <module>
    with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
RuntimeError: cannot open 'O:\MyProject.gdb\Projects_City_Updated'

Update

After creating the table, please see updated code below, I am getting the following error, how can I fix it?

Code:

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI", "ZIPCODE"]

with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for attrbPart in row[3].split(","):
                insertCursor.insertRow(( row[0], row[1], row[2], attrbPart))

Error:

Traceback (most recent call last):
  File "o:\SplitCommas2Row.py", line 22, in <module>
    for attrbPart in row[3].split(","):
IndexError: tuple index out of range

Sample Data:

OBJECTID	PRJTID	MUNICIPALI	ZIPCODE
1	1	A	1
2	2	A, B	1,2
3	3	A, B, C	1,2,3
4	4	A, B, C, D	1,2,3,4
5	5	A	1
6	6	C	3

Desired output:

OBJECTID	PRJTID	MUNICIPALI	ZIPCODE
1	1	A	1
2	2	A	1
3	2	B	2
4	3	A	1
5	3	B	2
6	3	C	3
7	4	A	1
8	4	B	2
9	4	C	3
10	4	D	4
11	5	A	1
12	6	C	3

Question | Analyze | Visualize

MErikReedAugusta · ‎03-15-2024

Yeah, looks like a few of the nested blocks didn't nest properly.

with ...
    for ...
        [Line 23-28]
        while ...
            [Line 30-32]
        munis[prj] ...
        del joinvals, ...

Re: "cursor", looks like I threw you a curve ball by changing what I called the cursor from one script to another. I was typing these off the top of my head earlier. Your most recent code seems to call it correctly, so it looks like you figured it out.

I'm going to assume you're getting that error on your second time through that While statement. Essentially, the way you have it indented, it uses the for loop to go through the entire database, and just keeps overwriting itself. That means that it gets to the while loop on the last record in your database.

Then, it runs the while loop once, and logs the values from that last record. But then line 35 deletes the variables that the while loop is looking for. You ripped a stone out from the bottom of the wall, so it all came crashing down.

Fixing the indentation should fix the problem.

With Python, try to think of things in logical blocks, like an outline you might write before starting a research paper. The indentation is how you join blocks together. Lines that start at the same indentation are part of the same "block". If you scroll up through your code from the bottom, the first line you come to that's less indented is the start of that block.

So instead of the while block existing inside the for block, the way you have it indented makes it exist beside the for loop, which breaks the logic of the code.

------------------------------
M Reed
"The pessimist may be right oftener than the optimist, but the optimist has more fun, and neither can stop the march of events anyhow." — Lazarus Long, in Time Enough for Love, by Robert A. Heinlein

Ed_ · ‎03-18-2024

Good morning @MErikReedAugusta thank you so much for elaborate guidance on this 😊

Question | Analyze | Visualize

Bud · ‎03-15-2024

Related: Arcade Split comma delimited value into multiple rows, keeping row details

Idea: GP tool to split field with comma-delimited values into multiple rows

Bud · ‎03-15-2024

What kind of geodatabase?

If it's an enterprise or mobile geodatabase, it could possibly be done with SQL in a query layer or database view. Maybe using a recursive WITH clause or a database-specific mechanism.

Ed_ · ‎03-15-2024

Hi Bud, thank you for the suggestion, I prefer Python since I will be automating the script and what I am doing right now is just the beginning of the script.

Question | Analyze | Visualize

Bud · ‎03-15-2024

Even still, using a permanent database view in your Python script could simplify your script.

Bud · ‎03-16-2024

Is the number of comma-separated values in MUNICIPALITY and ZIPCODE always the same?

For example, would MUNICIPALITY ever have 1 value, but ZIPCODE has 2 values?

Ed_ · ‎03-16-2024

Always the same

Question | Analyze | Visualize

JoshuaBixby · ‎03-17-2024

@Ed_, the updated code was close to working, you just need to use Python zip() function and change 2 lines:

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI", "ZIPCODE"]

with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for z in zip(*(map(str.strip, str.split(i, ",")) for i in row[1:])):
                insertCursor.insertRow(row[0:1] + z)

Ed_ · ‎03-18-2024

Good morning @JoshuaBixby worked like a charm thank you, the zip function makes things simpler.

Question | Analyze | Visualize