Split comma separated values to multiple rows

Ed_ · ‎03-15-2024

Based on the sample data below I am trying to split each comma separated value in `MUNICIPALI` column to a new as shown in desired output. However, I am getting an error that `Projects_City` cannot open, it does not exist, but then I thought line 6 and onwards will create it. How can I fix this issue? On a side note, the comma separated values are a lot more than the sample data. For, example, one of them rows have 18 or something comma separated values so, will line 10 work for that?

Code:

import arcpy
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
out_df = r"O:\MyProject.gdb\Projects_City_Updated" # Doesn't exist, expect it to be created in line 6
fldlst = ["OBJECTID", "PRJID", "MUNICIPALI"]
with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for attrbPart in row[3].split(","):
                insertCursor.insertRow(( row[0], row[1], row[2], attrbPart))

Error:

Traceback (most recent call last):
  File "o:\path\SplitCommas2Row.py", line 6, in <module>
    with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
RuntimeError: cannot open 'O:\MyProject.gdb\Projects_City_Updated'

Update

After creating the table, please see updated code below, I am getting the following error, how can I fix it?

Code:

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI", "ZIPCODE"]

with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for attrbPart in row[3].split(","):
                insertCursor.insertRow(( row[0], row[1], row[2], attrbPart))

Error:

Traceback (most recent call last):
  File "o:\SplitCommas2Row.py", line 22, in <module>
    for attrbPart in row[3].split(","):
IndexError: tuple index out of range

Sample Data:

OBJECTID	PRJTID	MUNICIPALI	ZIPCODE
1	1	A	1
2	2	A, B	1,2
3	3	A, B, C	1,2,3
4	4	A, B, C, D	1,2,3,4
5	5	A	1
6	6	C	3

Desired output:

OBJECTID	PRJTID	MUNICIPALI	ZIPCODE
1	1	A	1
2	2	A	1
3	2	B	2
4	3	A	1
5	3	B	2
6	3	C	3
7	4	A	1
8	4	B	2
9	4	C	3
10	4	D	4
11	5	A	1
12	6	C	3

Question | Analyze | Visualize

MErikReedAugusta · ‎03-15-2024

Okay, so a few things:

As a general rule, don't nest Cursors in each other. The efficiency generally takes a nosedive.
- I think with an InsertCursor it's less of an issue, but it's still a better practice to separate them. If this had been an UpdateCursor, you would have run through the entire table once for every row on the table. (Meaning for your sample of 6 records, it would've made 36 passes through the table)
- I would first run that SearchCursor and write the necessary data to a dictionary or something similar, and then run the InsertCursor to do your operations.
In line 22, you're assuming your tuple has 4 entries. In line 23, you reference 3. Your field list and your sample data indicate you only have 2 entries to work with in the tuple.
- SearchCursor returns a tuple of the same length as your field list parameter, which lists the values in the same order as those fields they came from.
In your sample data, you have items delimited by comma and space (", "), but in line 22, you assume they're delimited by comma only (","). I'm assuming this is just a formatting thing in how you typed up the post here, but be careful of differences like that. Line 27 of my code below assumes there are spaces after the commas.

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI"]

munis = {}

with arcpy.da.SearchCursor(in_df, fldlst) as sc:
    for prj, muni in sc:
        # NOTE: This assumes the values in PRJTID are unique.  If any repeat
        #  then you'll need to check the munis dictionary to see if you've
        #  seen this particular prjtid before, and append your muni results 
        #  to the existing record, if you have one.
        munis[prj] = muni.split(', ')

with arcpy.da.InsertCursor(out_df, fldlst) as ic:
    for prj in munis:
        for muni in munis[prj]:
            ic.insertRow((prj, muni))

------------------------------
M Reed
"The pessimist may be right oftener than the optimist, but the optimist has more fun, and neither can stop the march of events anyhow." — Robert A. Heinlein, in Time Enough for Love

View solution in original post

JoshuaBixby · ‎03-17-2024

@Ed_, the updated code was close to working, you just need to use Python zip() function and change 2 lines:

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI", "ZIPCODE"]

with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for z in zip(*(map(str.strip, str.split(i, ",")) for i in row[1:])):
                insertCursor.insertRow(row[0:1] + z)

View solution in original post

BobBooth1 · ‎03-15-2024

Try creating the table first and see if it works.

Ed_ · ‎03-15-2024

@BobBooth1 thank you for the quick response, after creating the table, I am getting a tuple error now, please see the updated post, thank you.

Question | Analyze | Visualize

BobBooth1 · ‎03-15-2024

I think you want "row[2]" here:

for attrbPart in row[3].split(","):

MErikReedAugusta · ‎03-15-2024

Okay, so a few things:

As a general rule, don't nest Cursors in each other. The efficiency generally takes a nosedive.
- I think with an InsertCursor it's less of an issue, but it's still a better practice to separate them. If this had been an UpdateCursor, you would have run through the entire table once for every row on the table. (Meaning for your sample of 6 records, it would've made 36 passes through the table)
- I would first run that SearchCursor and write the necessary data to a dictionary or something similar, and then run the InsertCursor to do your operations.
In line 22, you're assuming your tuple has 4 entries. In line 23, you reference 3. Your field list and your sample data indicate you only have 2 entries to work with in the tuple.
- SearchCursor returns a tuple of the same length as your field list parameter, which lists the values in the same order as those fields they came from.
In your sample data, you have items delimited by comma and space (", "), but in line 22, you assume they're delimited by comma only (","). I'm assuming this is just a formatting thing in how you typed up the post here, but be careful of differences like that. Line 27 of my code below assumes there are spaces after the commas.

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI"]

munis = {}

with arcpy.da.SearchCursor(in_df, fldlst) as sc:
    for prj, muni in sc:
        # NOTE: This assumes the values in PRJTID are unique.  If any repeat
        #  then you'll need to check the munis dictionary to see if you've
        #  seen this particular prjtid before, and append your muni results 
        #  to the existing record, if you have one.
        munis[prj] = muni.split(', ')

with arcpy.da.InsertCursor(out_df, fldlst) as ic:
    for prj in munis:
        for muni in munis[prj]:
            ic.insertRow((prj, muni))

------------------------------
M Reed
"The pessimist may be right oftener than the optimist, but the optimist has more fun, and neither can stop the march of events anyhow." — Robert A. Heinlein, in Time Enough for Love

Ed_ · ‎03-15-2024

Worked like charm, thank you so much 😊

Question | Analyze | Visualize

Ed_ · ‎03-15-2024

@MErikReedAugusta can you please look at the modified sample data. It seems like I would also have to split the `ZIPCODE` column as well. How can I split both `MUNICIPALI` and `ZIPCODE` at the same time? Thank you

Question | Analyze | Visualize

MErikReedAugusta · ‎03-15-2024

That depends on how those two fields should interact & combine.

Your sample data seems to be missing some potential combinations for PRJ 2, for example:

PRJ	MUN	ZIP
2	A, B	1, 2

I would expect the following:

PRJ	MUN	ZIP
2	A	1
2	B	1
2	A	2
2	B	2

(Your sample data is missing the second & fourth rows.)

munis = {}
# Search Cursor here; Fields [PRJTID, MUNICPALI, ZIPCODE]:
for prj, mun, zip in cursor:
    munvals = mun.split(', ')
    zipvals = zip.split(', ')

    joinvals = []

    for munval in munvals:
        for zipval in zipvals:
            joinval = f'{munval}|{zipval}'
            joinvals.append(joinval)
            del joinval

    munis[prj] = joinvals
    del joinvals, munvals, zipvals

Alternately,

I notice your sample data also has the same number of items for MUN & ZIP, though. While it's not going to always be true in the real world, if your sample data guarantees a 1:1 match of municipality to ZIP, then you could do the merging that way, before you run the split. If that's the case, you can replace the for loops at Line 9-13 with the snippet below.

Note: Because this assumes a 1:1 match, if you have 5 values in one column and 4 in the other, it'll just ignore that 5th value entirely.

index = 0
while index < len(munvals) and index < len(zipvals):
    joinval = f'{munvals[index]}|{zipvals[index]}'
    joinvals.append(joinval)
    del joinval

------------------------------
M Reed
"The pessimist may be right oftener than the optimist, but the optimist has more fun, and neither can stop the march of events anyhow." — Robert A. Heinlein, in Time Enough for Love

Ed_ · ‎03-15-2024

@MErikReedAugusta Thank you so much 😊 For the new snippet I am getting warning message that `cursor` is not defined. Do I need to define `cursor` earlier in the code?

Question | Analyze | Visualize

Ed_ · ‎03-15-2024

@MErikReedAugusta so I tried adding the cursor using the `with` statement and now I am getting the following error. Also, I don't if indented the `while` loop correctly or not.

while index < len(munvals) and index < len(zipvals):
NameError: name 'munvals' is not defined

Code:

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI", "ZIPCODE"]

munis = {}
# Search Cursor here; Fields [PRJTID, MUNICPALI, ZIPCODE]:
with arcpy.da.SearchCursor(in_df, fldlst) as cursor:
    for prj, mun, zip in cursor:
        munvals = mun.split(',')
        zipvals = zip.split(',')

        joinvals = []

    index = 0
    while index < len(munvals) and index < len(zipvals):
        joinval = f'{munvals[index]}|{zipvals[index]}'
        joinvals.append(joinval)
        del joinval

        munis[prj] = joinvals
        del joinvals, munvals, zipvals

Question | Analyze | Visualize