Select to view content in your preferred language

Split comma separated values to multiple rows

1878
23
Jump to solution
03-15-2024 05:59 AM
Ed_
by MVP Regular Contributor
MVP Regular Contributor

Based on the sample data below I am trying to split each comma separated value in `MUNICIPALI` column to a new as shown in desired output. However, I am getting an error that `Projects_City` cannot open, it does not exist, but then I thought line 6 and onwards will create it. How can I fix this issue? On a side note, the comma separated values are a lot more than the sample data. For, example, one of them rows have 18 or something comma separated values so, will line 10 work for that? 

Code:

 

 

 

 

import arcpy
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
out_df = r"O:\MyProject.gdb\Projects_City_Updated" # Doesn't exist, expect it to be created in line 6
fldlst = ["OBJECTID", "PRJID", "MUNICIPALI"]
with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for attrbPart in row[3].split(","):
                insertCursor.insertRow(( row[0], row[1], row[2], attrbPart))

 

 

 

 

 

Error:

 

 

 

 

Traceback (most recent call last):
  File "o:\path\SplitCommas2Row.py", line 6, in <module>
    with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
RuntimeError: cannot open 'O:\MyProject.gdb\Projects_City_Updated'

 

 

 

 

 

Update

After creating the table, please see updated code below, I am getting the following error, how can I fix it?

 

Code:

 

 

 

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI", "ZIPCODE"]

with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for attrbPart in row[3].split(","):
                insertCursor.insertRow(( row[0], row[1], row[2], attrbPart))

 

 

 

 

Error:

 

 

 

Traceback (most recent call last):
  File "o:\SplitCommas2Row.py", line 22, in <module>
    for attrbPart in row[3].split(","):
IndexError: tuple index out of range

 

 

 

Sample Data:

OBJECTIDPRJTIDMUNICIPALIZIPCODE
11A1
22A, B1,2
33A, B, C1,2,3
44A, B, C, D1,2,3,4
55A1
66C3

 

Desired output:

OBJECTIDPRJTIDMUNICIPALIZIPCODE
11A1
22A1
32B2
43A1
53B2
63C3
74A1
84B2
94C3
104D4
115A1
126C3
Question | Analyze | Visualize
Tags (1)
0 Kudos
23 Replies
MErikReedAugusta
Frequent Contributor

Yeah, looks like a few of the nested blocks didn't nest properly.

with ...
    for ...
        [Line 23-28]
        while ...
            [Line 30-32]
        munis[prj] ...
        del joinvals, ...

 

Re: "cursor", looks like I threw you a curve ball by changing what I called the cursor from one script to another.  I was typing these off the top of my head earlier.  Your most recent code seems to call it correctly, so it looks like you figured it out.

I'm going to assume you're getting that error on your second time through that While statement.  Essentially, the way you have it indented, it uses the for loop to go through the entire database, and just keeps overwriting itself.  That means that it gets to the while loop on the last record in your database.

Then, it runs the while loop once, and logs the values from that last record.  But then line 35 deletes the variables that the while loop is looking for.  You ripped a stone out from the bottom of the wall, so it all came crashing down.

Fixing the indentation should fix the problem.

 

With Python, try to think of things in logical blocks, like an outline you might write before starting a research paper.  The indentation is how you join blocks together.  Lines that start at the same indentation are part of the same "block".  If you scroll up through your code from the bottom, the first line you come to that's less indented is the start of that block.

So instead of the while block existing inside the for block, the way you have it indented makes it exist beside the for loop, which breaks the logic of the code.

 

Ed_
by MVP Regular Contributor
MVP Regular Contributor

Good morning @MErikReedAugusta thank you so much for elaborate guidance on this 😊

Question | Analyze | Visualize
0 Kudos
Bud
by
Esteemed Contributor

What kind of geodatabase?

If it's an enterprise or mobile geodatabase, it could possibly be done with SQL in a query layer or database view. Maybe using a recursive WITH clause or a database-specific mechanism.

0 Kudos
Ed_
by MVP Regular Contributor
MVP Regular Contributor

Hi Bud, thank you for the suggestion, I prefer Python since I will be automating the script and what I am doing right now is just the beginning of the script. 

Question | Analyze | Visualize
0 Kudos
Bud
by
Esteemed Contributor

Even still, using a permanent database view in your Python script could simplify your script. 

Bud
by
Esteemed Contributor

Is the number of comma-separated values in MUNICIPALITY and ZIPCODE always the same?

For example, would MUNICIPALITY ever have 1 value, but ZIPCODE has 2 values?

0 Kudos
Ed_
by MVP Regular Contributor
MVP Regular Contributor
Always the same
Question | Analyze | Visualize
0 Kudos
JoshuaBixby
MVP Esteemed Contributor

@Ed_, the updated code was close to working, you just need to use Python zip() function and change 2 lines:

 

import arcpy

# To allow overwriting outputs change overwriteOutput option to True.
arcpy.env.overwriteOutput = True
# Import the surtax projects geodatabase table subset
in_df = r"O:\MyProject.gdb\Projects_City"
arcpy.management.CreateTable(
    out_path=r"O:\MyProject.gdb",
    out_name="Projects_City_Updated",
    template=in_df,
    config_keyword="",
    out_alias=""
)

out_df = r"O:\MyProject.gdb\Projects_City_Updated"

fldlst = ["PRJTID", "MUNICIPALI", "ZIPCODE"]

with arcpy.da.InsertCursor(out_df, fldlst) as insertCursor:
    with arcpy.da.SearchCursor(in_df, fldlst) as searchCursor:
        for row in searchCursor:
            for z in zip(*(map(str.strip, str.split(i, ",")) for i in row[1:])):
                insertCursor.insertRow(row[0:1] + z)

 

Ed_
by MVP Regular Contributor
MVP Regular Contributor

Good morning @JoshuaBixby worked like a charm thank you, the zip function makes things simpler.

Question | Analyze | Visualize
0 Kudos