Writing Unique Attribute Field Values to csv

05-21-2015 12:41 PM
benberman
Occasional Contributor

I am trying to do something similar to the question linked below. The first answer there writes out all the values for each attribute field, using the attribute names as the column headers.

python - How to export only certain columns to a CSV file in ArcGIS? - Geographic Information System...

What I want to be able to do is write out the unique values for each attribute field, using the attribute names as the column headers.

1 Solution

Accepted Solutions
BlakeTerhune
MVP Regular Contributor

Here's my solution.

def main():
    import arcpy
    import os
    import csv
    import itertools
    import collections

    # Local variables
    workingDir = r"N:\TechTemp\BlakeT\Work"
    workingGDB = os.path.join(workingDir, "TEMP.gdb")
    inputTable = os.path.join(workingGDB, "TestingFc")
    outputCSV = os.path.join(workingDir, "MyOutput.csv")

    # Get list of usable field names (exclude OID and SHAPE fields)
    fieldNames = [
        f.name for f in arcpy.ListFields(inputTable)
        if f.type != "Geometry" and f.editable == True
    ]

    # Populate dictionary with field names and their distinct values
    fieldVals_Distinct = collections.OrderedDict()
    for field in fieldNames:
        print("Processing {}".format(field))
        ## Get distinct field values and populate dictionary
        sqlprefix = "DISTINCT {}".format(field)
        sqlpostfix = "ORDER BY {}".format(field)
        fieldVals_Distinct[field] = [
            row[0] for row in arcpy.da.SearchCursor(
                inputTable,
                [field],
                sql_clause=(sqlprefix, sqlpostfix)
            )
        ]

    # Create CSV
    ## Using the w mode will overwrite an existing file or create a new one
    with open(outputCSV, "w") as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',', lineterminator='\n')
        ## Write field name header line
        csvwriter.writerow(fieldVals_Distinct.keys())
        ## Zip distinct values together for all fields and write to CSV
        fieldVals_zipped = itertools.izip_longest(
            *fieldVals_Distinct.values(),
            fillvalue=None
        )
        csvwriter.writerows(list(fieldVals_zipped))


if __name__ == '__main__':
    main()

The ordered dictionary may not be necessary, but in my initial testing it sometimes changed the column order when zipping all of the values together at the end. Thank you to Dan Patterson for the idea to use zip (or zip_longest in this case).
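In case it helps, here is a quick illustration (with made-up lists, not the actual field values) of why izip_longest is used instead of plain zip: each field usually has a different number of distinct values, so the shorter columns need to be padded for the CSV rows to line up.

import itertools

# made-up columns of unequal length
colA = ["x", "y", "z"]
colB = [1, 2]

# plain zip() would stop at the shorter column and silently drop "z";
# izip_longest() pads the shorter column with the fillvalue instead
print(list(itertools.izip_longest(colA, colB, fillvalue=None)))
# [('x', 1), ('y', 2), ('z', None)]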


19 Replies
Zeke
Regular Contributor III

You may want to look at sets in Python. They essentially take a list of items and return a set of the unique items in that list. E.g.

>>> myList = [1, 3, 6, 1, 4, 7, 6]
>>> mySet = set(myList)
>>> mySet
set([1, 3, 6, 4, 7])
>>>

Maybe post some of your existing code and we can see what you need.

benberman
Occasional Contributor

I am familiar with sets in Python. I tried applying them to the solution posted on SE, but because the code writes by row instead of by column, set() only extracts the unique values in each ROW. I need it to extract the unique values in each COLUMN.
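To illustrate with some made-up rows (not my actual data): applying set() per row mixes values from different fields together, whereas the rows would have to be transposed into columns before set() gives per-field uniques.

# made-up cursor rows: two fields per row
rows = [("A", 1), ("B", 1), ("A", 2)]

# set() per row only deduplicates across fields within each row
print([set(row) for row in rows])        # [set(['A', 1]), set(['B', 1]), set(['A', 2])]

# transposing the rows into columns first gives unique values per field
print([set(col) for col in zip(*rows)])  # [set(['A', 'B']), set([1, 2])]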

Zeke
Regular Contributor III

Dan's way is no doubt better, but since I don't know numpy, what I meant is to read the values of each column into a list (arcpy.da.SearchCursor), then use set() on that list. You can then turn the set back into a list if that is more appropriate for your needs. Repeat for each column. This will give you a collection of sets (or lists) of unique values for each column.
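Roughly something like this (untested sketch; the table path is just a placeholder):

import arcpy

table = r"C:\data\example.gdb\my_table"  # placeholder path

# one cursor per non-geometry field; set() keeps only the unique values
unique_values = {}
for field in [f.name for f in arcpy.ListFields(table) if f.type != "Geometry"]:
    with arcpy.da.SearchCursor(table, [field]) as cursor:
        unique_values[field] = set(row[0] for row in cursor)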

If you're in an environment where you can use SQL queries, you could also just issue a SELECT DISTINCT command. First link on googling 'python sql'.

BlakeTerhune
MVP Regular Contributor

You can actually include a DISTINCT statement as a "sql prefix" in the sql_clause parameter of the arcpy.da.SearchCursor(). I've used it before and it works quite well.

sqlprefix = "DISTINCT MyFieldName"
sqlpostfix = "ORDER BY MyFieldName"
myField_Distinct = tuple(
    row[0] for row in arcpy.da.SearchCursor(
        "MyGeodatabaseTableName",
        ["MyFieldName"],
        sql_clause=(sqlprefix, sqlpostfix)
    )
)

You could do that for each field and write it out however you like.

But you did say you didn't want to use arcpy, so... maybe go with Dan's NumPy example.

SepheFox
Frequent Contributor

So what is different between the answer given to that stack exchange question, and what you are trying to do?

import arcpy

fc = 'C:\\antenna_shp\\cables.shp'
CSVFile = 'C:\\antenna_shp\\FinalOutput.csv'

fields = [f.name for f in arcpy.ListFields(fc)]

for i, f in enumerate(fields):
    if f == 'Shape' or f == 'Shape_Length' or f == 'OBJECTID':
        del fields[i]

with open(CSVFile, 'w') as f:
    f.write(','.join(fields) + '\n')  # csv headers
    with arcpy.da.SearchCursor(fc, fields) as cursor:
        for row in cursor:
            f.write(','.join([str(r) for r in row]) + '\n')

benberman
Occasional Contributor

As I have mentioned in my opening post, the SE solution pulls ALL the values in any given attribute field. I am only interested in UNIQUE values.

SepheFox
Frequent Contributor

Sounds a lot like a summary table and then a pivot table.
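For example, something along these lines (rough, untested sketch using the Frequency tool as the summarize step; the paths are placeholders and the tool may require a higher license level):

import arcpy

fc = r"C:\data\example.gdb\my_table"   # placeholder input
out_gdb = r"C:\data\example.gdb"       # placeholder output workspace

# one Frequency table per field: each output lists that field's
# distinct values along with how many times each occurs
for field in [f.name for f in arcpy.ListFields(fc) if f.type != "Geometry"]:
    arcpy.Frequency_analysis(fc, "{}\\{}_freq".format(out_gdb, field), [field])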

benberman
Occasional Contributor

Sure, there are ways to do this through the Arc tools. I was just interested in a more Pythonic way of doing it.

benberman
Occasional Contributor

The enumerate block does not even need to be in there; just add a conditional at the end of the fields list comprehension, per the snippet below. I also added the set() method to the final line. Because the code writes by row instead of by column, it extracts the unique values by row. I can't seem to get it to extract unique values by column for some reason.

import arcpy
fc = 'C:\\antenna_shp\\cables.shp'
CSVFile = 'C:\\antenna_shp\\FinalOutput.csv'
fields = [f.name for f in arcpy.ListFields(fc) if f.type != 'Geometry']
with open(CSVFile, 'w') as w:
    w.write(','.join(fields)+'\n') 
    with arcpy.da.SearchCursor(fc, fields) as cursor:
        for row in cursor:
            w.write(','.join(set([str(r) for r in row]))+'\n')
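One way I could see flipping that around (rough sketch, untested): keep one set per field while reading the rows, then pad the columns and write them out afterwards.

import arcpy
import itertools

fc = 'C:\\antenna_shp\\cables.shp'
CSVFile = 'C:\\antenna_shp\\FinalOutput.csv'
fields = [f.name for f in arcpy.ListFields(fc) if f.type != 'Geometry']

# accumulate one set of values per field (per column, not per row)
unique = {f: set() for f in fields}
with arcpy.da.SearchCursor(fc, fields) as cursor:
    for row in cursor:
        for fld, val in zip(fields, row):
            unique[fld].add(val)

# pad the columns to equal length and write them out
with open(CSVFile, 'w') as w:
    w.write(','.join(fields) + '\n')
    columns = [sorted(unique[fld], key=str) for fld in fields]
    for row in itertools.izip_longest(*columns, fillvalue=''):
        w.write(','.join(str(r) for r in row) + '\n')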