Writing Unique Attribute Field Values to csv

05-21-2015 12:41 PM
benberman
Occasional Contributor

I am trying to do something similar to the URL linked below. The first answer writes out all the values for each attribute field, using the attribute names as the column headers.

python - How to export only certain columns to a CSV file in ArcGIS? - Geographic Information System...

What I want to be able to do is write out the unique values for each attribute field, using the attribute names as the column headers.

19 Replies
RebeccaStrauch__GISP
MVP Emeritus

I'm working on something similar.  I write the items that I want to a variable, then check whether that item is already in my list and append it only if it's not.  Then I sort the list and write it to my .csv.  In short...

import os

outFile = r"d:/_dataTest/myoutput.csv"
rootDir = r"d:/_dataTest"  # placeholder; the folder tree to walk

f = open(outFile, "w")

print("File is open:  " + str(not f.closed))

f.write("Type, TOCRef, FullPath, newPath\n")  # header row

uniqueList = []

for root, dirs, files in os.walk(rootDir):
    if something:  # do stuff and create the various anItem tuples
        anItem = ("blah", "blah", "blah")  # an item with the field/variable values I want

        if anItem not in uniqueList:   # tuple list of unique combinations
            uniqueList.append(anItem)  # append the item only if it is completely unique

# once all is done and everything is appended and unique
uniqueList.sort()  # sorts the list, order Type, Path, TOC

for item in uniqueList:
    # write output as Type, TOC, Path
    f.write("{},{},{}\n".format(item[0], item[2], item[1]))
f.close()

I trimmed this from a much larger script, so obviously it will not run as is.  Hope this helps.
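A side note: a set makes the uniqueness check much faster than a list when there are many items, since it skips the linear `not in` scan.  A minimal sketch of the same pattern (the folder and the fields here are placeholders, not from my real script):

import os

uniqueItems = set()
for root, dirs, files in os.walk(r"d:/_dataTest"):  # placeholder folder
    for name in files:
        # placeholder fields: extension, folder, full path
        anItem = (os.path.splitext(name)[1], root, os.path.join(root, name))
        uniqueItems.add(anItem)  # a set silently ignores duplicates

with open(r"d:/_dataTest/myoutput.csv", "w") as f:
    f.write("Type,Path,FullPath\n")
    for item in sorted(uniqueItems):  # sort once at the end
        f.write("{},{},{}\n".format(*item))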

DanPatterson_Retired
MVP Emeritus

Numpy has tools to find unique values in files.  This example gives you a sniff at what you can do.  When working with tabular data, use numpy; it is way faster and there is more that can be done...and yes...it even works with columns.

The example below simply finds the unique values in two columns, one integer, the other string....

import numpy as np
np.set_printoptions(edgeitems=4, linewidth=60, precision=2, suppress=True, threshold=10)
X = [6.0, 7, 8, 3, 6, 2, 9, 2, 8, 3]
Y = [0.0, 9, 6, 2, 1, 5, 4, 1, 4, 5]
Group = [4, 2, 1, 4, 3, 2, 2, 3, 4, 1]
Text = ['a', 'c', 'b', 'a', 'a', 'b', 'a', 'c', 'd', 'b']
dt = [('X', 'f8'), ('Y', 'f8'), ('Group', 'i4'), ('Text', '|S5')]
arr = np.array(list(zip(X, Y, Group, Text)), dtype=dt)  # list() so it also works in Python 3
print('\nInput array: X,Y,Group,Text\n{}'.format(arr))
for col in ['Group', 'Text']:
    arr_where, idx = np.unique(arr[col], return_index=True)
    print('Unique results==> Column: {}\n  Unique: {}\n  Indices: {}'.format(col, arr_where, idx))
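For the sample data above, the loop should print something like this (exact array formatting depends on your numpy version):

Unique results==> Column: Group
  Unique: [1 2 3 4]
  Indices: [2 1 4 0]
Unique results==> Column: Text
  Unique: ['a' 'b' 'c' 'd']
  Indices: [0 2 1 8]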

Give it a whirl.

BlakeTerhune
MVP Regular Contributor

Is this what you're looking for? It might help to see an example.

Input table:

PrimaryKey   SomeAttr   OtherSomething   NewVal
123          ABC        LOW              6.123
456          ABC        MED              15.456
789          ABC        HIGH             24.789
321          CBA        LOW              6.321
654          CBA        MED              15.654

Output CSV:

PrimaryKey                SomeAttr   OtherSomething   NewVal
123, 456, 789, 321, 654   ABC, CBA   LOW, MED, HIGH   6.123, 6.321, 15.456, 15.654, 24.789
DanPatterson_Retired
MVP Emeritus

using my example above, just add the code

Groups = np.unique(arr['Group']).tolist()
Classes = np.unique(arr['Text']).tolist()
print('Unique groups: {}\nUnique text: {}'.format(Groups, Classes))

Gives

Unique groups: [1, 2, 3, 4]

Unique text: ['a', 'b', 'c', 'd']

...which you could obviously add to a field or dictionary or whatever.
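For instance, collecting the unique values of every column of interest into a dictionary is a one-liner (using the arr from my earlier example):

uniques = {col: np.unique(arr[col]).tolist() for col in ('Group', 'Text')}
# e.g. {'Group': [1, 2, 3, 4], 'Text': ['a', 'b', 'c', 'd']}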

benberman
Occasional Contributor

Correct, except I'm trying to write out each unique value in successive rows, one value per cell, instead of all on the same row (multiple values in the same cell).
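So, using Blake's sample table above, the CSV I'm after would look something like this (columns with fewer unique values simply run out of rows):

PrimaryKey,SomeAttr,OtherSomething,NewVal
123,ABC,HIGH,6.123
321,CBA,LOW,6.321
456,,MED,15.456
654,,,15.654
789,,,24.789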

BlakeTerhune
MVP Regular Contributor

Here's my solution.

def main():
    import arcpy
    import os
    import csv
    import itertools
    import collections

    # Local variables
    workingDir = r"N:\TechTemp\BlakeT\Work"
    workingGDB = os.path.join(workingDir, "TEMP.gdb")
    inputTable = os.path.join(workingGDB, "TestingFc")
    outputCSV = os.path.join(workingDir, "MyOutput.csv")

    # Get list of usable field names (exclude OID and SHAPE fields)
    fieldNames = [
        f.name for f in arcpy.ListFields(inputTable)
        if f.type != "Geometry" and f.editable
    ]

    # Populate dictionary with field names and their distinct values
    fieldVals_Distinct = collections.OrderedDict()
    for field in fieldNames:
        print("Processing {}".format(field))
        ## Get distinct field values and populate dictionary
        sqlprefix = "DISTINCT {}".format(field)
        sqlpostfix = "ORDER BY {}".format(field)
        fieldVals_Distinct[field] = [
            row[0] for row in arcpy.da.SearchCursor(
                inputTable,
                [field],
                sql_clause=(sqlprefix, sqlpostfix)
            )
        ]

    # Create CSV
    ## Using the w mode will overwrite an existing file or create a new one
    with open(outputCSV, "w") as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',', lineterminator='\n')
        ## Write field name header line
        csvwriter.writerow(fieldVals_Distinct.keys())
        ## Zip distinct values together for all fields and write to CSV
        fieldVals_zipped = itertools.izip_longest(
            *fieldVals_Distinct.values(),
            fillvalue=None
        )
        csvwriter.writerows(list(fieldVals_zipped))


if __name__ == '__main__':
    main()

The ordered dictionary may not be necessary, but in my initial testing a plain dictionary sometimes changed the order of the fields when zipping all of the values together at the end. Thank you to Dan Patterson for the idea to use zip (or izip_longest in this case).
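A porting note for anyone running this outside the Python 2 that ships with ArcGIS 10.x: itertools.izip_longest was renamed zip_longest in Python 3, and there the csv module prefers newline="" on the open() call instead of a lineterminator. Reusing the variables from the script above, the tail end would become something like:

with open(outputCSV, "w", newline="") as csvfile:  # Python 3: newline="" avoids blank rows
    csvwriter = csv.writer(csvfile, delimiter=',')
    csvwriter.writerow(fieldVals_Distinct.keys())
    fieldVals_zipped = itertools.zip_longest(      # izip_longest in Python 2
        *fieldVals_Distinct.values(),
        fillvalue=None
    )
    csvwriter.writerows(fieldVals_zipped)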

DanPatterson_Retired
MVP Emeritus

Blake

Very nice job...!!!

I will put forth this little ditty in the hopes that you can see that a search cursor can be completely replaced if desired, with everything brought into numpy by using TableToNumPyArray (and its reverse equivalent, NumPyArrayToTable).

The example produces 3 columns of data...ID, Single, Double...with ID being...well...an ID field, and the last 2 fields just random letters.  I find it easier to use data that I can control for testing.  I do like the use of the OrderedDict and Counter.  Anyhow, have some fun with this.  I will post a follow-up on my blog that incorporates time testing, but I thought this would amuse you and others in the interim.

Dan

import numpy as np
import string
import random
from collections import Counter, OrderedDict
np.set_printoptions(edgeitems=4, linewidth=60, precision=2, suppress=True, threshold=10)
# testing stuff...................................................
classes = string.ascii_uppercase[:5]  # first 5 letters of the alphabet
N = 100
col1 = [random.choice(classes) for i in range(N)]
col2 = [random.choice(classes)*2 for i in range(N)]
ID = np.arange(N)
data = list(zip(ID, col1, col2))  # list() so it also works in Python 3
arr = np.array(data, dtype=[("ID", "int32"), ("Single", "|S5"), ("Double", "|S5")])
#.................................................................
# real stuff ... see
#   arr = arcpy.da.TableToNumPyArray(in_table, field_names, {where_clause},
#                                    {skip_nulls}, {null_value})
#
print("\nInput array:\n{}\ndtype: {}".format(arr, arr.dtype))
fields = arr.dtype.names  # array field names
for f in fields:
    if arr.dtype[f].char == 'S':  # string fields; could include other field types
        c = Counter(arr[f])        # counts per unique value
        keys = sorted(c.keys())    # the unique values, in order
        od = OrderedDict((x, c[x]) for x in keys)  # value -> count
        print("\nField: {}\n  Counter: {}\n  OrderedDict: {}".format(f, c, od))
    else:
        print("\nField: {}, type: {} not string type... darn ...".format(f, arr.dtype[f]))

benberman
Occasional Contributor

Thank you, Blake and Dan! This helps a lot!

BlakeTerhune
MVP Regular Contributor

I can see the utility of NumPy, but I just haven't learned it yet. I also did not know about Counter. Every little bit helps in learning these new things. Thank you for posting the follow-up, Dan Patterson!

DanPatterson_Retired
MVP Emeritus

Blake...there is some older stuff here from Aug 2014...I am currently updating it, so it isn't up to date:

Collections in numpy: producing frequency distributions and graphing
