I am trying to do something similar to the URL linked below. The first answer writes out all the values for each attribute field, using the attribute names as the column headers.
What I want to be able to do is write out the unique values for each attribute field, using the attribute names as the column headers.
I'm working on something similar. I write the items that I want to a variable, check whether the item is already in my list, and append it only if it's not. Then I sort the list and write it to my .csv. In short...
outFile = r"d:/_dataTest/myoutput.csv"
f = open(outFile, "w")
print("File is open: " + str(not f.closed))
f.write("Type, TOCRef, FullPath, newPath\n")
uniqueList = []
for root, dirs, files in os.walk(rootDir):
    if something:
        # do stuff and create various anItem's
        anItem = ("blah", "blah", "blah")  # an item with the field/variable values I want
        if anItem not in uniqueList:  # tuple list of unique combinations
            uniqueList.append(anItem)  # append the item only if it is completely unique
# once all is done and everything is appended and unique
uniqueList.sort()  # sorts the list, order Type, Path, TOC
for item in uniqueList:  # creating output as Type, TOC, Path
    f.write("{},{},{}\n".format(item[0], item[2], item[1]))
f.close()
I trimmed this from a much larger script so obviously it will not run as is. Hope this helps.
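As a side note on the same idea: the snippet above tests membership with `if anItem not in uniqueList`, which rescans the whole list on every check. A sketch of the same dedup-and-sort pattern using a `set` (O(1) membership) is below; the row values and file name here are made up for illustration.

```python
# Sketch: same dedup-and-sort idea, but a set drops duplicates in one step
# instead of a per-item list scan. Row values and filename are hypothetical.
import csv

rows = [
    ("TypeA", "toc1", "path1"),
    ("TypeB", "toc2", "path2"),
    ("TypeA", "toc1", "path1"),  # duplicate, will be dropped
]

unique_rows = sorted(set(rows))  # set removes duplicates, sorted orders them

with open("myoutput.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Type", "TOCRef", "FullPath"])
    writer.writerows(unique_rows)
```

Tuples are hashable, so they can go straight into a set; this only works if every row is built as a tuple, not a list.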
Numpy has tools to find unique values in files. This example gives you a sniff at what you can do. When working with tabular data, use numpy; it is way faster and there is more that can be done...and yes...it even works with columns.
The example below simply finds the unique values in two columns, one integer, the other string....
import numpy as np

np.set_printoptions(edgeitems=4, linewidth=60, precision=2,
                    suppress=True, threshold=10)
X = [6.0, 7, 8, 3, 6, 2, 9, 2, 8, 3]
Y = [0.0, 9, 6, 2, 1, 5, 4, 1, 4, 5]
Group = [4, 2, 1, 4, 3, 2, 2, 3, 4, 1]
Text = ['a', 'c', 'b', 'a', 'a', 'b', 'a', 'c', 'd', 'b']
dt = [('X', 'f8'), ('Y', 'f8'), ('Group', 'i4'), ('Text', '|S5')]
arr = np.array(list(zip(X, Y, Group, Text)), dtype=dt)  # list() needed in Python 3
print('\nInput array: X, Y, Group, Text\n{}'.format(arr))
for col in ['Group', 'Text']:
    uniq, idx = np.unique(arr[col], return_index=True)
    print('Unique results ==> Column: {}\n  Unique: {}\n  Indices: {}'.format(col, uniq, idx))
Give it a whirl.
Is this what you're looking for? It might help to see an example.
Input table:
PrimaryKey | SomeAttr | OtherSomething | NewVal |
---|---|---|---|
123 | ABC | LOW | 6.123 |
456 | ABC | MED | 15.456 |
789 | ABC | HIGH | 24.789 |
321 | CBA | LOW | 6.321 |
654 | CBA | MED | 15.654 |
Output CSV:
PrimaryKey | SomeAttr | OtherSomething | NewVal |
---|---|---|---|
123, 456, 789, 321, 654 | ABC, CBA | LOW, MED, HIGH | 6.123, 6.321, 15.456, 15.654, 24.789 |
Using my example above, just add the code:
Groups = np.unique(arr['Group']).tolist()
Classes = np.unique(arr['Text']).tolist()
print('Unique groups: {}\nUnique text: {}'.format(Groups, Classes))
Gives
Unique groups: [1, 2, 3, 4]
Unique text: ['a', 'b', 'c', 'd']
which you could obviously add to a field or dictionary or whatever
Correct, except I'm trying to write out each unique value in successive rows, one value per cell, instead of all on the same row (multiple values in the same cell).
Here's my solution.
def main():
    import arcpy
    import os
    import csv
    import itertools
    import collections

    # Local variables
    workingDir = r"N:\TechTemp\BlakeT\Work"
    workingGDB = os.path.join(workingDir, "TEMP.gdb")
    inputTable = os.path.join(workingGDB, "TestingFc")
    outputCSV = os.path.join(workingDir, "MyOutput.csv")

    # Get list of usable field names (exclude OID and SHAPE fields)
    fieldNames = [f.name for f in arcpy.ListFields(inputTable)
                  if f.type != "Geometry" and f.editable]

    # Populate dictionary with field names and their distinct values
    fieldVals_Distinct = collections.OrderedDict()
    for field in fieldNames:
        print("Processing {}".format(field))
        ## Get distinct field values and populate dictionary
        sqlprefix = "DISTINCT {}".format(field)
        sqlpostfix = "ORDER BY {}".format(field)
        fieldVals_Distinct[field] = [row[0] for row in arcpy.da.SearchCursor(
            inputTable, [field], sql_clause=(sqlprefix, sqlpostfix))]

    # Create CSV
    ## Using the w mode will overwrite an existing file or create a new one
    with open(outputCSV, "w") as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',', lineterminator='\n')
        ## Write field name header line
        csvwriter.writerow(fieldVals_Distinct.keys())
        ## Zip distinct values together for all fields and write to CSV
        ## (itertools.izip_longest is Python 2; in Python 3 use itertools.zip_longest)
        fieldVals_zipped = itertools.izip_longest(*fieldVals_Distinct.values(),
                                                  fillvalue=None)
        csvwriter.writerows(list(fieldVals_zipped))

if __name__ == '__main__':
    main()
The ordered dictionary may not be necessary, but from my initial testing, it sometimes changed the order of stuff when zipping all of the values together at the end. Thank you to Dan Patterson for the idea to use zip (or zip_longest in this case).
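For anyone unfamiliar with the zip_longest step, here is a minimal standalone sketch of just that transpose (the column names and values are made up, not from the real table): it turns per-column value lists of different lengths into per-row tuples, padding the shorter columns with a fill value.

```python
# Sketch of the transpose step only: zip_longest turns per-column value
# lists (of unequal length) into per-row tuples, padding short columns.
# Column names and values here are hypothetical.
from itertools import zip_longest  # itertools.izip_longest in Python 2
from collections import OrderedDict

distinct = OrderedDict([
    ("SomeAttr", ["ABC", "CBA"]),
    ("OtherSomething", ["LOW", "MED", "HIGH"]),
])

rows = list(zip_longest(*distinct.values(), fillvalue=None))
# rows == [("ABC", "LOW"), ("CBA", "MED"), (None, "HIGH")]
```

The `*` unpacks the per-field lists as separate arguments, which is why the dictionary's insertion order matters for which column ends up where.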
Blake
Very nice job...!!!
I will put forth this little ditty in the hopes that you can see that a search cursor can be completely replaced if desired and everything brought into numpy by using TableToNumPyArray (and its reverse equivalent).
The example produces 3 columns of data... ID,Single,Double with ID being...well...an ID field and the last 2 fields just random letters. I find it easier to use data that I can control for testing. I do like the use of the OrderedDict and Counter. Anyhow, have some fun with this. I will post a follow up on my blog that incorporates time testing, but I thought this would amuse you and others in the interim.
Dan
import numpy as np
import string
import random
from collections import Counter, OrderedDict

np.set_printoptions(edgeitems=4, linewidth=60, precision=2,
                    suppress=True, threshold=10)
# testing stuff...................................................
classes = string.uppercase[:5]  # first 5 letters (string.ascii_uppercase in Python 3)
N = 100
col1 = [random.choice(classes) for i in range(N)]
col2 = [random.choice(classes) * 2 for i in range(N)]
ID = np.arange(N)
data = list(zip(ID, col1, col2))
arr = np.array(data, dtype=[("ID", "int32"), ("Single", "|S5"), ("Double", "|S5")])
#.................................................................
# real stuff ... see
# arr = TableToNumPyArray(in_table, field_names, {where_clause},
#                         {skip_nulls}, {null_value})
print("\nInput array:\n{}\ndtype: {}".format(arr, arr.dtype))
fields = arr.dtype.names  # array field names
for f in fields:
    if arr[f].dtype.char == 'S':  # could include other field types
        c = Counter(arr[f])
        keys = c.keys()
        od = OrderedDict((x, c[x]) for x in keys)
        print("\nField: {}\n  Counter: {}\n  OrderedDict: {}".format(f, c, od))
    else:
        print("\nField: {}, type: {} not string type... darn ...".format(f, arr[f].dtype))
Thank you, Blake and Dan! This helps a lot!
I can see the utility of NumPy, but I just haven't learned it yet. I also did not know about Counter. Every little bit helps in learning these new things. Thank you for posting the follow-up Dan Patterson!
Blake...there is some older stuff here from Aug 2014...I am currently updating stuff, so it isn't up to date:
Collections in numpy: producing frequency distributions and graphing
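On the frequency-distribution theme of that post: `np.unique` with `return_counts=True` gives the same counts as `collections.Counter`, directly on a NumPy column, in one call. A small sketch with made-up data:

```python
# Sketch: np.unique(..., return_counts=True) as a Counter replacement
# for a NumPy column. The data here is hypothetical.
import numpy as np

text = np.array(['a', 'c', 'b', 'a', 'a', 'b', 'a', 'c', 'd', 'b'])
vals, counts = np.unique(text, return_counts=True)
freq = dict(zip(vals.tolist(), counts.tolist()))
# freq == {'a': 4, 'b': 3, 'c': 2, 'd': 1}
```

The values come back already sorted, which is handy when you want the frequency table in column order anyway.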