Have you ever come across a situation like one of these:
- you need to test something but don't have the data
- you are sick of trying to get a function to work in the field calculator
- you want to test one of ArcMap's functions but none of your data are suitable
- all you need are some points with a particular distribution
- someone forgot to post a sample of their data on GeoNet for testing and you don't have a match
- you forgot to collect something in the field
Well, this lesson is for you. It is the culmination of a number of the previous lessons and a few
NumPy Snippets and Before I Forget posts. I have attached a script to this post below.
There is also a GitHub repository that takes this one step further, providing more output options... see Silly on GitHub.
The following provides the basic requirements to run a function, should you choose not to
incorporate the whole script. Obviously, the header section enclosed within triple quotes
isn't needed, but the import section is.
"""
:Script: random_data_demo.py
:Author: Dan.Patterson AT carleton.ca
:Modified: 2015-08-29
:Purpose:
: Generate an array containing random data. Optional fields include:
: ID, Shape, text, integer and float fields
:Notes:
: The numpy imports are required for all functions
"""
from functools import wraps
import numpy as np
import numpy.lib.recfunctions as rfn

np.set_printoptions(edgeitems=5, linewidth=75, precision=2,
                    suppress=True, threshold=5,
                    formatter={'bool': lambda x: repr(x.astype('int32')),
                               'float': '{: 0.2f}'.format})

# character sets used by the text generators (index 3 is uppercase)
str_opt = ['0123456789',
           '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~',  # punctuation, as in string.punctuation
           'abcdefghijklmnopqrstuvwxyz',
           'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
           'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
           ]


def func_run(func):
    """Prints basic function information and the results of a run.
    :Required: from functools import wraps
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("\nFunction... {}".format(func.__name__))
        print(" args.... {}\n kwargs.. {}".format(args, kwargs))
        print(" docs.... \n{}".format(func.__doc__))
        result = func(*args, **kwargs)
        print("{!r:}\n".format(result))
        return result
    return wrapper
Before I go any further, let's have a look at the above code.
- from functools import wraps - I will be using decorators to control output, and wraps handles all the fiddly stuff in decorators (see Before I Forget # 14)
- import numpy.lib.recfunctions as rfn - a useful module for working with ndarrays, and recarrays in particular
- np.set_printoptions - controls how arrays are formatted when printing or working from the command line. Most of the parameters are self-explanatory, or you will soon get the drift
- str_opt - the character sets (digits, punctuation, lowercase, uppercase and mixed case) that the text generators draw from
- func_run - the decorator function presented in BIF # 14
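To see what `wraps` actually buys you, here is a minimal sketch comparing a decorated function's name and docstring with and without it. The decorator names `show_name` and `no_wraps` are made up for illustration; they are trimmed-down stand-ins, not the func_run decorator itself.

```python
from functools import wraps


def show_name(func):
    """Hypothetical decorator: prints the wrapped function's name, then runs it."""
    @wraps(func)                      # copies __name__, __doc__, etc. from func
    def wrapper(*args, **kwargs):
        print("Function... {}".format(func.__name__))
        return func(*args, **kwargs)
    return wrapper


def no_wraps(func):
    """The same idea without @wraps, for comparison."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@show_name
def add(a, b):
    """Add two numbers."""
    return a + b


@no_wraps
def sub(a, b):
    """Subtract b from a."""
    return a - b


print(add.__name__, '|', add.__doc__)   # add | Add two numbers.
print(sub.__name__, '|', sub.__doc__)   # wrapper | None
```

Without `wraps`, the decorated function reports the inner wrapper's name and loses its docstring, which is exactly the information func_run prints.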
Now back to the main point: generating data with some control over the output.
The functions below do this, and their results can be put together into a standalone table or feature class.
An example follows:
Array generated....
array([(0, (7.0, 1.0), 'E0', '(0,0)', 'A', 'ARXYPJ', 'cat', 'Bb', 0, 9.380410289877375),
       (1, (2.0, 9.0), 'D0', '(4,0)', 'B', 'RAMKH', 'cat', 'Aa', 9, 1.0263298179133362),
       (2, (5.0, 8.0), 'C0', '(1,0)', 'B', 'EGWSC', 'cat', 'Aa', 3, 2.644448491753841),
       (3, (9.0, 7.0), 'A0', '(1,0)', 'A', 'TMXZSGHAKJ', 'dog', 'Aa', 8, 6.814471938888746),
       (4, (10.0, 3.0), 'E0', '(1,0)', 'B', 'FQZCTDEY', '-1', 'Aa', 10, 2.438467639965038)],
       ............. < snip >
      dtype=[('ID', '<i4'), ('Shape', [('X', '<f8'), ('Y', '<f8')]),
             ('Colrow', '<U2'), ('Rowcol', '<U5'), ('txt_fld', '<U1'),
             ('str_fld', '<U10'), ('case1_fld', '<U3'), ('case2_fld', '<U2'),
             ('int_fld', '<i4'), ('float_fld', '<f8')])
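The nested 'Shape' field is what lets the array mimic a shapefile's layout. As a small sketch (toy values copied from the first rows above, with only three of the fields), here is how field names, nested fields, single records and a boolean selection are pulled out of such an array.

```python
import numpy as np

# Same layout as the example: a nested 'Shape' field holding X and Y
dt = [('ID', '<i4'), ('Shape', [('X', '<f8'), ('Y', '<f8')]), ('txt_fld', '<U1')]
a = np.array([(0, (7.0, 1.0), 'A'),
              (1, (2.0, 9.0), 'B'),
              (2, (5.0, 8.0), 'B')], dtype=dt)

print(a.dtype.names)            # ('ID', 'Shape', 'txt_fld')
print(a['Shape'].dtype.names)   # ('X', 'Y')
xs = a['Shape']['X']            # all X values as a plain float array
row = a[1]                      # one record; row['Shape']['Y'] is 9.0
sel = a[a['txt_fld'] == 'B']    # boolean selection keeps the structured dtype
print(sel['ID'])                # [1 2]
```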
Here are the code snippets...
def pnts_IdShape(N=10, x_min=0, x_max=10, y_min=0, y_max=10, simple=False):
    """ Create an array with a nested dtype which emulates a shapefile's
    : data structure. This array is used to append other arrays to enable
    : import of the resultant into ArcMap. Array construction, after hpaulj
    : http://stackoverflow.com/questions/32224220/methods-of-creating-a-structured-array
    : simple - True for a plain (N, 2) 'Shape' field, False for nested X/Y fields
    """
    Xs = np.random.randint(x_min, x_max, size=N)   # upper bound is exclusive
    Ys = np.random.randint(y_min, y_max, size=N)
    IDs = np.arange(0, N)
    c_stack = np.column_stack((IDs, Xs, Ys))
    if simple:
        dt = [('ID', '<i4'), ('Shape', '<f8', (2,))]
        a = np.ones(N, dtype=dt)
        a['ID'] = c_stack[:, 0]
        a['Shape'] = c_stack[:, 1:]
    else:
        dt = [('ID', '<i4'), ('Shape', [('X', '<f8'), ('Y', '<f8')])]
        a = np.ones(N, dtype=dt)
        a['ID'] = c_stack[:, 0]
        a['Shape']['X'] = c_stack[:, 1]
        a['Shape']['Y'] = c_stack[:, 2]
    return a
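If you need the coordinates as a plain (N, 2) array, for example for distance calculations, numpy.lib.recfunctions provides structured_to_unstructured and unstructured_to_structured (assuming NumPy 1.16 or later, where they were added). A minimal sketch, using a hand-built array with the same nested layout rather than a call to pnts_IdShape:

```python
import numpy as np
import numpy.lib.recfunctions as rfn

# A small array with the same nested 'Shape' layout that pnts_IdShape builds
dt = [('ID', '<i4'), ('Shape', [('X', '<f8'), ('Y', '<f8')])]
a = np.zeros(3, dtype=dt)
a['ID'] = np.arange(3)
a['Shape']['X'] = [7.0, 2.0, 5.0]
a['Shape']['Y'] = [1.0, 9.0, 8.0]

# Nested X/Y fields -> plain (N, 2) float array
xy = rfn.structured_to_unstructured(a['Shape'])
print(xy.shape)   # (3, 2)

# And back again: (N, 2) -> nested X/Y dtype
shape_dt = np.dtype([('X', '<f8'), ('Y', '<f8')])
back = rfn.unstructured_to_structured(xy, shape_dt)
```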
def colrow_txt(N=10, cols=2, rows=2, zero_based=True):
    """ Produce spreadsheet-like labels, either 0- or 1-based.
    :N - number of records/rows to produce.
    :cols/rows - this combination controls the output values
    :cols=2, rows=2 - yields (A0, A1, B0, B1)
    : as optional classes regardless of the number of records being produced
    :zero_based - True for conventional array structure,
    : False for spreadsheet-style classes
    """
    if zero_based:
        start = 0
    else:
        start = 1
        rows = rows + 1
    UC = (list("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))[:cols]
    dig = (list('0123456789'))[start:rows]
    cr_vals = [c + r for r in dig for c in UC]
    colrow = np.random.choice(cr_vals, N)
    return colrow
Yields
array(['D0', 'E0', 'C0', 'E0', 'C0', 'C0', 'D0', 'D0', 'E0', 'D0'],
dtype='<U2')
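colrow_txt only ever draws from a fixed pool of labels, so it helps to see that pool. The hypothetical helper colrow_labels below repeats the label-building lines of colrow_txt without the random draw. The sample output above (labels C0 to E0) looks consistent with something like cols=5, rows=1, although the call itself isn't shown.

```python
def colrow_labels(cols, rows, zero_based=True):
    """Hypothetical helper: the label pool colrow_txt samples from."""
    start = 0 if zero_based else 1
    stop = rows if zero_based else rows + 1
    UC = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")[:cols]
    dig = list('0123456789')[start:stop]
    return [c + r for r in dig for c in UC]   # row digit varies slowest


print(colrow_labels(2, 2))                    # ['A0', 'B0', 'A1', 'B1']
print(colrow_labels(2, 2, zero_based=False))  # ['A1', 'B1', 'A2', 'B2']
print(colrow_labels(5, 1))                    # ['A0', 'B0', 'C0', 'D0', 'E0']
```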
def rowcol_txt(N=10, rows=2, cols=2):
    """ Produce array-like labels in a tuple format.
    """
    rc_vals = ["({},{})".format(r, c) for c in range(cols) for r in range(rows)]
    rowcol = np.random.choice(rc_vals, N)
    return rowcol
Yields
array(['(2,0)', '(2,0)', '(4,0)', '(0,0)', '(4,0)', '(2,0)', '(4,0)',
'(0,0)', '(2,0)', '(0,0)'],
dtype='<U5')
def rand_text(N=10, cases=3, vals=str_opt[3]):
    """ Generate N samples from the letters of the alphabet denoted by the
    : number of cases. If you want greater control over the text and
    : probability, see rand_case or rand_str.
    :
    : vals: see str_opt in required constants section
    """
    vals = list(vals)
    txt_vals = np.random.choice(vals[:cases], N)
    return txt_vals
Yields
array(['C', 'C', 'C', 'B', 'A', 'B', 'A', 'C', 'C', 'C'],
dtype='<U1')
def rand_str(N=10, low=1, high=10, vals=str_opt[3]):
    """ Returns N strings, each constructed from 'size' random letters
    : - create the cases as a list: string.ascii_lowercase or ascii_uppercase etc
    : - determine how many letters. Ensure min <= max. Add 1 to max to alleviate low==high
    : - shuffle the case list each time through the loop
    """
    vals = list(vals)
    letts = np.arange(min([low, high]), max([low, high]) + 1)
    result = []
    for _ in range(N):
        np.random.shuffle(vals)
        size = np.random.choice(letts)   # a scalar, so it can be used as a slice index
        result.append("".join(vals[:size]))
    result = np.array(result)
    return result
Yields
array(['ZDULHYJSB', 'LOSZJNB', 'PKECZOIJ', 'ZV', 'DENCBP', 'XRNITEJ',
'HJMDLBNSEF', 'DWLYPQF', 'HZOUTBSLN', 'MOEXR'],
dtype='<U10')
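Shuffling the whole letter list and slicing off the first `size` letters amounts to drawing letters without replacement. A possibly simpler sketch of the same idea uses np.random.choice with replace=False; this is an alternative, not the approach used in rand_str.

```python
import numpy as np

letters = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
low, high = 1, 10
N = 5

result = []
for _ in range(N):
    size = np.random.randint(low, high + 1)          # low..high inclusive
    picks = np.random.choice(letters, size=size, replace=False)  # no repeats
    result.append("".join(picks))
result = np.array(result)
```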
def rand_case(N=10, cases=("Aa", "Bb"), p_vals=(0.8, 0.2)):
    """ Generate N samples from a list of classes with an associated probability
    : ensure: len(cases)==len(p_vals) and sum(p_vals) == 1
    : counts are rounded, so small sample sizes will probably not yield the desired p-values
    """
    p = np.round(np.array(p_vals) * N).astype(int)   # number of each class
    case_vals = np.repeat(cases, p)                   # e.g. 8 'Aa' then 2 'Bb'
    np.random.shuffle(case_vals)
    return case_vals
Yields
array(['cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat',
'fish'],
dtype='<U4')
array(['Aa', 'Bb', 'Aa', 'Aa', 'Aa', 'Aa', 'Bb', 'Aa', 'Aa', 'Aa'],
dtype='<U2')
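rand_case builds exact counts from the p-values and then shuffles them, so with N=10 and p-values of 0.6, 0.3 and 0.1 you always get 6, 3 and 1. If you would rather have each value drawn independently, np.random.choice accepts a `p` argument. A minimal sketch contrasting the two:

```python
import numpy as np

cases = ['cat', 'dog', 'fish']
p_vals = [0.6, 0.3, 0.1]
N = 10

# Exact proportions (the rand_case idea): fixed counts, then shuffle
counts = np.round(np.array(p_vals) * N).astype(int)   # [6, 3, 1]
exact = np.repeat(cases, counts)
np.random.shuffle(exact)

# Probabilistic draw: each value chosen independently with probability p
drawn = np.random.choice(cases, size=N, p=p_vals)

print(np.unique(exact, return_counts=True))   # always 6 cat, 3 dog, 1 fish
print(np.unique(drawn, return_counts=True))   # varies from run to run
```

The exact approach guarantees the proportions; the probabilistic one behaves more like a real sample, which is where small sample sizes drift away from the p-values.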
def rand_int(N=10, begin=0, end=10):
    """ Generate N random integers within the range begin - end
    : (end is excluded, as in np.random.randint)
    """
    int_vals = np.random.randint(begin, end, size=N)
    return int_vals
Yields
array([7, 1, 4, 1, 6, 4, 5, 2, 2, 2])
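Note that np.random.randint excludes the upper bound, so rand_int with end=10 returns values from 0 to 9. If you want `end` included, one option (assuming NumPy 1.17 or later, where np.random.default_rng was introduced) is the Generator's integers method with endpoint=True:

```python
import numpy as np

N = 1000
# Legacy API: the upper bound is exclusive, so 10 never appears
legacy = np.random.randint(0, 10, size=N)

# Generator API: endpoint=True makes the upper bound inclusive
rng = np.random.default_rng(42)
inclusive = rng.integers(0, 10, size=N, endpoint=True)
```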
def rand_float(N=10, begin=0, end=10):
    """ Generate N random floats within the range begin - end
    Technically, N random integers are produced, then a random
    amount within 0-1 is added to each value
    """
    float_vals = np.random.randint(begin, end, size=N)
    float_vals = float_vals + np.random.rand(N)
    return float_vals
Yields
array([ 8.40, 9.09, 0.90, 9.64, 8.63, 5.05, 2.07, 8.13, 9.91, 0.22])
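Adding np.random.rand(N) to integers from np.random.randint(begin, end) spreads the values uniformly over the half-open range [begin, end). As a sketch of an alternative, np.random.uniform produces the same distribution in a single call:

```python
import numpy as np

N = 10
begin, end = 0, 10

# Integer part plus a fraction in [0, 1): values fall in [begin, end)
via_randint = np.random.randint(begin, end, size=N) + np.random.rand(N)

# The same range, drawn directly
via_uniform = np.random.uniform(begin, end, size=N)
```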
The above functions can be combined with the main portion of the script in a function of your own. Here is a sample:
def blog_post():
    """sample run"""
    N = 10
    id_shape = pnts_IdShape(N, x_min=300000, x_max=300500,
                            y_min=5000000, y_max=5000500)
    case1_fld = rand_case(N, cases=['cat', 'dog', 'fish'], p_vals=[0.6, 0.3, 0.1])
    int_fld = rand_int(N, begin=0, end=10)
    fld_names = ['Pets', 'Number']
    fld_data = [case1_fld, int_fld]
    arr = rfn.append_fields(id_shape, fld_names, fld_data, usemask=False)
    return arr


if __name__ == '__main__':
    # create ID, Shape, {txt_fld, int_fld... of any number}
    returned = blog_post()
array([(0, (300412.0, 5000473.0), 'dog', 4),
       (1, (300308.0, 5000043.0), 'cat', 4),
       (2, (300443.0, 5000170.0), 'dog', 5),
       (3, (300219.0, 5000240.0), 'cat', 0),
       (4, (300444.0, 5000067.0), 'cat', 9),
       (5, (300486.0, 5000106.0), 'cat', 3),
       (6, (300242.0, 5000145.0), 'cat', 5),
       (7, (300038.0, 5000341.0), 'dog', 7),
       (8, (300335.0, 5000495.0), 'cat', 9),
       (9, (300345.0, 5000108.0), 'fish', 7)],
      dtype=[('ID', '<i4'), ('Shape', [('X', '<f8'), ('Y', '<f8')]),
             ('Pets', '<U4'), ('Number', '<i4')])
You will notice in the above example that the rand_case function was used to assign the type of pet
based upon p-values of 0.6, 0.3 and 0.1, with cats being favored, as they should be, and
this is reflected in the data (6 cats, 3 dogs, 1 fish). The coordinates in this example were left as integers, reflecting a 1 m resolution.
It is possible to add a random perturbation of floating point values in the range +/- 0.99 to add centimeter values if you desire.
This is not shown here, but I can provide the example if needed.
The 'Number' field in this example simply reflects the number of pets per household.
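For readers who want to try the perturbation themselves, here is one possible sketch, not the author's example: it assumes a uniform offset between -0.99 and +0.99 m, rounded to centimetres, added separately to X and Y of a small array with the same nested layout.

```python
import numpy as np

N = 5
dt = [('ID', '<i4'), ('Shape', [('X', '<f8'), ('Y', '<f8')])]
a = np.zeros(N, dtype=dt)
a['ID'] = np.arange(N)
# Whole-metre coordinates, as in the blog_post example
a['Shape']['X'] = np.random.randint(300000, 300500, size=N)
a['Shape']['Y'] = np.random.randint(5000000, 5000500, size=N)
before = a['Shape'].copy()   # keep the integer coordinates for comparison

# Assumed perturbation: uniform between -0.99 and 0.99 m, rounded to cm
for fld in ('X', 'Y'):
    jitter = np.round(np.random.uniform(-0.99, 0.99, size=N), 2)
    a['Shape'][fld] += jitter   # in-place update through the nested field view
```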
Homework...
Using arcpy.da.NumPyArrayToFeatureClass, create a shapefile with a NAD_1983_CSRS_MTM_9 projection
(Projected, National Grids, Canada, NAD83 CSRS_MTM_9)
Answer...
>>> import arcpy
>>> a = blog_post()
>>> SR_name = 32189
>>> SR = arcpy.SpatialReference(SR_name)
>>> output_shp ='F:/Writing_Projects/NumPy_Lessons/Shapefiles/out.shp'
>>> arcpy.da.NumPyArrayToFeatureClass(a, output_shp, 'Shape', SR)
Result
That's all...