Select to view content in your preferred language

How to calculate multiple equations (a+bX+cY+dZ), where (a, b, and c) are regression coeficients and (X, Y and Z) are rasters, using Pyton?

7104
35
Jump to solution
09-03-2018 01:23 PM
LuizVianna
Regular Contributor

I need to run 252 equations to calculate freeze probability using lat, lon, and altitude as raster inputs, multiplied by a, b, c and d regression coeficients. The equations are in format a+b*lat+c*lon+d*alt.

The equations are in an Excel spreedsheet (attached) and I need to write them in a python script. The lat, lon and lat raster datasets are in the same folder and the output tif files have a standardized name (also in the Excel file).

Thank you!

35 Replies
DanPatterson_Retired
MVP Emeritus

Håkon

I used the xlsx, to make sure it was formatted properly.

I could have used Excel to Table, from there, but I exported to *.csv so I could check the contents directly.  There is little that can remain hidden from a text file.

From there, np.genfromtext to produce the structured array in numpy to create the array for the equation bits. 

You could try Excel To Table... maybe the problem is there if it was used to create the gdb  (  I didn't even look at the gdb ? maybe didn't see it). 

I don't trust excel (thousands of posts experience) and I always look at what is sent in case something is wrong with the source.  Excel to Table does a pretty good job now for most tasks but to be safe, I usually export to *.csv rather than use xlsx or their ilk directly.

As for raster input, most times, I just convert them to arrays to keep everything in one format.  Back out *.tif using NumPyArrayToRaster or tifffile or one of the others

0 Kudos
HåkonDreyer
Esri Contributor

Guess that explains why you got <u255, but to see why his code crashed at arr = arcpy.da.TableToNumPyArray(tbl, "*") we had to look at his own gdb, and it had a 'broken' schema.

0 Kudos
DanPatterson_Retired
MVP Emeritus

Strange... How would you check for that in a file geodatabase?

An overview of the Geodatabase Administration toolset—Data Management toolbox | ArcGIS Desktop has nothing (note, the help topic is duplicated in the TOC)

nor does this

An overview of the File Geodatabase toolset—Data Management toolbox | ArcGIS Desktop 

Is there some analysis tool or is it just 'broken'

0 Kudos
HåkonDreyer
Esri Contributor

The schema in the GDB isn't broken per se, that's why I put exclamation marks around 'broken'. The GDB has no problems with the schema, but it breaks the conversion to numpy array as the np.array can't handle a dtype of <u2147483647.

If he instead had read the data directly from the GDB with a cursor the problem hadn't occured.

0 Kudos
DanPatterson_Retired
MVP Emeritus

But how did it get that dtype?  I have never seen direct conversion of file gdb featureclasses to structured arrays or even rasters to ndarrays have issues before.  If it something new that is now broken, at least I can keep an eye open for it.

Assuming that 'u' is for unsigned integer as well

array([0, 1, 2], dtype=uint32)

a.dtype.descr
[('', '<u4')]

a = np.arange(3, dtype=np.uint64)

a.dtype.descr
[('', '<u8')]‍‍‍‍‍‍‍‍‍

Because Unicode should have blown up with a memory error

dt = np.dtype([('', '<U2147483647')])

u_a = np.array(['a', 'b', 'c'], dtype=dt)

Traceback (most recent call last):

  File "<ipython-input-16-022c2659e129>", line 1, in <module>
    u_a = np.array(['a', 'b', 'c'], dtype=dt)

MemoryError
0 Kudos
HåkonDreyer
Esri Contributor

Guess your second example demonstrates just what happens, unfortunately the tool just exits instead of giving a error message, but I have managed to get the error to exit python without giving an error message too so might be a numpy problem.

>>> dt = np.dtype([('name', np.unicode_, 2147483647), ('grades', np.float64, (2,))])
>>> dt['name']
dtype('<U-1')
>>> x = np.array([('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0))], dtype=dt)
>>> x
(arcgis15) C:\

Anyway, the dtype conversion is as documented. See the Type conversions part here Working with NumPy in ArcGIS—ArcPy Get Started | ArcGIS Desktop, and as long as you are aware of your schema the problem shouldn't be too hard to avoid.

0 Kudos
DanPatterson_Retired
MVP Emeritus

yes.. never seen  <U-1 for a dtype before... it crashed the spyder kernel for sure

0 Kudos
LuizVianna
Regular Contributor

Thank you, Hakon!

The size of text fields seems to be the problem! I re-imported the xlsx table (as csv) and re-sized the text fields to 255 and solved the array problem.

However the equation (arrs) is not running:

import numpy as np
import arcpy
arcpy.CheckOutExtension("spatial")
arcpy.env.overwriteOutput = True
lat = np.arange(-29.373511, -25.928348, 0.000871310821466946, 'float')
lon = np.arange(-53.915588, -48.273851, 0.000871310821466947, 'float')
lon, lat = np.meshgrid(lon, lat)
alt = r"D:\Vianna\Dados\Raster\Topografia\mde_sc_90.tif"
tbl = r"D:\Vianna\Dados\Raster\Clima\Raster_Clima.gdb\TB_EQUAC_ETP"
fields=['SG_PREFIX','SG_SUFIX','SG_PERIO','SG_FORMA','VL_A','VL_B','VL_C','VL_D']
arr = arcpy.da.TableToNumPyArray(tbl, fields, skip_nulls=True)
arr
array([('etp', '_', 'm01', '.tif',    7.5685,   1.61406,  -3.39528, -0.03605),
       ('etp', '_', 'm02', '.tif',   66.3169,   2.0214 ,  -2.07062, -0.03056),
       ('etp', '_', 'm03', '.tif',  123.8896,   2.44179,  -1.03483, -0.02871),
       ('etp', '_', 'm04', '.tif',  178.6612,   2.52436,   0.6756 , -0.01879),
       ('etp', '_', 'm05', '.tif',  152.986 ,   1.48649,   1.1555 , -0.012  ),
       ('etp', '_', 'm06', '.tif',  126.2742,   1.22244,   1.07461, -0.00638),
       ('etp', '_', 'm07', '.tif',  107.1097,   1.69144,   0.45795, -0.00522),
       ('etp', '_', 'm08', '.tif',   78.4072,   2.02199,  -0.45745, -0.00556),
       ('etp', '_', 'm09', '.tif',   40.4517,   1.63276,  -1.21244, -0.00851),
       ('etp', '_', 'm10', '.tif',   43.7457,   2.8771 ,  -2.2587 , -0.01466),
       ('etp', '_', 'm11', '.tif',   19.1544,   2.43133,  -2.86775, -0.02095),
       ('etp', '_', 'm12', '.tif',    1.5875,   2.37287,  -3.70006, -0.0303 ),
       ('etp', '_', 'anual', '.tif',  834.3966,  24.28193, -15.89816, -0.22546)],
      dtype=[('SG_PREFIX', '<U255'), ('SG_SUFIX', '<U255'), ('SG_PERIO', '<U255'), ('SG_FORMA', '<U255'), ('VL_A', '<f8'), ('VL_B', '<f8'), ('VL_C', '<f8'), ('VL_D', '<f8')])
fns = ["{}{}{}{}".format(*i) for i in arr[['SG_PREFIX', 'SG_SUFIX', 'SG_PERIO', 'SG_FORMA']]]
a = arr['VL_A']
b = arr['VL_B']
c = arr['VL_C']
d = arr['VL_D']
arrs = [a[i] + b[i]*lat + c[i]*lon + d[i]*alt for i in range(len(a))]
LL = arcpy.Point(lon[0,0], lat[-1,0])  # magical slicing
outname = "D:\\Temp\\{}".format(fns[2])
outraster = arcpy.NumPyArrayToRaster(arrs[2], LL, 1., 1.)  # last 2 terms are cell size
outraster = arcpy.sa.ExtractByMask(outraster, alt)
outraster.save(outname)
Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "<string>", line 6, in <listcomp>
TypeError: 'numpy.float64' object cannot be interpreted as an integer
fns
['etp_m01.tif', 'etp_m02.tif', 'etp_m03.tif', 'etp_m04.tif', 'etp_m05.tif', 'etp_m06.tif', 'etp_m07.tif', 'etp_m08.tif', 'etp_m09.tif', 'etp_m10.tif', 'etp_m11.tif', 'etp_m12.tif', 'etp_anual.tif']
a
array([   7.5685,   66.3169,  123.8896,  178.6612,  152.986 ,  126.2742,
        107.1097,   78.4072,   40.4517,   43.7457,   19.1544,    1.5875,
        834.3966])
arrs
Traceback (most recent call last):
  File "<string>", line 1, in <module>
NameError: name 'arrs' is not defined
arrs = [a[i] + b[i]*lat + c[i]*lon + d[i]*alt for i in range(len(a))]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 1, in <listcomp>
TypeError: 'numpy.float64' object cannot be interpreted as an integer
0 Kudos
HåkonDreyer
Esri Contributor

Guess your problem here is that you has forgot to convert alt to a raster, so in line 30 it's a string.

A working copy from the data you added earlier:

import arcpy

arcpy.CheckOutExtension("Spatial")
arcpy.env.overwriteOutput = True

WKSP = "C:/temp/datarar" # Folder with raster datasets.
arcpy.env.workspace = WKSP

# Input raster datasets of the same masked area and same spatial reference
# Get lower left corner and cell size to georeference output
latraster = arcpy.Raster("latit.tif")
lowerleft = arcpy.Point(latraster.extent.XMin, latraster.extent.YMin)
cellsize = latraster.meanCellWidth
lat = arcpy.RasterToNumPyArray(latraster) # Input lat raster
del latraster
lon = arcpy.RasterToNumPyArray("longit.tif") # Input lon raster
alt = arcpy.RasterToNumPyArray("mde_sc_90.tif") # Input dem raster

eqtbl = "Raster_Clima.gdb/TB_EQUAC_ETP2"
arr = arcpy.da.TableToNumPyArray(eqtbl, "*")

filenames = ["{}_{}.tif".format(*x) for x in arr[["SG_PREFIX", "SG_PERIO"]]]

a = arr["VL_A"]
b = arr["VL_B"]
c = arr["VL_C"]
d = arr["VL_D"]

# Working with large arrays this may be memory intensive.
# arrs = [a + b * lat + c * lon + d * alt for i in range(len(a))]

i = 0
for filename in filenames:
print(filename)
# A bit more memory friendly alternative to arrs
eqarr = a + b * lat + c * lon + d * alt
eqraster = arcpy.NumPyArrayToRaster(eqarr, lowerleft, cellsize, cellsize)
eqraster.save(filename)
i += 1
0 Kudos
LuizVianna
Regular Contributor

The script is now ok. I´m running it using lat, lon and alt inputs as rasters (tif). The line 28 (eq) is running ok and generating the output raster in scratch workspace.

eq = [a[i] + b[i] * lat + c[i] * lon + d[i] * alt for i in range(len(a))]
eq
[D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_48988a4b_b545_4542_a1a6_2033bd3fdf61_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_83029179_cc6b_4cff_9083_01c3a6d2b97a_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_7c292bb1_3754_43d7_8e61_75ba5011864e_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_7d559a2b_32f6_4526_8e42_9909cc8959f7_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_bcc6da9d_3189_489a_9151_0dcbd3315338_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_331494a3_5e1d_493f_8fd1_78f11f674f4d_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_c81ada51_8de1_47e0_a299_f30e15ab984c_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_056e14b0_2764_40bd_855f_f993f623fd08_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_35056ba8_41ba_4e19_a036_e7b430f1ece5_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_beb5dc22_f5cc_4e9a_9c46_063826a82a1a_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_55a08d44_9db9_446d_aa61_94db00b56cec_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_92a899db_de84_4e92_a942_71bba78a429e_70285720,
 D:\Vianna\Projetos\Vitivinicultura\APTX\MCA\MCA.gdb\plus_dc217a38_13a4_453c_ae0a_1f5efe4703bd_70285720]
‍‍‍

However I don´t know how to record the output files as tif in the defined workspace (line 6). I think the save command in line 29 is wrong.

import numpy as np
import arcpy
arcpy.CheckOutExtension("Spatial")
arcpy.env.overwriteOutput = True

wksp = "D:\Vianna\Dados\Raster\Clima\ETP"
arcpy.env.workspace = wksp

lat = arcpy.Raster("D:\Vianna\Dados\Raster\Topografia\latit.tif")
lon = arcpy.Raster("D:\Vianna\Dados\Raster\Topografia\longit.tif")
alt = arcpy.Raster("D:\Vianna\Dados\Raster\Topografia\mde_sc_90.tif")
tbl = "D:\Vianna\Dados\Raster\Clima\Raster_Clima.gdb\TB_EQUAC_ETP"
arr = arcpy.da.TableToNumPyArray(tbl, "*")
eqs = arr['DS_EQUAC']
filenames = ["{}{}{}{}".format(*i) for i in arr[['SG_PREFIX', 'SG_SUFIX', 'SG_PERIO', 'SG_FORMA']]]
a = arr['VL_A']
b = arr['VL_B']
c = arr['VL_C']
d = arr['VL_D']

#eq = [a + b*lat + c*lon + d*alt for i in range(len(a))]
#outname = "D:\\Vianna\\Dados\\Raster\\Clima\\ETP\\{}".format(eq[2])
#fns.save(fns)

for filename in filenames:
 print(filename)
 # A bit more memory friendly alternative to arrs
 eq = [a[i] + b[i] * lat + c[i] * lon + d[i] * alt for i in range(len(a))]
 eq.save(eq)
etp_m01.tif

Traceback (most recent call last):
  File "<string>", line 29, in <module>
AttributeError: 'list' object has no attribute 'save'
0 Kudos