I am using ArcGIS Pro 2.3 and Python 3.6
My goal is to extract a column of numeric data from my geodatabase and analyze it using numpy.
However I am not sure how to open a specific column in a geodatabase and make it available to analyze the data statistically. Typically I use PyODBC to query a database and then use numpy on that data. Enclosed at the bottom is the Python code for what I am hoping to achieve using PyODBC. But Geodatabases and arcpy are not my expertise.
Here is my pseudocode:
-) Use Arcpy to add a column and calculate values for that column
-.) Use Arcpy to open up the geodatabase and select that specific column.
-.) Use Numpy or Scipy on that column to generate descriptive statistics such as mean, median, mode, percentiles, etc. (NOTE: Pandas would also be a good option.)
-) After calculating statistics on this column, create a histogram,
-) further statistical analysis once I get past this road block.
The part I am stuck on is how to bridge the gap from the geodatabase and select out a specific column and analyze that data as a dataframe or using cursor.
Can you please point the way to solving this problem?
Shown below is the code I currently have which adds a response time column and calculates the fire truck response times. My next step is to analyze the data.
There is another post similar - but not quite what I am attempting to do.
This is what I currently have started:
# Create a column in the geodatabase that is numeric
# update that column with an equation DTMArriveOnScene - DTMDispatch in seconds
# Use the number of seconds of fire truck response times to generate descriptive statistics
# Use the number of seconds of fire truck response times for a histogram
# Write this information out to an html page so the user can have a report
import arcpy
# Set environment settings
arcpy.env.workspace = "C:/Users/kulpandm/Documents/ArcGIS/Projects/CFD/CFD.gdb"
# Set local variables
inTable = "CincinnatiCFD2"
fieldName = "ResponseT"
fieldPrecision = 9
fieldAlias = "Response Time"
# Execute AddField
##arcpy.AddField_management(inTable, fieldName, "LONG", fieldPrecision, field_alias=fieldAlias, field_is_nullable="NULLABLE")
## make a calculation for the response times
expression = "DateDiff($feature.DTMArriveOnScene, $feature.DTMDispatch,'seconds')"
# Execute CalculateField
arcpy.CalculateField_management(inTable, fieldName, expression, "ARCADE")
Shown above I have added a column and calculated the data. Now I want to analyze it using Numpy. this is the part where I am stuck.
Shown below is an example of how I solved this issue before using PyODBC and Pandas.
-*- coding: utf-8 -*-
## David Kulpanowski
## 27 July 2018
## This is an example of how I do this using SQL Server and Python.
import pyodbc
import pandas as pd
import time
#import scipy
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy
#from scipy.stats import mannwhitneyu
#from scipy.stats import kruskal
#from statsmodels.graphics.gofplots import qqplot
#from matplotlib import pyplot
StartTime = time.clock()
cnxn = pyodbc.connect('DRIVER={SQL Server}; SERVER=MyServer; DATABASE=MyDatabase; Trusted_Connection=yes' )
cursor = cnxn.cursor()
Fires = pd.read_sql("""
SELECT ResponseTime
FROM FireTruckDetails
;
""", cnxn)
##print(Fires.head(n = 20))
##night.to_csv('c:/temp/FiresResponseTime.csv', sep = ',')
cursor.close()
cnxn.close()
#descriptive statistics
print('The current statistics ')
Fires.ResponseTime.describe()
fires_counts= Fires.ResponseTime.count()
print(fires_counts)
####
## create a histogram for the current time frame for all delta echo responses
####
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_ylim(ymin=0, ymax=500)
plt.hist(current.ResponseTime, bins=numpy.arange(0, 900, 60), range=[0, 900], cumulative=False, color='Crimson', label='Fire Truck Response times', alpha=0.75)
plt.title('Current time frame responses')
plt.xticks(numpy.arange(0, 900, 60), rotation='vertical')
plt.savefig('C:/Temp/FiresHistogram.svg', format = 'svg', orientation = 'landscape')
plt.show()
plt.clf()