Using Python to automate task in excel

4107
9
01-25-2016 01:42 PM
nataliemartinkus1
Occasional Contributor

Hello GIS community,

I have a spreadsheet exported from a shapefile attribute table of Fores_Ownership_by_County for each of 4 states in the Pacific Northwest.  It is a very large spreadsheet (lots of rows), with the data representing small polys of forested area in each county. Of the many columns, there is a column for County Name (called instName below) and one for Owner Type (U.S. Forest Service, Private, State, or Non-USFS - Fed (Other Public-Fed below)).  I also have a spreadsheet listing the harvested forest biomass density for each county by ownership, see below, one for the county name and a density listed for each of the 4 ownership types. I would like to write a Python code to automate this task of looking in the density sheet for the county name and owner type, and reporting the density in the large sheet, but I am new to Python and am stuck. Any help would be appreciated. Especially if there is a way to write the program to run in GIS and populate the Forest Ownership by County attribute table directly.   Below are two snapshots of the two worksheets I am talking about:

Forest Ownership by County worksheet:

Density Worksheet:

Many thanks,

natalie

0 Kudos
9 Replies
DanPatterson_Retired
MVP Emeritus

If this is a pure spreadsheet issue, I would suggest using the macro languages that exist within the spreadsheet.  There is no point in bringing GIS into this nor python.  Should you wish to get out of a spreadsheet environment, then use a database.  If you do have a need for the mapping component, then the same recommendation would apply.  If you really really want to use python, then I would recommend looking at 2nd party python interfaces to database programs.  Pandas works well with Python which works well with GIS (arcgis and others).  So your solution is based upon the trajectory you want to take.  Reorganization of your workflow might be a suggestion to go from comfortable to efficient. 

nataliemartinkus1
Occasional Contributor

My end goal is to create of map of forested density by county.  I don't know how to write macros in excel, and I am in the process of learning Python so I'd prefer to stick to one language for now.  I just completed a Coursera class on databases, and I see your point that I could create a relationship between the two spreadsheets based on county name, but again I am new to databases.  I also know I can write a very long Python script using if-elif-else logic which would work, but would take a lot of time to write. 

Do you know of any good reference material for learning how to work with excel tables in geodatabases to manipulate attribute tables?  I am graduating with my PhD in Civil Engineering this summer, and my end product needs to be a GIS tool for biorefinery siting. My data consists of a mix of Network Analyst network datasets, points representing pulp mills, and a few workbooks.  I assume I will either go the ModelBuilder route or Python route, but in either case I need to understand how all the disparate datasets can work together to come up with a great answer.

0 Kudos
DarrenWiens2
MVP Honored Contributor

Some options within ArcGIS are:

- convert your Excel(s) to table(s), then process as usual

- work directly with your Excel sheets, using the xlrd library (import xlrd) packaged with ArcGIS - this is how ESRI accesses Excel within geoprocessing tools (look for yourself in: c:\program files (x86)\arcgis\desktop10.2\ArcToolbox\Scripts\ExcelToTable.py)

DanPatterson_Retired
MVP Emeritus

Ok... game plan is clear.  Might I suggest you looking into Modelbuilder and Python for automating your workflow and a list of the tools that you might use from Arctoolbox in that workflow.  So here are some basic links.  The sublinks will keep you going for quite some time.  As I recommend to my students, keep a subfolder of links related to this project in your browser so you can create a tree of useful links.

The key is the code snippets which will get you going.  If sounds like you are going to need basic query tools, tools for joining and calculating and the like.  Good luck

nataliemartinkus1
Occasional Contributor

I looked at Panda and it seems to be pretty user friendly.  One last question - is Panda a RDBMS like PostgreSQL, one that is friendly with geodatabases?

0 Kudos
DanPatterson_Retired
MVP Emeritus

Natalie... I don't do that database stuff... so I will defer to the expertise of their homepage.  You can also query the help once you have arcgis pro installed ... even if you don't intend to use it... since python 3.4, scipy, matplotlib, pandas sympy and numpy are all installed... leanly.  Pandas is largely numpy with a user friendly interface, so if you are comfortable with one, then the other isn't that hard to pick up

JamesCrandall
MVP Frequent Contributor

pandas is just a python library that offers some somewhat friendly methods for grouping and working with data.  I wouldn't equate it with SQL, but it may help in many instances where your data is not in an RDBMS or you cannot perform the SQL on the database.

It is friendly with geodatabases in the sense that you can convert your Feature Classes or gdb tables to numpy arrays with arcpy.da.FeatureClassToNumPyArray and then perform your required processing/analysis.  If you require saving states of that analysis, you will have to reverse the process to create outputs in your gdb, or not -- just use the results in_memory or as temporary states of data.


It works well.

MikeCusi
Occasional Contributor II

Would an attribute join not work? You may have to clean up County_Nm in your density worksheet (before you bring it into ArcMap) to fit the expected names in instName in your Forest Ownership by County table. It looks like it is going to be a 1 to many join so maybe using the tool Make Query Table (ArcGIS Help) could also be helpful.

If you want to do it in excel you can use good old MS Query (Use Microsoft Query to retrieve external data - Excel)​. But you would still need to clean up County_Nm. You just need to bring in the two tables separately and then do the join.

As you intend to make a map later on, doing it in ArcMap is probably best.

0 Kudos
nataliemartinkus1
Occasional Contributor

I was waiting for someone to comment on my County naming convention   There is actually another column in that big table that matches up nicely with the County_Nm in the Density table.  I really like the idea of creating a SQL query but wouldn't I have to create another Density table with one column for county, one column for TPO owner type, and one column for density to use in the query ?  I am very new to SQL and databases, but I dont believe the TPO Owner column in my large spreadsheet can reference a column heading in the Density table, where the corresponding density value is taken and applied back to the large spreadsheet.

As a side note, I was able to write a very long if-elif-else code to populate the density column, but I have to do this for 3 more states so I'd love to find a quicker way.

thanks for your help.

0 Kudos