
Files to Data Python Script Tool

08-22-2010 01:50 PM
RichardFairhurst
MVP Honored Contributor
I have been finding that scanned images stored as PDFs, TIFFs or other formats are beginning to make up an increasing background resource for my Department's research efforts, but they have little integration within the GIS environment itself.

In an effort to make this data more accessible directly within GIS I have written a tool that reads a chosen folder, together with all of its subdirectories, and stores the file information it finds in 3 fields of a table. The first field contains the full path and file name of each file read, which acts as a hyperlink. The second field contains the file name only, which I use as a labeling field or hyperlink-friendly name. The third field contains the Last Modified date of the file, which helps to determine whether updates have occurred if the data needs to be periodically refreshed from the source folder. The user can assign names for each of these fields to their taste or accept the default names provided by the tool.
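For readers who want to see the idea without opening the tool, the folder walk described above can be sketched in plain Python. This is only an illustration, not the tool's actual code: the function name `list_files` is made up for the example, and it returns a list of tuples instead of writing to a table.

```python
import os
from datetime import datetime


def list_files(root_folder):
    """Return (full_path, file_name, last_modified) for every file under root_folder.

    Hypothetical helper for illustration; the real tool writes these three
    values into table fields instead of returning them.
    """
    records = []
    # os.walk visits root_folder and each of its subdirectories in turn
    for folder, _subfolders, file_names in os.walk(root_folder):
        for file_name in file_names:
            full_path = os.path.join(folder, file_name)
            # getmtime returns seconds since the epoch; convert to a datetime
            modified = datetime.fromtimestamp(os.path.getmtime(full_path))
            records.append((full_path, file_name, modified))
    return records
```

Each tuple corresponds to one row of the output table: hyperlink path, label name, and Last Modified date.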

The tool creates a standalone table. ESRI standalone tables are relatively limited, and I believe that ESRI should provide substantial enhancements to editing and other functions for them. It is possible to use these tables in a relationship to other features and data, but otherwise standalone tables limit what you can do. If you agree that standalone tables should be more fully integrated into the ESRI editing environment and tool sets, please support the idea posted here:
http://ideas.arcgis.com/ideaView?id=0873000000088RZAAY

To get the most flexibility from the output, I typically add two additional double fields to the table and use them to create an XY Event layer. As an XY Event layer the records can be manipulated like a feature class. For example, the records can be selected in different ways and the selections can be copied and pasted into other feature classes, Excel or other programs. I have also developed a VBA tool that can populate the XY fields with coordinates derived from a mouse click, but I am still looking at ways to establish a fast workflow for assigning spatial information.

I have not found the need to filter by file type in the folders I have used this tool on, but the tool should not be hard to adapt to apply user-specified filtering. If there is a request for various filtering options, I will consider making that an enhancement for a later version.
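As a rough idea of what such filtering could look like, here is a minimal sketch that checks a file name against a user-supplied list of extensions. The function name and the choice to treat an empty list as "no filter" are assumptions for the example, not part of the current tool.

```python
import os


def has_allowed_extension(file_name, extensions):
    """Return True if file_name ends in one of the given extensions (case-insensitive).

    Hypothetical helper; extensions are expected with a leading dot, e.g. ".pdf".
    """
    # An empty collection means "no filtering", so every file is accepted
    if not extensions:
        return True
    ext = os.path.splitext(file_name)[1].lower()
    return ext in {e.lower() for e in extensions}
```

A call such as `has_allowed_extension("Deed_1234.PDF", [".pdf", ".tif"])` would accept the file, since the comparison ignores case.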

Special thanks to Chris Snyder, since this tool makes extensive use of the structure and error checking methods he developed for his Permanently Sort Records tool. He also gave me real help in understanding how to program with Python. This tool represents my first real effort to program a Python tool. I hope you find this tool useful and that you are willing to share ideas on how to integrate the kind of information this tool creates more quickly and fully into various GIS applications.

Rich
3 Replies
KarlOrmer
Emerging Contributor
While waiting for a batch process to finish I had a look at your script. Please take the following as some ideas and remarks. I'm by no means an expert myself, and this is supposed to be constructive criticism 🙂


In some of your show* functions you use variables in the function body which you do not pass to the function as arguments. This works in this case, but it is fragile: you have to make sure that the variable from the higher scope is defined before the function is called. Why not:
def showMessage(message):
    print message

showMessage('my testmessage')


instead of
def showMessage():
    print message

showMessage() # oops
message = 'testing'




It is not usual to use the semicolon syntax in Python. One command, one line - but in the end this is a matter of taste, of course.

Please be careful with all these blank except statements. A blank except statement swallows *all* possible python errors and makes it very hard to debug a program. I haven't used my scripts from inside of ArcMap so far, so it might be necessary to do it there. But then again, it might be a good idea to write the traceback at least to your log file.
At the moment I do the following in my scripts:
try:
    gp:Intersect_Analysis('test1.shp; test2.shp', 'out.shp')
except arcgisscripting.ExecuteError:
    print gp.GetMessages(2)

This has the advantage that it doesn't silently ignore any other errors I'm not aware of (for example the syntax error which can be found in the snippet). It might have disadvantages I'm not aware of when starting scripts from ArcMap, but it may be worth a try.
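If a broad catch really is needed (say, inside ArcMap), one possible way to keep the traceback is sketched below. It is only an illustration: `run_logged` and its arguments are invented for the example, and `log_file` is assumed to be any open, writable file object.

```python
import traceback


def run_logged(step, log_file):
    """Call step(); if it raises, write the traceback to log_file and return False.

    Hypothetical helper; step is any callable that takes no arguments.
    """
    try:
        step()
        return True
    except Exception:
        # Catching Exception (unlike a bare except) still lets
        # KeyboardInterrupt and SystemExit propagate
        log_file.write(traceback.format_exc())
        return False
```

The log then shows the exception type and the line that failed, so nothing is swallowed silently.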
   
In line #68 it seems as if you want to catch an IOError
except IOError:
    ....do some stuff....

From #79 on it seems to me as if the license will always be set to 'ArcView'. In an if-elif statement not all conditions are checked. As soon as the first condition in the if-elif chain evaluates to `True`, the body of that condition is executed and the rest of the elif-else conditions are skipped:
>>> a = True
>>> b = True
>>> if a == True:
...     print 'a is True'
... elif b == True:
...     print 'b is True'
...
a is True

So you need to invert the order of your conditions, as ArcInfo is the highest license level - if I'm not mistaken 🙂
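A sketch of the inverted order, with the highest level tested first. The three boolean arguments are hypothetical stand-ins for whatever availability check the script actually performs, and the function name is made up for the example.

```python
def highest_license(arcinfo_ok, arceditor_ok, arcview_ok):
    """Return the name of the highest available license level, or None.

    The flags are placeholders for the script's real availability checks.
    """
    # Test the highest level first: only the first true branch runs
    if arcinfo_ok:
        return "ArcInfo"
    elif arceditor_ok:
        return "ArcEditor"
    elif arcview_ok:
        return "ArcView"
    return None
```

With all three flags true this picks 'ArcInfo', whereas testing ArcView first would stop at 'ArcView' every time.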

In #157: do you want to catch a NameError?

I hope you don't find these remarks, well, overbearing.
RichardFairhurst
MVP Honored Contributor
Karl:

I have attempted to incorporate the alternative code styles you have suggested. The messaging subroutines now take a parameter and employ the try: finally: syntax you suggested. The body of the code where the messages are sent no longer uses the external message variable and now passes the message text as a parameter to the message subroutines.

I was not as clear on how to apply your ideas about the except blocks. I have not really ever read any documentation on Python that explains best practices with this syntax and was not clear on the options I have for limiting the except block behavior (most Python documentation I do find is excessively cryptic and offers almost no introductory level discussions that can be easily practiced by people new to the language).  Please review my revisions and point out some specific cases where I can make improvements to the except blocks.

I will also admit that the logfile is a new practice for me so perhaps my set up and usage of the logfile can still be improved.  I think I know what you were getting at with your specific line comments (68 and 157), but I was not sure how best to modify the code to do the error checking.  Probably if the user provided an unusable root directory for some reason, I should actually exit the application, but I have not implemented that in this version.  Is that what you meant by reacting to a NameError?

I reversed the license levels as suggested. I have also deleted the insert cursor at the end of the routine as a clean up (something you didn't mention, but that I noticed in other similar code, so I implemented it).

As someone new to Python I want to adopt best practices early and welcome your comments as a way to achieve that. I hope that you will take my response as a positive indication of my intention to apply good programming practices.

At the same time, I did not feel that the comments you have made resulted in any real changes to the core code that actually carries out what the tool is designed to do. The tool works very well for my needs, and I hope it could be a starting point for others with needs similar to mine. I do not know whether this tool has much application to things you may do, but I would hope that you would be kind enough to point out where it works, so that those who may benefit from it will take a look at it with the improvements that I have made.

Rich
RichardFairhurst
MVP Honored Contributor
I have updated the tool to use arcpy and the 10.1 arcpy data access cursor.  The performance is much, much faster.  I still use this tool regularly to convert the file names and paths in a directory and its sub-directories to a geodatabase or dbase table.  It provides me with all of the hyperlinks to those files and a very efficient way of validating the completeness of feature classes with fields that connect to those files.  It is great for creating lists of data that rely on government recorded legal documents that have been scanned to pdf, tif or other image formats in particular.  I hope others find it useful.