List features in multiple workspaces then convert to point, BUT ignore some paths?

SWCASWCA · ‎07-31-2013

I have this code to loop through a workspace and find all shps and feature classes that are named a certain thing. I'd like to convert each poly to a point and then append it to a feature class, BUT I want to skip over any feature that has the word "ARCHIVE" in its path. Any tips on how to do this would be great!

BTW, the code fails so far. My final loop is merely taking the first letter of my output and doing the rest, instead of taking the actual feature class, i.e. it is telling me that "D_pt" does not exist. I know the path works: if I print it, it lists what I need it to list (all feature classes and shps that start with "APE" or "SAB"), but then I can't do anything with it.

import arcpy, os

#Set the workspace
workspace = r"D:/Work"

#create list of files from directories and subdirectories
for root, dirs, files in arcpy.da.Walk(workspace, datatype="FeatureClass", type="Polygon"):
    for name in files:
        if name.startswith("APE") or name.startswith("SAB"):
            path = os.path.abspath(os.path.join(root, name))
            for x in path:
                outFC = x + "_pt"
                # convert to points
                arcpy.FeatureToPoint_management(path, outFC, "INSIDE")
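A note on the "D_pt" symptom: looping over a string with `for x in path` yields one character at a time, so the first `outFC` is built from the drive letter alone. Below is a minimal sketch of one way to name the output from the whole dataset name and to skip anything under a folder containing "ARCHIVE". The helper names `wanted` and `point_name` are made up for illustration, and the case-insensitive match is an assumption, not something stated in the post.

```python
import os

def wanted(root, name, prefixes=("APE", "SAB"), skip_word="ARCHIVE"):
    """True when name has a wanted prefix and root does not contain skip_word."""
    # Assumption: the match on skip_word should ignore case.
    if skip_word.lower() in root.lower():
        return False
    # str.startswith accepts a tuple of prefixes.
    return name.startswith(prefixes)

def point_name(name):
    """Build an output name from the whole dataset name, dropping any extension."""
    return os.path.splitext(name)[0] + "_pt"

# Iterating over a string gives single characters, which is where "D_pt" comes from.
first_char = [ch for ch in "D:/Work/APE_1.shp"][0]
```

Inside the walk, the inner `for x in path` loop would go away: check `wanted(root, name)`, then pass `path` and an output built from `point_name(name)` to FeatureToPoint.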

RhettZufelt · ‎08-01-2013

So, what are your polygon datasets in "C:\TEMP\PDX_SAB" ?

You are not doing a selection or anything, so it should make a point for each polygon feature from each input dataset. If you have more than one polygon in the FC, you will get more than one point.

Is this what is going on, or are you getting the same point entered multiple times?

R_

SWCASWCA · ‎08-01-2013

Yep, you nailed it - multiple records of the same shape (I think someone used it for data driven pages). Fixed with a delete_identical.

I would love to know if there are better/faster ways of doing what I'm doing - either cleaner code or better functions/processes. If this is iterating through a very large workspace with hundreds of folders and subdirectories I can see it really bogging a machine down. The in_memory space, for example - I never would have thought of it (had never heard of it) and it's really a nice lightweight approach.

RhettZufelt · ‎08-01-2013

Well, your script is stand-alone and not controlling a map document, so I don't think data driven pages have anything to do with it. You are working with the FCs directly.

From what you have told me, perhaps each polygon feature class has duplicate polygons? In the attribute table of the polygon FC, in ArcMap, how many records do you see? Each one would be made into a point. If you are getting identical points, you probably have identical polygons.

This is real common if the data at one time came from AutoCAD.

As far as a different approach, not sure if it would be faster, but here is one method.

Create a list of all the input polygon FCs, iterate over them, add your fields to them directly, and calculate the fields from their respective sources.

Then, since you already have the list, use that as the input to

arcpy.Merge_management(fcList, outFC) # not sure if you would need to deal with field mappings. I think it will keep all fields that have identical schema.

Then, run the FeatureToPoint on the outFC and do it all in one swoop.

Like I said, not sure if it would speed it up or not, but I have noticed that the append tool is very slow, and invoking it on each loop could be painfully slow.

R_

SWCASWCA · ‎08-01-2013

Sorry to be confusing in my earlier post. I meant that the original poly had multiple records in it that someone had used for data driven pages (my guess) and that was why I was getting multiple identical points. The delete_identical worked great for removing those.

I would likely have to deal with different schemas, so it may make sense to add and delete fields one by one before appending, though it sounds like the time hang up may be in the append itself?

I will also at some point have to address the issue that the original polys are in different projections as well but . . . baby steps. 🙂

RhettZufelt · ‎08-01-2013

OIC. I thought you were running the deleteIdenticals on the final appendFC, but sounds like you used it to "fix" the polygons.

Don't think the different schemas would matter as long as the fields you are concerned about are identical. I.e., if you add your three fields to each polygon FC first.

Now that I think about it (even more), this is what I would try:

Merge will add them all together and keep all fields from all input FCs, combining fields that are identical. So, if they have the three identical fields that you are interested in, the rows will be populated with the attributes.

So,


appendFC = r"C:\TEMP\PDX_SAB\PDX_SAB.gdb\PDX_SAB_Centerpoint"

outFC = "in_memory\\centerPoints"

tmpFC= "in_memory\\tmpFC"

fieldList = ["ProjectNum",  "OriginalPath",  "ServiceArea"]
 
                           ##Merge tool will honor the coordinate system of the first FC in the list to be merged, so I'd make my list something like:
fcList = ["baseFC"]   # where BaseFC is a FC in your workspace that is in the correct coordinate system, this will set the output to the SR you want

layerList = []

then, after you do your walk,
for file in filenames:
   fcList.append(file)

for fcs in fcList:                     ## these two look like duplication, but this is how I add to a list with a value already.  For file in filenames is a list I can't control this way
   layername = str(fcs) + "feature_layer"
   layerList.append(layername)    # append the new in memory layer names to a list so we can utilize it after iteration.

   arcpy.MakeFeatureLayer_management(fcs,layername)   # this will put them as in memory layers and are fast

   arcpy.AddField_management(layername, "ProjectNum", "TEXT", "", "", 20, "", "NULLABLE", "REQUIRED")
   arcpy.AddField_management(layername, "OriginalPath", "TEXT", "", "", 255, "", "NULLABLE", "REQUIRED")
   arcpy.AddField_management(layername, "ServiceArea", "TEXT", "", "", 75, "", "NULLABLE", "REQUIRED")

   arcpy.CalculateField_management(layername, "ProjectNum", "exp", "PYTHON_9.3")
   arcpy.CalculateField_management(layername, "OriginalPath", "exp2", "PYTHON_9.3")
   arcpy.CalculateField_management(layername, "ServiceArea", "exp3", "PYTHON_9.3")


Now, after iterating through all FC, and making the list of feature layers, combine them all so we can do the rest in one swoop.
   
arcpy.Merge_management(layerList, tmpFC)   ### merges all the polygon in memory fc's with new calculated fields into an in memory fc

fields = arcpy.ListFields(tmpFC)

for field in fields:
   if field not in fieldList:
          arcpy.DeleteField_management (tmpFC, field)  ## will drop all the "extra" fields if not in the fieldsList.

arcpy.FeatureToPoint_management(tmpFC, appendFC, "INSIDE")   ## this will make point FC in same SR as first input with your three fields appended.

Unless you run out of memory or something, I think this approach would be much faster.
Of course, this is just the basic outline; you would need to incorporate the expression code and such in there, so this code isn't "run ready".

Also, as far as converting the SR of the polygons: if you need to convert them for some other reason, then it is probably worth doing that first. However, both append and merge will honor coordinate systems (if defined) and will re-project on the fly to the output dataset's SR (to the SR of the first input for Merge).

R_

Also, you could easily skip the two list builders if you make sure that the first one being read into your list is in the proper SR.

SWCASWCA · ‎08-05-2013

Didn't know about the reproject-on-the-fly with the append and merge commands - good info.

Am trying this but am a bit confused about something - is the for fcs in fcList nested within for file in filenames? Or are they distinct for loops? If they are distinct, how can I rewrite the first two expressions used in the CalculateField command? Right now I have the for loops separated/distinct but then I get an error that 'filename' is not defined when setting up exp and exp2, obviously because it no longer exists outside of its for loop. Do I use the items in fcList somehow instead?

Thanks

RhettZufelt · ‎08-12-2013

Melissa,

The for file in filenames loop just takes the output in filenames (from da.Walk) and appends the items to a new list.

The for fcs in fcList loop then iterates through this new list, so they are completely independent of one another. I did it this way so I could control the order of the list and ensure that baseFC is ALWAYS the first in the list so that merge pulls the spatial reference from that layer. If there were an easy way to just add that FC to the beginning of the original filenames list, I would just use the one loop instead.

Instead of using filename in your exp statement(s), use fcs.

exp2 = str(os.path.join(dirpath, fcs))
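One thing to watch with that expression, as far as I understand CalculateField with PYTHON_9.3: the expression string is evaluated as Python code, so a bare path would not parse and backslashes could be read as escapes. Here is a hedged sketch of wrapping the value as a string literal with `repr`; `text_expression` is a hypothetical helper and the sample values are placeholders.

```python
import os

def text_expression(value):
    """Return value as a Python string literal, usable as an expression string."""
    # repr() adds the quotes and escapes backslashes, so evaluating the
    # result gives back the original text.
    return repr(value)

dirpath = r"C:\TEMP\PDX_SAB"   # placeholder folder
fcs = "APE_site.shp"           # placeholder dataset name

exp2 = text_expression(os.path.join(dirpath, fcs))
```

The resulting `exp2` would then be passed as the expression argument in place of the quoted "exp2" placeholder.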

R_

RhettZufelt · ‎08-13-2013

Melissa,

Was in a hurry when I posted that, and wasn't sure if the filenames list from da.walk is a "normal" python list, so wasn't sure if insert would work correctly. But, after testing, I see that it does.

Could have done it this way:

baseFC = r"c:\pathto\dataset"  ##    set to FC that has the desired output SR


                             ## then, after you do your da.walk,
filenames.insert(0,baseFC)    ##  this would ensure the baseFC is the first in the list when passing it to the merge function

Then, you could use the original code (for file in filenames: ) rather than having to use the (for fcs in fcList)

Either way works, but this method is cleaner.
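A caveat worth checking before relying on the insert: a walk yields a fresh `filenames` list of bare names for every folder, so the insert would need to happen on a list that accumulates full paths across folders. Below is a small sketch under that assumption, using a hand-written list of tuples in place of the real walk output; `collect_paths` and the sample paths are hypothetical.

```python
import os

def collect_paths(walk_results, base_fc):
    """Flatten (dirpath, dirnames, filenames) tuples into full paths, base_fc first."""
    paths = [os.path.join(dirpath, name)
             for dirpath, dirnames, filenames in walk_results
             for name in filenames]
    # Drop base_fc if the walk also found it, then put it at the front
    # so a merge would take its spatial reference.
    paths = [p for p in paths if p != base_fc]
    paths.insert(0, base_fc)
    return paths

# Hand-written stand-in for what a walk yields (not real data).
sample_walk = [
    ("D:/Work/a", [], ["APE_1.shp"]),
    ("D:/Work/b", [], ["SAB_2.shp", "APE_3.shp"]),
]
fc_list = collect_paths(sample_walk, "D:/Work/base/baseFC.shp")
```

The same list can then drive a single loop, with the base dataset guaranteed to come first.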

R_

SWCASWCA · ‎08-27-2013

Just getting back to this after being out of town.

I am now getting a runtime error on the DeleteField_management(tmpFC, field) statement and I'm not sure why. It doesn't give me any clue other than 'error in executing tool'.

I think I am confused about when 'layername' is used and when 'layerList' is used. Each layername is appended to layerList early in the script, but then layername is used for adding and calculating fields. Yet layerList is then used for deleting fields?

Again, thanks for your patience.
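One likely cause of the DeleteField error, offered as a guess rather than a confirmed diagnosis: arcpy.ListFields returns Field objects rather than names, so `field not in fieldList` is true for every field, and the loop then tries to delete all of them, including required ones such as the object ID and geometry fields. A minimal sketch of the filtering logic, using a namedtuple as a stand-in for arcpy's Field objects (which expose `name` and `required`); `names_to_drop` and the sample fields are hypothetical.

```python
from collections import namedtuple

# Stand-in for the Field objects that arcpy.ListFields would return.
Field = namedtuple("Field", ["name", "required"])

def names_to_drop(fields, keep):
    """Return the names of non-required fields that are not in keep."""
    keep_set = set(keep)
    return [f.name for f in fields
            if not f.required and f.name not in keep_set]

fieldList = ["ProjectNum", "OriginalPath", "ServiceArea"]

# Made-up field list for illustration only.
sample_fields = [
    Field("OBJECTID", True),
    Field("Shape", True),
    Field("ProjectNum", False),
    Field("OriginalPath", False),
    Field("ServiceArea", False),
    Field("OldAttr", False),
]

drop = names_to_drop(sample_fields, fieldList)

# With arcpy available, the delete step would then look roughly like:
# arcpy.DeleteField_management(tmpFC, drop)
```

On the layername question: layerList only feeds the merge; the field deletion runs against tmpFC, the merged output, not against the individual layers.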