POST
Hi Dan - attached is some sample data and another file that explains my goal in clearer terms. In my previous posts I'm sure I wasn't as succinct as I could have been. One thing to keep in mind, and that I am clearly getting hung up on, is that filtering the data first could weed out 99% of the lines from the input file. In my head this seems like the best way to go about processing the data, but it's clear that what I see in my head doesn't always translate easily into code, or into my understanding of how the most efficient code works. I appreciate your posts on this and I've already learned a ton, thank you!
Posted 04-05-2019 05:37 AM

POST
All that sounds ideal - I just need to wrap my head around it now. The ratio of input lines to output lines is very lopsided due to the filtering I need to do, so I'm wondering how that will work. I'm going to post some data in the morning and hopefully that will help.
Posted 04-04-2019 07:46 PM

POST
OK, that would go a long way toward my understanding of this. I will attach 100 lines of the data Friday morning. I'll have to condition and anonymize it, but for these purposes it will be good to go. Appreciate it.
Posted 04-04-2019 07:42 PM

POST
Thanks for all your help on this. I have done some more reading and just want to make sure I understand the method here... It looks much simpler and more lightweight than what I was doing previously, so I'm glad to try it out. For your bullet points:

1. You are right about the baggage - each time this is run, it will only return maybe 0.25% of the data from the input file, hence the tool's necessity. I understand that the field widths have a big impact on memory usage with this method, so I'd have to constrain those and really see if all the fields I currently output are crucial to my goals... There may be one or two that I can drop; they are integers, so it won't save a lot of room, but every little bit helps I suppose. In your example, would the code create a massive array of my specified columns for the entire file, or does it work one line at a time? I am confusing myself, I think.

2. If I have the metaphor right, can I do the pruning before the gathering - just leave the rotten berries on the plant? That is, can the if statement come before the array, so lines that don't meet the threshold don't even get considered? Or is that just not how it works? Or does the array need to be completely created before I pick and choose the data to come out of it and into my feature class? Pick every berry, throw out the bad ones, then make the jam... Or does something else need to happen altogether?

3. Do you mean that the InsertCursor business is compatible with your code, or that the InsertCursor is a separate function that will also achieve the same result? I will do much more reading on this...

By the way, I was never all that good with Avenue back when I was in school, but that was my fault and not my instructor's; he tried his best with me, and I'm still doing GIS, so that's good.
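[Editor's note: a minimal plain-Python sketch of the "prune before gathering" idea asked about in point 2. No arcpy here, and the sample lines, codes, and column positions are invented for illustration - the point is only that the if test runs per line, inside the loop, so rejected lines are never stored anywhere.]

```python
def filter_rows(lines, codes, min_height=100.0):
    """Yield only the rows that pass the filter; nothing else is kept."""
    for line in lines:
        values = line.rstrip("\n").split("\t")
        # Prune first: a line failing the test is discarded immediately,
        # like leaving the rotten berry on the plant.
        if values[2] in codes and float(values[9]) >= min_height:
            yield values  # only the "good berries" ever leave the loop

# Hypothetical 10-column sample: code at index 2, height (ft) at index 9.
sample = [
    "a\tdesc\tOWL\tx\tx\t44.1\t-63.5\tx\tx\t120.0\n",
    "b\tdesc\tCROW\tx\tx\t44.2\t-63.6\tx\tx\t80.0\n",
]
kept = list(filter_rows(sample, {"OWL", "CROW"}))
# Only the first line survives: the second has a height below 100 ft.
```

Because `filter_rows` is a generator, at most one line's values exist in memory at a time; nothing resembling a whole-file array is ever built.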
Posted 04-04-2019 12:02 PM

POST
Also the if statement I'm using is crucial to finding the needles in the haystack and then only processing and outputting the needles... Can it be incorporated using your method?
Posted 04-04-2019 05:23 AM

POST
Hi Dan; your suggestions are different from what I was expecting, so it might take me a bit longer to understand them, but I'm willing to try them out, thanks. Glad to have some different routes to try.

For more reference information: I am writing this to replace a standalone VB parser that wrote to a new text file, which was then manually ingested into Arc using the Add XY Data function in Catalog. I have a bit of time, so I thought I'd try to migrate that process into the Arc environment and create the output file all in one go. The VB parser is legacy from many years ago and chunked the data. I'm using 10.3 but am making the leap to 10.6 or 10.7 soon.

The input file is tab-delimited and does not mix data types within columns. The format of the output file is important because other processes after this depend on the field types - 5 fields are required by further processing, while the other 5 fields are needed just for reference. There is no header on the file, but I know the schema. Some fields use -999 as a null while others have blank space - but all the fields I need to perform further tasks on will be 100% populated with real information (this is a condition of the file when it is created and is defined in the schema).

As for the memory issue - I thought the reason for processing the file line by line was to avoid memory issues? That's what all the guidance and help I could find online said: that reading the file in one go was a bad move, and that the easy solution was to process a line, dump it, process the next line, etc... I could be misunderstanding your process and the guidance I read, though... Or perhaps line-by-line isn't available in the method you're suggesting? Thanks, I hope this additional information helps.
Posted 04-04-2019 04:53 AM

POST
Ah, yes, I guess the casualness doesn't lend itself to getting the full picture across; sorry about that. I didn't want to further bog down my question, but I see that the missing information is pertinent.

My input file is tab-delimited (which I have accounted for) and has ~50 fields, of which I am only interested in 10. I've created the feature class to have those 10 fields. 3 of the fields need to be transformed: converted from feet to meters and then rounded to have no decimal places (I only included one of those fields in the example because the methodology would be identical). I could do this in 2 steps by creating an imperial field in the output file, populating the metric field via the field calculator, then deleting the imperial field, but I was hoping to do it in one step to save that file maintenance at the end.

I have the index positions of all the input file fields and I know the order in which they should be inserted into the feature class. For example, for the 4 fields in my original post the index positions are [5], [6], [9], [4]. (The Lat and Long fields [5] [6] are required in the output file and will also be used with the SHAPE@XY token to create the point geometry.) The if statement filters only records whose code has been specified by the user and whose tree height is >= 100 ft.

The bottom part of this post from Stack Exchange looks promising, but I haven't been able to test it out yet, and I clearly am not entirely sure what I'm doing just yet.
Posted 04-03-2019 07:35 PM

POST
I have a very large text file (~5 GB, ~30 million lines) that I need to parse and then output some of the data to a new point feature class. I cannot figure out how to proceed with the da.InsertCursor. I've created the feature class so the fields are in the required order. There is an if statement that parses out the required lines of the file.

The units in the text file are feet; however, I need rounded metres in the output, and the output field for that information must be Long. The round command returns a float, but my final value in the Tree_Height output must be Long - can the float value be mapped into the Long field? Converting from feet to metres is simple multiplication, but I believe my field types might be messed up then?

The index positions in the source file are: Lat and Long fields, 5 and 6; Description field, 1; Tree_Height (in feet), 9; Code, 2 - this is the link between the user-inputted codes and the codes in the if statement.

The if statement works, but after that... Can someone help me set up the da.InsertCursor so that these operations can (a) get done and (b) get done efficiently? I've been fumbling with lists of fields and tuples of things and am not making any progress. I have looked at help files and Penn State's online courses, but still no joy...

import arcpy, os
treesfile = arcpy.GetParameterAsText(0)
codes = arcpy.GetParameterAsText(1).split(",") # user types in comma-delimited wildlife codes for the AOI
arcpy.env.workspace = arcpy.GetParameterAsText(2)
sr = arcpy.SpatialReference(4326)
arcpy.env.overwriteOutput = True
Filtered_Trees = arcpy.CreateFeatureclass_management(arcpy.env.workspace, "Filtered_Trees", "POINT", "", "DISABLED", "DISABLED", sr)
arcpy.AddField_management(Filtered_Trees, "Lat", "DOUBLE", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
arcpy.AddField_management(Filtered_Trees, "Long", "DOUBLE", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
arcpy.AddField_management(Filtered_Trees, "Description", "TEXT", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
arcpy.AddField_management(Filtered_Trees, "Tree_Height", "LONG", "", "", "", "", "NULLABLE", "NON_REQUIRED", "") # height value must be rounded and in Metres.
# there are many more fields but you get the idea
with open(treesfile, 'r') as file:
    for line in file:
        values = line.split("\t")
        if values[2] in codes and float(values[9]) >= 100:
            # now I am stuck...
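[Editor's note: a hedged sketch of how the stuck section might continue. The arcpy calls are shown as comments so the conversion logic can be checked on its own; the field order is assumed to match the feature class created above (Lat, Long, Description, Tree_Height), and the sample input row is invented.]

```python
FT_TO_M = 0.3048  # exact international foot-to-metre factor

def build_row(values):
    """Turn one filtered line's fields into an InsertCursor row tuple."""
    lat = float(values[5])
    lon = float(values[6])
    # Convert feet to metres and round to a whole number; int() makes the
    # value safe to store in the LONG Tree_Height field.
    height_m = int(round(float(values[9]) * FT_TO_M))
    # With arcpy, the cursor would be opened once, before the loop:
    # fields = ["SHAPE@XY", "Lat", "Long", "Description", "Tree_Height"]
    # with arcpy.da.InsertCursor(Filtered_Trees, fields) as cursor:
    #     cursor.insertRow(((lon, lat), lat, lon, values[1], height_m))
    return ((lon, lat), lat, lon, values[1], height_m)

# Hypothetical input row: description at 1, lat/long at 5/6, height (ft) at 9.
row = build_row(["x", "Oak", "OWL", "x", "x", "44.5", "-63.6", "x", "x", "150.0"])
# 150 ft * 0.3048 = 45.72 m, which rounds to 46.
```

Note the SHAPE@XY token expects (x, y), i.e. (longitude, latitude), which is why the tuple leads with `(lon, lat)` even though Lat comes first in the field list that follows it.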
Posted 04-03-2019 01:27 PM

POST
This works even better for my purposes than the first message I marked as correct! Thank you - you've unlocked the key to my problem and I'm happily on my way now... until the next inexperienced-coder hiccup! Cheers.
Posted 03-21-2019 10:46 AM

POST
If I'm reading that right: List1 is typed in, hardcoded, predefined, etc.; List3 creates a true list of layer objects based on whether or not the name of each layer object in the dataframe (pseudo-List2) appears in List1; List3 therefore becomes the resulting list of true layer objects that can be recognized by getSelectionSet and any other method that recognizes layers... If I'm interpreting that correctly, I think it would be exactly right... I'll have to wait until morning to try it out, but this looks promising... Thank you again for walking me through this! I'll let you know how it turns out.
Posted 03-20-2019 04:31 PM

POST
Thanks, I will look at that. I'm using a mix of hardcoded lists and ListLayers, so I'm guessing that's where I'm running into trouble. I just posted a long-winded reply in that other thread in case you'd like to know more about my warped little script.
Posted 03-20-2019 12:41 PM

POST
I am a bit lost, to be honest! I'd love to explain, but at the moment I don't fully understand what my script is doing or how I can go on from there. Perhaps this problem stems from me typing in the first list and hardcoding it into the script, along with each of the subsets of that list. The only real arcpy ListLayers list is the one I get from my active data frame. I couldn't figure out any other way to do it - especially when I have to make the subset lists to weed out features.

To recap: I have a list of all potential layer names, which I have hardcoded, like List1 = ["Quarries", "Ponds", "Gullies", "Runways"]. Not all of these layers are present in each map, so I want to run a function with only the layers from List1 that are actually in my map (List3 - Ponds & Runways). I have other layers in my dataframe that I cannot run the function on, because the result is not needed and/or I cannot predict what will happen if other users add other data types to the dataframe. Every layer in the dataframe makes up List2 (features, annotation, grids, rasters, user-specific layers, etc.).

Using the snippet you shared with me, I was able to weed out the unnecessary layers, but if I try to print List3 as a check, nothing happens; and if I try to get the selection set after my Select By Location loop runs (referenced in another post you have graciously replied to), nothing happens - the script completes properly with no errors, but I do not get the result I am looking for: 'getSelectionSet > 0' and FIDSet != '' queries do not return anything. Because they do not return anything, I can't pass the names of the layers which contain a selection to a bunch of if/elif statements to output specific instructions when a layer meets my conditions...

I hope to get a handle on things with our on-site Esri help, but he is stretched thin these days. I do appreciate your help, though; it's made things better, and I'm learning.
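[Editor's note: a plain-Python sketch of the List1 / List2 / List3 idea described above, with layer objects mocked as simple name strings so it can run without arcpy. The layer names come from the post; everything else is illustrative.]

```python
# Hardcoded list of all potential layer names (from the post).
List1 = ["Quarries", "Ponds", "Gullies", "Runways"]

# Stand-in for everything in the dataframe; in a real script this would
# come from something like:
#   List2 = arcpy.mapping.ListLayers(mxd, "", df)
#   List3 = [lyr for lyr in List2 if lyr.name in List1]
List2 = ["Ponds", "Runways", "Annotation", "UserRaster"]

# List3 keeps only the dataframe layers whose names appear in List1,
# preserving the order they occur in the dataframe.
List3 = [name for name in List2 if name in List1]
```

Because List3 is built from List2 (the dataframe side) rather than from the hardcoded List1, its items remain real layer objects in the arcpy version, so methods like getSelectionSet can recognize them.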
Posted 03-20-2019 12:36 PM

POST
This works for me, thanks so much. If anyone else reading this requires a string, the following change is needed: on line 5, change [lyr.name for .... to [str(lyr.name) for ....
Posted 03-20-2019 10:27 AM

POST
The list where I got that error was strings of layer names; you are correct. Sorry for not specifying that more clearly in the OP. What would I need to change in this scenario to get the expected result, or is it too far gone now... Later on in my script I need to run that again - this time on layer objects returned from listing the layers in my dataframe - and even though there are selected features within the layers, no result is given. There is no error, but there is also no result.
Posted 03-20-2019 10:03 AM