POST
I too ran into a very similar situation with a script I wrote to data mine, of sorts, all of the MXDs and LYRs in a file system, to see what data sources users were actually using instead of asking them and relying on incomplete and usually inaccurate responses. I had something like 10,000 files to look at, and I could never get the script past several hundred to a thousand of them before it would crash; it wouldn't raise an error, it would simply vanish. I would find the MXD that was being analyzed when it crashed, and there were never any issues with it in ArcMap, or when I copied subsets of related MXDs to a different folder and processed only a few hundred. Since the crashes would bring down Python itself, I could never find a way to adequately catch the errors. No matter how much error trapping and compartmentalization/isolation I did in the code, it would hit some magical number and poof.

In the end, because I had to get something working, I used multiprocessing to pool workers so that a given subprocess could crash without killing the rest of the script. Since the subprocesses were tracking their chunks of the list and reporting back, I could recycle the list from a crashed subprocess and get other processes working on it. Even the multiprocessing approach got clunky because of timeout problems: there were certain MXDs, and I have no idea why, that would hang indefinitely. So some subprocesses would crash and some would hang. It was messy in the end, but I got what I needed.

Fundamentally, ArcMap seems much more tolerant of MXD structure than arcpy.mapping. It would be nice if there were a mapping.isValid method/property that could catch structural errors and return a Boolean, rather than having the user start listing layers and waiting for errors to be raised or the code to crash.
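A rough sketch of that worker-pool pattern follows. It is not the original script (analyze_mxd, mine_mxds, and the paths are placeholders), but it shows how a per-job timeout lets a crashed or hung worker cost only its own chunk of the work:

```python
# Sketch: isolate crash-prone per-MXD work in separate processes so one failure
# cannot take down the whole run. Names and paths are illustrative only.
import multiprocessing

def analyze_mxd(path):
    """Open one MXD with arcpy.mapping and report the layers it contains."""
    import arcpy  # imported in the worker; a hard crash here only kills this process
    mxd = arcpy.mapping.MapDocument(path)
    layers = [lyr.name for lyr in arcpy.mapping.ListLayers(mxd)]
    del mxd
    return path, layers

def mine_mxds(mxd_paths, timeout=300):
    results, failed = [], []
    pool = multiprocessing.Pool(processes=4, maxtasksperchild=1)
    jobs = [(p, pool.apply_async(analyze_mxd, (p,))) for p in mxd_paths]
    for path, job in jobs:
        try:
            # A worker that vanishes or hangs costs us only this one MXD.
            results.append(job.get(timeout=timeout))
        except Exception:  # includes multiprocessing.TimeoutError
            failed.append(path)  # recycle these into another pass or a report
    pool.terminate()
    pool.join()
    return results, failed

if __name__ == "__main__":
    found, retry = mine_mxds([r"C:\mxds\example.mxd"])
```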
08-12-2014 08:27 PM
BLOG
This is the second in a four-part series on the challenges of naming new features in software applications, particularly the consequences when naming falls short. The first part in the series looks at a case where the name of a new feature clearly and succinctly describes the behavior of that feature. The second part looks at that same case after newer functionality alters the original behavior. The third part looks at how the documentation has changed to address this altered functionality. And finally, the fourth part discusses what it all means for end users and developers.

When I first started beta testing ArcGIS 9.4, it didn't take long for me to see this was going to be a big release for Esri. It turns out it was big enough to get promoted from a minor release to a major one during beta, and we all ended up with ArcGIS 10.0. The What's New in ArcGIS 10 document covers lots of ground; there is something in there for just about everyone. No matter how narrow or limited your use of ArcGIS Desktop may be, one change you couldn't miss was the user interface, which had remained quite constant through the ArcGIS 8.x and 9.x days. I was interested in lots of the changes in ArcGIS 10.0, so many that I won't bother trying to list them here. Although lots of changes got my attention, the changes to geoprocessing really stood out: background processing was introduced, the Python window replaced the Command Line window, ArcPy took Python support to the next level, and more. Combining all of these new features with one of my favorite existing features, the in-memory workspace, I was actually a bit excited to kick the tires and see just how great this next ride might be.

Unlike ArcGIS 9.2, where I had to use the Wayback Data Center, ArcGIS 10.0 is still in production around parts of my agency, which makes it easy to take a step back in time and still generate new screenshots. For the sake of consistency and simplicity, I will just re-use the examples from the first post in this series (What's in a Name: When in_memory = In-memory) to get acquainted with the Python window in ArcGIS 10.0. Let's take a look at the results of creating a table in the in-memory workspace:

Success, or not? The command appears to have completed successfully, but tmpTable doesn't appear to be in the GPInMemoryWorkspace. I am going to run that command again. Huh. The command completed successfully, but again, tmpTable doesn't appear to be in the GPInMemoryWorkspace. In fact, now I have two tmpTables, and each one seems to have its own cryptic geodatabase in my Temp folder. Unlike ArcGIS 9.2, where the command failed because tmpTable already existed, ArcGIS 10.0 does you a favor, if you can call it that, by just creating another one in another cryptic geodatabase.

I don't know what is going on here. I had better just delete these tables and clean up this mess. Wait, I can't delete tmpTable using the same syntax that worked in ArcGIS 9.2? I guess if the tables aren't really being created in-memory, then it makes sense that the Delete_management function won't find them there. The autocomplete in the Python window wants to delete "tmpTable," without the reference to "in_memory." I will give that a try: Well, at least that worked, but I don't know which tmpTable the autocomplete was talking about. Fortunately, running the command again did clean up the other tmpTable. Creating feature classes in-memory behaves the same way.
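For readers following along without the screenshots, the sequence above boils down to a handful of geoprocessing calls in the Python window. This is a sketch rather than a reproduction of the original session; tmpTable is simply the example name used in the post:

```python
# Run in the ArcGIS 10.x Python window with Background Processing enabled.
import arcpy

arcpy.CreateTable_management("in_memory", "tmpTable")  # reports success, yet nothing appears in GPInMemoryWorkspace
arcpy.CreateTable_management("in_memory", "tmpTable")  # succeeds again instead of raising an "already exists" error
arcpy.Delete_management("in_memory/tmpTable")          # the in_memory path that worked in 9.2 no longer finds the table
arcpy.Delete_management("tmpTable")                    # the autocomplete's suggestion, which does remove one of them
```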
Also, the corresponding tools in ArcToolbox for creating tables and feature classes demonstrate the same behavior. There is definitely enough consistency here that this isn't just a bug in a specific tool or function. Who knows, maybe in_memory means on-disk in ArcGIS 10.0. The first part in this series had an example that actually moved some data into an in-memory workspace; it can't hurt to repeat that here before coming to any conclusions. First, load those U.S. State boundaries again. Well, there we are again, a copy of the features loaded into an in-memory workspace. What? GPInMemoryWorkspace? I can't say whether I expected this result or not.

So, does in_memory mean in-memory or on-disk? Obviously something changed between ArcGIS 9.2 and ArcGIS 10.0, but what? The short answer: Background Processing. Not only was Background Processing introduced in ArcGIS 10.0, it was turned on by default. I can't recall the reason today, but at some point years ago I had a need to disable Background Processing. At that point, I realized that disabling, or never enabling, Background Processing almost reverts in_memory back to how it behaved in ArcGIS 9.2 and 9.3/9.3.1.

Interestingly enough, all of the examples above turn out very similarly in ArcGIS 10.2.2. I would argue the situation in ArcGIS 10.2.2 is slightly worse than back in ArcGIS 10.0. For example, running the CreateTable_management function twice and then attempting to delete tmpTable using a fully specified in_memory path gives us: In ArcGIS 10.0, the Delete_management function failed because tmpTable didn't actually exist in-memory, which seems logical. In ArcGIS 10.2.2, the Delete_management function succeeds, but at deleting nothing! Granted, it did return a warning that tmpTable doesn't exist in-memory, but then it continues on, deletes nothing, and returns a successful result. I can't speak for others, but if I call a function to delete an object and that object doesn't exist, I usually expect an error to be returned. Better yet, see what happens in ArcGIS 10.2.2 when you disable Background Processing, create a table in-memory, and try to use a fully specified in_memory path to delete it: You can ostensibly delete the table successfully three times and yet it still exists! And this is after it has warned you the table doesn't exist when it clearly does, in memory no less.

It is obvious that things changed at ArcGIS 10.0 with the in-memory workspace, particularly with the use of 'in_memory.' I don't know everything that changed, but there is a connection with Background Processing, and the changes have persisted throughout the ArcGIS 10.x product series. I think it is time for me to RT(?)M and see what the documentation has to say about all of these changes.
08-02-2014 11:12 AM
BLOG
I am quite certain in-memory workspaces are here to stay, and I am not advocating for their demise. The question or issue for me is whether in_memory really gets you in-memory, which it doesn't in all cases. When in_memory really means on-disk, the workspace isn't any faster than scratch on disk. The next two blog posts will get into specifics.
08-01-2014 02:04 PM
BLOG
This is the first in a four-part series on the challenges of naming new features in software applications, particularly the consequences when naming falls short. The first part in the series looks at a case where the name of a new feature clearly and succinctly describes the behavior of that feature. The second part looks at that same case after newer functionality alters the original behavior. The third part looks at how the documentation has changed to address this altered functionality. And finally, the fourth part discusses what it all means for end users and developers.

When deciding what to call a new feature in a software application, relatively short and relatively descriptive usually win out. It makes sense, really: who wants to bust out the Help or a super-decoder ring just to get an idea of what a feature might or might not do? There are risks, however, with trying to be too short or too descriptive. The former often leads to important qualifiers or fine print being left out, and putting the former and the latter together typically lulls users into a false sense of understanding, i.e., assuming what the feature does instead of knowing. If the act of naming a new feature doesn't pose enough of a challenge, staying true to the name over time poses an even bigger one. So why bring up the challenge of naming new features and staying true to those names over time? Well, because staying true to a name has proven too much for at least one feature in ArcGIS, and the handling of the situation has become a failure in and of itself, in my opinion.

Back around the time Borat was touring the country learning about American culture, Esri released ArcGIS 9.2 (ArcGIS for Desktop Product Life Cycle Support Status). It's too bad he didn't swing by the Institute when passing through the Orange Empire; that would have been worth the ticket price alone. One of the new features introduced in ArcGIS 9.2 was the "in-memory workspace for writing temporary feature classes and tables," which could "greatly improve the performance of models, especially when writing intermediate (scratch) data" (What's New in ArcGIS 9.2). Needless to say, I was interested. Although I don't have screenshots from that time, fortunately my agency's Wayback Data Center still has ArcGIS 9.2 installed, build 1324 no less! Let's roll the clock back and see the in-memory workspace at its beginnings.

After launching ArcMap, I was momentarily thrown by the Command Line. The Python window didn't replace the Command Line until ArcGIS 9.4, aka ArcGIS 10.0 (What's New in ArcGIS 9.4 - no link, and I don't think I can post a copy of the PDF either). After taking a few minutes to reacquaint myself with the Command Line, it was time to get down to business. Since this post is about the naming of features and not their performance, we won't need many examples to see whether the new in_memory workspace is really in-memory. One of the simplest examples I can think of is to create a new table in-memory: So, let's take a look at the Source tab in the Table of Contents: There it is, a new table in the GPInMemoryWorkspace. What about creating the same table again: So far, so good. We expect an error given that the table already exists. Let's take a look at the Table of Contents after I try deleting the in-memory table: Still going well. The Delete command works and the in-memory table is gone.
Although I won't clutter up the post with more screenshots, I will say creating in-memory feature classes turned out the same way the tables did above. Also, creating in-memory feature classes and tables using ArcToolbox yielded the same results as the Command Line. Looking for an example that actually involves some data, I loaded a feature class containing the U.S. State boundaries into ArcMap. A simple Copy Features command using in_memory should do the trick if in-memory workspaces are working as advertised. Well, there we are, a copy of the features loaded into an in-memory workspace.

The basic examples above are far from a definitive test, but they do show that, starting with ArcGIS 9.2, users have the ability to store intermediate data in-memory while working in ArcMap. Overall, I would have to say the marketroids were right on this one. The in_memory workspace really is in-memory, at least within the scope of its design. When it comes to the challenge of naming a new feature, I think Esri can claim success with 'in_memory.' The name is short, descriptive, and most importantly, accurate. The question or challenge now becomes whether 'in_memory' can remain true to its original functionality as even newer features are introduced with subsequent versions of ArcGIS Desktop.
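Since the ArcGIS 9.2 screenshots can't be reproduced here, below is a rough scripted equivalent of the same tests, written against the 9.2 geoprocessor object rather than the Command Line window. The table and layer names are just the examples used in the post:

```python
# Scripted equivalent of the ArcGIS 9.2 Command Line tests described above
# (Python 2-era syntax, since that is what shipped with 9.2).
import arcgisscripting

gp = arcgisscripting.create()

gp.CreateTable_management("in_memory", "tmpTable")        # the new table shows up in GPInMemoryWorkspace
try:
    gp.CreateTable_management("in_memory", "tmpTable")    # fails: tmpTable already exists in-memory
except Exception, err:
    print err
gp.Delete_management("in_memory/tmpTable")                # the in-memory table is gone again
gp.CopyFeatures_management("States", "in_memory/States")  # copies the loaded layer's features in-memory
```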
08-01-2014 11:10 AM
POST
Does enabling or disabling 'Background Processing' make a difference? Since the script works fine from a Windows command line, and even when ArcScene is open, I am wondering if there is a communication issue between ArcScene and the process it spawns when Background Processing is enabled.
07-29-2014 06:50 AM
POST
If you think ArcScene is actually causing the problem by locking the file, I wonder what would happen if you went into Windows Explorer and made the files in question read-only. It could be that it makes everything worse, or it might keep ArcScene from taking exclusive access to the file while other programs are trying to read it. It is simple enough to try.
07-28-2014 06:24 AM
POST
I have generally had more success scaling performance with multiprocessing than with multithreading, but a lot depends on the extensions/packages being used and the specific structure of the code. I believe ArcPy requires CPython, and CPython has a global interpreter lock (GIL). Unless functions are specifically written to release or work around the GIL, multithreaded code won't always perform the way people expect. But again, it depends on the specifics. And if locking during I/O is the problem and multithreading can address it, it beats multiprocessing code that doesn't run at all.
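As a quick illustration of the GIL point (nothing ArcPy-specific; the function and numbers are made up for the example), a CPU-bound task typically speeds up with a process pool but not with a thread pool in CPython:

```python
# Compare a thread pool and a process pool on pure-Python CPU work.
import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool  # thread-based drop-in replacement

def busy(n):
    """CPU-bound loop; the GIL allows only one thread to execute it at a time."""
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [2000000] * 4
    for label, pool in (("threads", ThreadPool(4)), ("processes", Pool(4))):
        start = time.time()
        pool.map(busy, work)
        pool.close()
        pool.join()
        print("%s: %.2f seconds" % (label, time.time() - start))
```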
07-25-2014 09:50 AM
BLOG
I believe semantics are important in all aspects of life. Whether in law, medicine, science, business, information technology, or any other field, having a common language doesn't do much good if there isn't a common understanding of the words that make up the language. Since languages evolve, maintaining a common understanding of words over time is a continual challenge. It is all too common for the familiarity of words, especially argot, to lull people into a false sense of common understanding. How many meetings have we all been to where a topic is discussed, decisions are made, and everyone walks out thinking something different about what is going to happen next? Such confusion can be intentional on the part of one or more of the parties in the meeting, but Hanlon's razor dictates that is the exception rather than the rule. I commonly see people use the same words to mean different things when the context or participants of a discussion change and implied qualifiers are no longer understood or known. There are times, however, when I see apathy or complacency prevent people from clarifying what is known to be misunderstood, thus willfully perpetuating confusion. It is for this reason that I have one of many worn and tattered soapboxes I must drag out of the closet from time to time. Since I rely on the geospatial sciences and information technology for part of my work, I frequently find myself running into semantic overloading in discussions and documentation of related software. Some of the Tilting at Globes blog posts will discuss specific examples where the failure to address known semantic overloading makes work less productive for GIS practitioners.
07-25-2014 09:06 AM
POST
I found a bit of irony in the following bug description my agency recently ran into:

NIM103380: Unable to identify or select features from a view created using the Create Database View tool on an ST_GEOMETRY feature class. Status: New. Workaround: Use the ArcSDE command line to create the view.

It seems removing the ArcSDE command line tools makes the Esri Support toolbox a bit smaller when it comes to offering workaround suggestions.
07-24-2014 08:27 AM
POST
Also, how big are the data sets you are working with? Would it be possible for each child process to make an in-memory copy of the data set, or possibly a temporary copy on disk? If there is some kind of file locking occurring, even though you aren't editing the data, maybe giving each child process its own copy of the data to work on would work around the issue. Clunky, but you will still need a workaround if this is being caused by a bug.
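A rough sketch of the per-process copy idea is below. The source path and the per-chunk work are placeholders, not the poster's actual data or analysis; the point is only that each child works against its own in-memory copy rather than a shared file handle:

```python
# Each worker copies the source data into its own in_memory workspace first.
import os
import multiprocessing
import arcpy

SOURCE_FC = r"C:\data\example.gdb\parcels"  # placeholder path

def worker(chunk_id):
    """Stand-in per-chunk task that never touches the shared file after copying."""
    scratch = "in_memory/copy_%d" % os.getpid()        # unique name per process
    arcpy.CopyFeatures_management(SOURCE_FC, scratch)  # private copy for this child
    count = int(arcpy.GetCount_management(scratch).getOutput(0))  # stand-in analysis
    arcpy.Delete_management(scratch)                   # free the memory when done
    return chunk_id, count

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    print(pool.map(worker, range(8)))
    pool.close()
    pool.join()
```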
07-24-2014 08:03 AM
POST
What happens if you set maxtasksperchild=1 when instantiating the pool? I realize this isn't a good idea operationally since it will kill performance, especially on Windows, but it will be interesting to see how it affects the errors.
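For reference, a minimal sketch of what that looks like; the worker function and task list here are placeholders, not the original code:

```python
import multiprocessing

def worker(task):
    return task * 2  # stand-in for the real per-task work

if __name__ == "__main__":
    # maxtasksperchild=1 replaces each worker process after a single task,
    # so state corrupted by one task cannot leak into the next.
    pool = multiprocessing.Pool(processes=4, maxtasksperchild=1)
    print(pool.map(worker, range(10)))
    pool.close()
    pool.join()
```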
07-24-2014 07:43 AM
POST
At least for me, it is hard to provide much feedback given the limited code snippet. Are you executing this code through the interactive Python window in ArcGIS Desktop?
07-23-2014 09:48 AM