|
IDEA
|
I will upvote this idea, but I am definitely not holding my breath. My last job was working for a large federal government agency with a sizeable Enterprise Agreement. This issue was raised to Esri multiple times over 10+ years (could even be 20), and Esri expressed no interest in supporting deauthorization of ArcGIS Server licenses.
04-24-2026
12:04 PM
|
0
|
0
|
715
|
|
POST
|
OK, I think I finally understand (or mostly understand) what is going on with arcpy.da.Walk and multithreading. As you know, I created some synthetic datasets to represent a range of GDB structures, and have been testing those datasets on local SSD, LAN SMB (~1.25 ms), and WAN SMB (~20 ms).
arcpy.da.Walk is an os.walk-shaped generator, with one __next__() call per directory, and it holds some in-process lock during each __next__() call, releasing it between yields. A probe driving two Walk generators from two threads (thread A, thread B) showed the two threads' iterations alternating perfectly in lockstep (B, A, B, A, ...) on a gdb with feature datasets, with ~80% wall-clock overlap. So threads do get scheduled and do share the work, but they share it by taking turns at the lock, not by running truly in parallel. That puts a ceiling on how much threading can help: total lock-holding time is bounded below by total work, so two threads can't beat one by much.
The practical consequence is that how much threading wins depends on your gdb's directory structure. A flat gdb is one directory with one big yield, so two threads can't interleave at all, and the second thread just waits until the first is done. A gdb with many feature datasets has many yield points, and the two threads can rotate through the lock productively, which is probably where some of the time savings in your tests is coming from.
In my testing, ProcessPoolExecutor gave a more predictable ~1.6-1.9× speedup at WAN latency regardless of gdb shape, because separate processes have separate arcpy state and don't share the lock at all. The tradeoff is the ~3 s cost of spinning up each worker (fresh arcpy import, license check), which is fine if each gdb walk takes more than a few seconds but eats the benefit on fast walks.
If you're walking many gdbs over a WAN, the biggest time savings is probably one process per gdb rather than threading within one gdb — that axis scales cleanly and doesn't depend on what any individual gdb looks like.
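A minimal sketch of the one-process-per-gdb idea, not tested against a real site. The names list_gdb and walk_many are made up for illustration, and the worker and executor are parameters only so the dispatch logic can be exercised without arcpy installed.

```python
import os
from concurrent.futures import ProcessPoolExecutor, as_completed


def list_gdb(gdb: str) -> list[str]:
    """Enumerate one geodatabase. Intended to run inside a worker process."""
    # Imported inside the function so the arcpy import and license check
    # happen in the worker process, not in the parent.
    from arcpy.da import Walk

    found: list[str] = []
    for root, _dirnames, filenames in Walk(gdb):
        found.extend(os.path.join(root, name) for name in filenames)
    return found


def walk_many(gdbs, worker=list_gdb, executor_cls=ProcessPoolExecutor,
              max_workers=None):
    """Submit one task per geodatabase and collect results keyed by gdb path."""
    results: dict[str, list[str]] = {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, gdb): gdb for gdb in gdbs}
        for future in as_completed(futures):
            # future.result() re-raises any exception from the worker.
            results[futures[future]] = future.result()
    return results
```

Calling walk_many(list_of_gdb_paths) uses separate processes by default. Pool workers are reused across tasks, so the startup cost is paid once per worker rather than once per gdb. On Windows the call needs to sit under an if __name__ == "__main__": guard.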
04-21-2026
07:35 AM
|
1
|
1
|
1388
|
|
POST
|
I have used some of the publicly available reverse-engineered FGDB specifications to do this exact same kind of thing. I even created a locking mechanism that emulates what some SDK calls do to ensure I am playing nice within the FGDB when reading its files, but that is a different discussion thread. Looking at your network dataset times, I suspect 2/3 to 3/4 would actually benefit from multi-processing instead of multi-threading. The bottleneck with Walk isn't the FGDB API but something in either Walk itself or internal code that Walk depends upon. It is perfectly fine to have multiple, even many, processes reading the same FGDB at the same time without issue. Even though initializing an arcpy license does take a few seconds, if enumerating a data type in a network-share FGDB takes tens of seconds, that 3-second hit from initializing a library isn't that big.
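To put rough numbers on that tradeoff, here is a back-of-the-envelope model. It is idealized: it assumes walks split evenly across workers with no shared bottleneck, so it is an upper bound on the benefit, not a prediction. The 20-second and 0.5-second walk times are illustrative values, not measurements.

```python
import math


def estimated_wall_time(n_walks, walk_seconds, workers=1, startup_seconds=0.0):
    """Idealized wall time: one startup cost, then walks run in even rounds."""
    rounds = math.ceil(n_walks / workers)
    return startup_seconds + rounds * walk_seconds


# Slow walks (tens of seconds each): the 3-second startup is minor.
slow_sequential = estimated_wall_time(4, 20.0)                       # 80.0
slow_parallel = estimated_wall_time(4, 20.0, workers=4,
                                    startup_seconds=3.0)             # 23.0

# Fast walks (well under a second each): the startup cost dominates.
fast_sequential = estimated_wall_time(4, 0.5)                        # 2.0
fast_parallel = estimated_wall_time(4, 0.5, workers=4,
                                    startup_seconds=3.0)             # 3.5
```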
04-20-2026
03:53 PM
|
1
|
1
|
1109
|
|
POST
|
Feel free to recycle the gist however you please. There is something odd, odd in the sense that I can't explain it yet, with how da.Walk is working with multi-threading. I spent several hours yesterday devising different tests, and I may be finally getting close to isolating why the multi-threaded da.Walk results aren't as robust as one would expect.
04-20-2026
08:33 AM
|
1
|
5
|
1136
|
|
POST
|
I know there is more to your repo than just getting lists of feature classes and tables from file geodatabases, but when it comes to that one task I am not seeing any benefit most of the time from using multithreading vs sequential calling, regardless of network latency. The code structure I have been comparing is (da.Walk initialization/workaround handled upstream from code snippet):

from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

from arcpy.da import Walk

# ---------------------------------------------------------------------------
# Shared worker
# ---------------------------------------------------------------------------
def walk(ds: str, dtype: str | None = None) -> list[Path]:
    paths: list[Path] = []
    for root, dirnames, filenames in Walk(ds, datatype=dtype):
        # Feature classes and tables are reported in filenames, which is
        # assumed to be the intended set of items here.
        for itm in filenames:
            paths.append(Path(root) / itm)
    return paths

# ---------------------------------------------------------------------------
# The parallel and sequential dispatchers. Keep these as structurally
# similar as possible.
# ---------------------------------------------------------------------------
def extract_types_threaded(ds: str, dtypes: list[str]) -> dict[str, list[Path]]:
    """One thread per datatype. Matches OP post."""
    data: dict[str, list[Path]] = {}
    with ThreadPoolExecutor(max_workers=len(dtypes)) as executor:
        futures = {executor.submit(walk, ds, dtype): dtype for dtype in dtypes}
        for future in as_completed(futures):
            data[futures[future]] = future.result()
    return data

def extract_types_sequential(ds: str, dtypes: list[str]) -> dict[str, list[Path]]:
    """One call per datatype, main thread only."""
    return {dtype: walk(ds, dtype) for dtype in dtypes}

If you are so inclined and have the time, I posted ArcPy: Generate FGDB Collection as a GitHub gist to create a collection of differently structured file geodatabases for synthetic testing. I would be interested in what the timings look like for you when getting feature class and table lists from these FGDBs on your network file share. Also, what is the latency to your network file share from the client running the tests?

What a default run produces
Running generate_fgdb_collection.py with no arguments writes the collection to ./collection/ and builds every profile in the catalog. Each profile produces one .gdb directory plus a sibling .manifest.json for the correctness checker.

Per-profile breakdown
| Profile | FCs | Tables | FDs | RCs | Total | Purpose |
|---|---|---|---|---|---|---|
| empty | 0 | 0 | 0 | 0 | 0 | Zero-of-everything edge case |
| tiny | 4 | 2 | 1 | 1 | 8 | Sanity, one of each type |
| flat_small | 20 | 10 | 0 | 0 | 30 | Isolates root enumeration |
| flat_medium | 100 | 50 | 0 | 5 | 155 | Forum-thread baseline |
| nested_medium | 100 | 50 | 5 | 5 | 160 | Tests FD recursion |
| wide_datasets | 100 | 0 | 20 | 0 | 120 | Stresses FD count |
| deep_only | 40 | 0 | 1 | 0 | 41 | Single FD holds all |
| rc_heavy | 20 | 20 | 0 | 50 | 90 | Stresses RC enumeration |
| xl | 500 | 200 | 10 | 30 | 740 | Stress tier, largest workload |
| Total | 884 | 332 | 37 | 91 | 1,344 | |

FCs counts all feature classes including those inside feature datasets, so nested profiles roll up. Total is the sum of FCs, tables, FDs, and RCs for that profile; it excludes domains (which Walk does not enumerate). Every FC and table gets five baseline fields (NAME, VALUE, CATEGORY, CREATED, FOREIGN_KEY) plus three deterministic-random extras. All objects are empty; Walk enumerates schema, not rows.
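For anyone running the comparison, a small harness along these lines could time each dispatcher and confirm that they return the same listing. This is only a sketch: time_call and compare_dispatchers are hypothetical helper names, and the dispatchers are passed in as plain callables taking (ds, dtypes).

```python
import time
from typing import Callable


def time_call(fn: Callable, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def compare_dispatchers(ds, dtypes, dispatchers: dict[str, Callable]):
    """Time each dispatcher once and check that all listings agree."""
    timings: dict[str, float] = {}
    baseline = None
    for name, fn in dispatchers.items():
        result, seconds = time_call(fn, ds, dtypes)
        # Sort and stringify so ordering differences between threaded and
        # sequential runs do not count as a mismatch.
        normalized = {key: sorted(map(str, items)) for key, items in result.items()}
        if baseline is None:
            baseline = normalized
        elif normalized != baseline:
            raise ValueError(f"{name} returned a different listing than the first dispatcher")
        timings[name] = seconds
    return timings
```

With the two dispatchers above, the call would look like compare_dispatchers(gdb_path, ["FeatureClass", "Table"], {"sequential": extract_types_sequential, "threaded": extract_types_threaded}). Whichever dispatcher runs first also absorbs any cold-cache cost, so it is worth swapping the order between runs.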
04-19-2026
02:05 PM
|
1
|
7
|
1161
|
|
POST
|
Is it always the same machine becoming unstable? Since this is a multi-machine site, I would put one of the machines into maintenance mode for a day or two or three and see if the site remains stable with just one machine. If the same machine keeps becoming unstable, drop that one into maintenance mode first. Then swap which one is in maintenance mode. If the site is stable when only running on one machine but highly unstable when only running on the other, you know that something likely went wrong with the upgrade process on that one machine. At that point, chasing gremlins is seldom worth it. Just drop the one machine from the site, uninstall and reinstall the software, and then rejoin it to the site.
04-19-2026
10:08 AM
|
2
|
0
|
902
|
|
POST
|
I will have to check out the library on GH. I don't fully understand what the screenshot is showing. Are you just running the same code block in a loop using the same data set(s) and arguments? If so, the file system caching done at the OS level and the caching happening in ArcGIS code are likely influencing the results. For code like this, the cold-call performance is what is important, right? What kind of latency are you dealing with to your network file shares? I know your screenshot is for a test database or example database, but are these times slow enough to warrant the time to optimize an approach in the first place?
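One way to keep caching from blurring a loop-based benchmark is to report the first call separately from the rest. The sketch below uses a hypothetical helper named cold_and_warm. Note the assumption that the first call in the loop really is cold, which only holds if nothing earlier in the process (or an earlier run against the same share) has already warmed the OS or ArcGIS caches.

```python
import statistics
import time


def cold_and_warm(fn, *args, repeats=5):
    """Time repeated calls, reporting the first call apart from the later ones."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    warm = durations[1:]
    return {
        "cold": durations[0],
        # None when there was only one run, so there is nothing to compare.
        "warm_median": statistics.median(warm) if warm else None,
        "runs": durations,
    }
```

A large gap between "cold" and "warm_median" suggests the loop is mostly measuring cache hits.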
04-16-2026
08:22 AM
|
1
|
9
|
1220
|
|
POST
|
Given Esri makes no statements about which modules, DLLs, etc. are thread safe, the safe bet is to assume ArcPy is not thread safe. The initialization quirk you found using arcpy.da.Walk with a ThreadPoolExecutor is both an example and an indicator that arcpy.da.Walk is not completely thread safe. Although you did find a workaround to initialize arcpy.da.Walk, I am not sure the benefits of multi-threading (if any) outweigh the risks of multi-threading code that is not thread safe. Have you tried doing sequential calls and compared performance? The internal caching that the libraries behind arcpy.da.Walk use should result in fairly performant sequential calls.
04-15-2026
11:27 AM
|
1
|
11
|
1248
|
|
BLOG
|
To @BriannaEttley and the Esri staff that helped pull this information together and share it, thanks. Even though many of us interact with other MVPs on a fairly regular basis, it is challenging to keep up on the goings-on of all MVP members. This is a nice way for all of us to see what others are up to and recognize some of the more meaningful contributions from MVPs. Cheers.
04-14-2026
10:12 AM
|
3
|
0
|
691
|
|
POST
|
When you logged in as the domain service account the other day and successfully ran the model, did you get any dialog pop-ups? Specifically, was there any dialog relating to accepting a certificate?
04-14-2026
09:38 AM
|
0
|
1
|
575
|
|
POST
|
Thanks for following up and sharing the root cause and solution. This is the kind of small thing that can really trip people up, so getting it out on the web more broadly can only help.
04-13-2026
06:26 AM
|
0
|
0
|
450
|
|
POST
|
If you haven't already, it might be worth reading through Solved: how to generate correct token for a federated Ente... - Esri Community. Esri's various APIs handle authentication workflows correctly, and handling them with hand-rolled code commonly trips people up. I can't say I fully understand your authentication workflow. It seems like you are using what is commonly called the two-step exchange, i.e., authentication first happens on Portal to generate a Portal token, and then that Portal token is used to create an ArcGIS Server token using the generateToken endpoint with the token and serverUrl parameters. Portal validates your Portal token, then creates a new ArcGIS Server token encrypted with the federated server's shared key. The resulting token is meant to be used with that specific federated ArcGIS Server and can be validated locally by that server. Portal cannot validate it because it was encrypted with the server's shared key, not the Portal's. If you need to validate a token against the Portal's portals/self endpoint, use the Portal token directly — before the exchange step.
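To make the two steps concrete, here is a sketch that only builds the requests and sends nothing. The helper names and the example.com hosts are hypothetical, and I am assuming the usual /sharing/rest/ paths and f=json; your deployment may require additional parameters (a referer, for example) that are left out here.

```python
from urllib.parse import urlencode


def build_server_token_request(portal_url: str, portal_token: str, server_url: str):
    """Return (endpoint, form_body) for exchanging a Portal token for a server token."""
    endpoint = portal_url.rstrip("/") + "/sharing/rest/generateToken"
    form = {
        "token": portal_token,    # the Portal token obtained in step one
        "serverUrl": server_url,  # the federated server the new token is meant for
        "f": "json",
    }
    return endpoint, urlencode(form).encode("ascii")


def build_portal_self_url(portal_url: str, portal_token: str) -> str:
    """Return the portals/self URL for validating a Portal token (before any exchange)."""
    query = urlencode({"token": portal_token, "f": "json"})
    return portal_url.rstrip("/") + "/sharing/rest/portals/self?" + query
```

The point of keeping the two helpers separate is that build_portal_self_url should only ever be given the Portal token, never the exchanged server token.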
04-13-2026
06:21 AM
|
0
|
0
|
527
|
|
POST
|
This seems like a defect. Can you provide some sample data in a file geodatabase that replicates this behavior?
04-09-2026
05:15 PM
|
1
|
0
|
493
|
|
POST
|
So have you ever run the Model successfully when logged into the machine as the domain service account user?
04-07-2026
07:31 AM
|
0
|
3
|
1444
|
|
POST
|
The questions from TimoT are good to think about. I would additionally ask whether you are trying to run the scheduled task using your credentials or other credentials. If other, is it a local machine system account or another user?
04-06-2026
09:21 AM
|
0
|
5
|
1483
|