<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: A Fun Threading Situation With da.Walk in Python Questions</title>
    <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1696875#M75193</link>
    <description>&lt;P&gt;When testing on our fileserver, this same database initialization went from 30s to 15s (we have thousands of file databases with ~100 feature classes and ~50 tables each). There is filesystem caching, but the actual Walk call itself is slow, and since I have to call it multiple times to get all the features initialized into the proper types, I had to pay that cost sequentially. Now those 4 walks (FeatureClass, Table, Relationships, FeatureDatasets) run concurrently.&lt;/P&gt;&lt;P&gt;I'm only using Walk instead of the List* functions because Walk doesn't rely on the global arcpy.env state, which can be finicky when you start changing workspaces rapidly.&lt;/P&gt;&lt;P&gt;The code I shared above has been implemented in the initializer of that Dataset object, so the sequential runs are just showing that it's not missing things or losing data. The cold start is expected, of course.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Addendum: This code is also in a Python toolbox and is frequently used to scan a database to populate or validate tool parameters, and since tools are re-initialized every time you modify a parameter, the cold call is actually not the limiting factor; getting average call time down saves a lot. I will still usually create a global cache for a tool, or use the&amp;nbsp;@cached_property or @lru_cache decorators, to prevent repeated loads on more expensive operations though.&lt;/P&gt;</description>
    <pubDate>Fri, 17 Apr 2026 05:03:16 GMT</pubDate>
    <dc:creator>HaydenWelch</dc:creator>
    <dc:date>2026-04-17T05:03:16Z</dc:date>
    <item>
      <title>A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1696136#M75189</link>
      <description>&lt;P&gt;So I have some library code that handles loading in file databases and indexing the contained data. These are then grouped into dictionaries, so I iteratively use da.Walk to extract each datatype from the dataset.&lt;/P&gt;&lt;P&gt;This can of course be pretty slow, especially when loading in a database that's on a network fileserver. No problem though; this can be solved pretty simply by creating some threads using concurrent.futures.ThreadPoolExecutor:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from concurrent.futures import ThreadPoolExecutor, as_completed
from arcpy.da import Walk
from pathlib import Path

def walk(ds: str, dtype: str | None = None) -&amp;gt; list[Path]:
    """Walk a dataset, filtering on the supplied datatype"""
    paths: list[Path] = []
    for root, _, items in Walk(ds, datatype=dtype):
        for itm in items:
            paths.append(Path(root)/itm)
    return paths

def extract_types(ds: str, dtypes: list[str]) -&amp;gt; dict[str, list[Path]]:
    """Extract paths from a dataset grouped by type"""
    data: dict[str, list[Path]] = {}
    with ThreadPoolExecutor(max_workers=len(dtypes)) as executor:
        futures = {executor.submit(walk, ds, dtype): dtype for dtype in dtypes}
        for future in as_completed(futures):
            data[futures[future]] = future.result()
    return data&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Seems simple enough: spool up one thread per Walk call and await the results so they run concurrently. Let's run it:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;&amp;gt;&amp;gt;&amp;gt; extract_types("My_GDB", ['FeatureClass', 'Table'])
{'FeatureClass': [], 'Table': []}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hmmm. There's no output, but there are definitely both tables and feature classes in that gdb... Let's try a synchronous extract method:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def extract_types_sync(ds: str, dtypes: list[str]) -&amp;gt; dict[str, list[Path]]:
    """Extract paths from a dataset grouped by type"""
    return {
        dtype: walk(ds, dtype)
        for dtype in dtypes
    }&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And run that:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;&amp;gt;&amp;gt;&amp;gt; extract_types_sync("My_GDB", ['FeatureClass', 'Table'])
{'FeatureClass': [Path("My_GDB/FC1"), Path("My_GDB/FC2")], 
'Table': [Path("My_GDB/Table1"), Path("My_GDB/Table2")]}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Okay, so there IS data in the database, and Walk is able to find it. Let's try the concurrent version again:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;&amp;gt;&amp;gt;&amp;gt; extract_types("My_GDB", ['FeatureClass', 'Table'])
{'FeatureClass': [Path("My_GDB/FC1"), Path("My_GDB/FC2")], 
'Table': [Path("My_GDB/Table1"), Path("My_GDB/Table2")]}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So now the concurrent version is able to find the data, but only &lt;STRONG&gt;after&lt;/STRONG&gt; running a Walk synchronously? This little bug persists across interpreter sessions, it seems. Let's see if warming up the Walk function can fix it:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def extract_types(ds: str, dtypes: list[str]) -&amp;gt; dict[str, list[Path]]:
    """Extract paths from a dataset grouped by type"""
    for _ in Walk(ds): break  # prime Walk on the main thread before spawning workers
    data: dict[str, list[Path]] = {}
    with ThreadPoolExecutor(max_workers=len(dtypes)) as executor:
        futures = {executor.submit(walk, ds, dtype): dtype for dtype in dtypes}
        for future in as_completed(futures):
            data[futures[future]] = future.result()
    return data&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And run one more time:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;&amp;gt;&amp;gt;&amp;gt; extract_types("My_GDB", ['FeatureClass', 'Table'])
{'FeatureClass': [Path("My_GDB/FC1"), Path("My_GDB/FC2")], 
'Table': [Path("My_GDB/Table1"), Path("My_GDB/Table2")]}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now it works! This is really odd though. I'm guessing that da.Walk relies on some global state that isn't initialized in a worker thread and must be initialized in the main thread. This is definitely odd behavior, and I figured I'd share it here in case anyone else happens to run into it. I am also curious how this pattern will work when Python 3.14 is adopted and we have access to the InterpreterPoolExecutor. Will the arcpy global state need to be shared for functions as simple as da.Walk?&lt;/P&gt;</description>
      <pubDate>Tue, 14 Apr 2026 15:12:39 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1696136#M75189</guid>
      <dc:creator>HaydenWelch</dc:creator>
      <dc:date>2026-04-14T15:12:39Z</dc:date>
    </item>
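The symptom above is consistent with a resource that is lazily initialized and only completes that initialization on the main thread. A minimal pure-Python sketch of that failure mode, assuming that diagnosis is right (LazyCatalog is a made-up stand-in, not arcpy):

```python
import threading

class LazyCatalog:
    """Toy stand-in (hypothetical, NOT arcpy) for a lazily initialized
    resource that only completes first-time setup on the main thread and
    silently yields nothing from worker threads until then."""
    _initialized = False

    def walk(self) -> list[str]:
        if not LazyCatalog._initialized:
            if threading.current_thread() is not threading.main_thread():
                # mirrors the empty {'FeatureClass': [], 'Table': []} result
                return []
            LazyCatalog._initialized = True
        return ["FC1", "Table1"]

def walk_in_worker() -> list[str]:
    out: list[list[str]] = []
    t = threading.Thread(target=lambda: out.append(LazyCatalog().walk()))
    t.start()
    t.join()
    return out[0]

cold = walk_in_worker()   # worker thread before any main-thread call -> []
LazyCatalog().walk()      # the "warm-up" call on the main thread
warm = walk_in_worker()   # worker thread now sees the data
```

Once the main thread has run the resource once, every worker sees the data, which is exactly the before/after behavior shown in the post.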
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1696489#M75190</link>
      <description>&lt;P&gt;Given that Esri makes no statements about which modules, DLLs, etc. are thread safe, the safe bet is to assume ArcPy is not thread safe.&amp;nbsp; The initialization quirk you found using arcpy.da.Walk with a ThreadPoolExecutor is both an example and an indicator that arcpy.da.Walk is not completely thread safe.&amp;nbsp; Although you did find a workaround to initialize arcpy.da.Walk, I am not sure the benefits of multi-threading (if any) outweigh the risks of multi-threading non-thread-safe code.&lt;/P&gt;&lt;P&gt;Have you tried doing sequential calls and comparing performance?&amp;nbsp; The internal caching that the libraries behind arcpy.da.Walk use should result in fairly performant sequential calls.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2026 18:27:47 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1696489#M75190</guid>
      <dc:creator>JoshuaBixby</dc:creator>
      <dc:date>2026-04-15T18:27:47Z</dc:date>
    </item>
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1696515#M75191</link>
      <description>&lt;P&gt;The limiting factor is the speed of the NAS. The parallel walks end up being okay since they're only reading. Technically there could be issues if the database being scanned is modified while the walks are executing, but that's a price I'm willing to pay since at the end of this all I have is a bunch of Path objects.&lt;/P&gt;&lt;P&gt;The previous sync version would walk a database in ~30 seconds, but this one is consistently between 12 and 15 seconds. Local indexing (not limited by network speed) is ~1-2 seconds, which is the same as the sync version. The first call is slowest since we have to hydrate the Walk function, but subsequent calls are at least about 10% faster. I got around this by hydrating Walk when the module is imported, so I don't have to pay that initialization cost every time I call the function.&lt;/P&gt;&lt;P&gt;I'm definitely going to keep an eye on it for the next few weeks, but for now it's been running smoothly.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Sidenote&lt;/STRONG&gt;: something that &lt;EM&gt;IS&lt;/EM&gt; thread safe (as far as I can tell) is &lt;STRONG&gt;management.Compress&lt;/STRONG&gt; and &lt;STRONG&gt;management.Compact&lt;/STRONG&gt;, which can be a massive speedup when archiving old databases, depending on how many threads you want to spool up.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;EDIT: Here's a screenshot of a Jupyter session where I used the concurrent walks to construct a Dataset object:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="HaydenWelch_0-1776281449865.png" style="width: 471px;"&gt;&lt;img src="https://community.esri.com/t5/image/serverpage/image-id/151164i161029425976F443/image-dimensions/471x513?v=v2" width="471" height="513" role="button" title="HaydenWelch_0-1776281449865.png" alt="HaydenWelch_0-1776281449865.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;You can see that the first call is a bit slower than the subsequent calls, but it finds the same data every time. Anything that uses these objects knows what to expect, so if something is missing it'll throw an error telling me.&lt;/P&gt;&lt;P&gt;If you want to take a look or try it out, the library is open source and the code is available &lt;A href="https://github.com/hwelch-fle/arcpie/blob/master/src/arcpie/database.py" target="_self"&gt;here&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2026 19:38:06 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1696515#M75191</guid>
      <dc:creator>HaydenWelch</dc:creator>
      <dc:date>2026-04-15T19:38:06Z</dc:date>
    </item>
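The archiving fan-out mentioned in the sidenote is the same executor pattern as the walks. A sketch with the compact operation injected as a callable so it runs without arcpy (compact_fn stands in for something like arcpy.management.Compact; whether that call is truly thread safe is the poster's observation, not something Esri documents):

```python
from concurrent.futures import ThreadPoolExecutor

def compact_all(gdb_paths: list[str], compact_fn, max_workers: int = 4) -> list:
    """Apply compact_fn to every geodatabase path concurrently.
    executor.map returns results in input order, not completion order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(compact_fn, gdb_paths))

# demo with a dummy "compact" that just tags the path
results = compact_all(["old1.gdb", "old2.gdb"], lambda p: f"compacted:{p}")
```

Since the results come back in input order, this is convenient when you want to log each archive against its source path.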
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1696734#M75192</link>
      <description>&lt;P&gt;I will have to check out the library on GH.&amp;nbsp; I don't fully understand what the screenshot is showing.&amp;nbsp; Are you just running the same code block in a loop using the same data set(s) and arguments?&amp;nbsp; If so, the file system caching done at the OS level and the caching happening in ArcGIS code are likely influencing the results.&amp;nbsp; For code like this, the cold call performance is what is important, right?&amp;nbsp; What type of latency are you dealing with to your network file shares?&amp;nbsp; I know your screenshot is for a test database or example database, but are these times slow enough to warrant the time to optimize an approach in the first place?&lt;/P&gt;</description>
      <pubDate>Thu, 16 Apr 2026 15:22:46 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1696734#M75192</guid>
      <dc:creator>JoshuaBixby</dc:creator>
      <dc:date>2026-04-16T15:22:46Z</dc:date>
    </item>
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1696875#M75193</link>
      <description>&lt;P&gt;When testing on our fileserver, this same database initialization went from 30s to 15s (we have thousands of file databases with ~100 feature classes and ~50 tables each). There is filesystem caching, but the actual Walk call itself is slow, and since I have to call it multiple times to get all the features initialized into the proper types, I had to pay that cost sequentially. Now those 4 walks (FeatureClass, Table, Relationships, FeatureDatasets) run concurrently.&lt;/P&gt;&lt;P&gt;I'm only using Walk instead of the List* functions because Walk doesn't rely on the global arcpy.env state, which can be finicky when you start changing workspaces rapidly.&lt;/P&gt;&lt;P&gt;The code I shared above has been implemented in the initializer of that Dataset object, so the sequential runs are just showing that it's not missing things or losing data. The cold start is expected, of course.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Addendum: This code is also in a Python toolbox and is frequently used to scan a database to populate or validate tool parameters, and since tools are re-initialized every time you modify a parameter, the cold call is actually not the limiting factor; getting average call time down saves a lot. I will still usually create a global cache for a tool, or use the&amp;nbsp;@cached_property or @lru_cache decorators, to prevent repeated loads on more expensive operations though.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Apr 2026 05:03:16 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1696875#M75193</guid>
      <dc:creator>HaydenWelch</dc:creator>
      <dc:date>2026-04-17T05:03:16Z</dc:date>
    </item>
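The caching decorators in that addendum work like this; a minimal sketch where scan_database is a hypothetical stand-in for the Walk-based scan (@lru_cache requires hashable arguments, and returning a tuple keeps the cached value immutable):

```python
from functools import lru_cache

calls: list[str] = []

@lru_cache(maxsize=None)
def scan_database(path: str) -> tuple[str, ...]:
    """Pretend-expensive scan; real code would run the concurrent walks here."""
    calls.append(path)
    return (f"{path}/FC1", f"{path}/Table1")

first = scan_database("My_GDB")
second = scan_database("My_GDB")   # cache hit: the function body does not run again
```

This is why repeated parameter-validation calls in a toolbox become cheap: only the first scan of a given path pays the Walk cost.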
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697140#M75195</link>
      <description>&lt;P&gt;I know there is more to your repo than just getting lists of feature classes and tables from file geodatabases, but when it comes to getting lists of feature classes and tables from file geodatabases I am not seeing any benefit most of the time from using multi-threading vs sequential calling regardless of network latency.&amp;nbsp; The code structure I have been comparing is (da.Walk initialization/workaround handled upstream from code snippet):&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

from arcpy.da import Walk

# ---------------------------------------------------------------------------
# Shared worker
# ---------------------------------------------------------------------------

def walk(ds: str, dtype: str | None = None) -&amp;gt; list[Path]:
    paths: list[Path] = []
    for root, dirnames, filenames in Walk(ds, datatype=dtype):
        for itm in filenames:
            paths.append(Path(root) / itm)
    return paths


# ---------------------------------------------------------------------------
# The parallel and sequential dispatchers.  Keep these as structurally
# similar as possible.
# ---------------------------------------------------------------------------

def extract_types_threaded(ds: str, dtypes: list[str]) -&amp;gt; dict[str, list[Path]]:
    """One thread per datatype. Matches OP post."""
    data: dict[str, list[Path]] = {}
    with ThreadPoolExecutor(max_workers=len(dtypes)) as executor:
        futures = {executor.submit(walk, ds, dtype): dtype for dtype in dtypes}
        for future in as_completed(futures):
            data[futures[future]] = future.result()
    return data


def extract_types_sequential(ds: str, dtypes: list[str]) -&amp;gt; dict[str, list[Path]]:
    """One call per datatype, main thread only."""
    return {dtype: walk(ds, dtype) for dtype in dtypes}&lt;/LI-CODE&gt;&lt;P&gt;&lt;BR /&gt;If you are so inclined and have the time, I posted&amp;nbsp;&lt;A href="https://gist.github.com/bixb0012/95ec027e232897dda8a598520277f6a7" target="_blank" rel="noopener"&gt;ArcPy: Generate FGDB Collection&lt;/A&gt;&amp;nbsp;as a GitHub gist to create a collection of differently structured file geodatabases for synthetic testing.&amp;nbsp; I would be interested in what the timings look like for you getting feature class and table lists from these FGDBs on your network file share.&amp;nbsp; And what is the latency to your network file share from the client running the tests?&lt;/P&gt;&lt;DIV class=""&gt;&lt;H1&gt;What a default run produces&lt;/H1&gt;&lt;P&gt;Running generate_fgdb_collection.py with no arguments writes the collection to ./collection/ and builds every profile in the catalog. Each profile produces one .gdb directory plus a sibling .manifest.json for the correctness checker.&lt;/P&gt;&lt;H2&gt;Per-profile breakdown&lt;/H2&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Profile&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;FCs&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Tables&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;FDs&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;RCs&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Total&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Purpose&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;empty&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;Zero-of-everything edge case&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;tiny&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;Sanity, one of each type&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;flat_small&lt;/TD&gt;&lt;TD&gt;20&lt;/TD&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;30&lt;/TD&gt;&lt;TD&gt;Isolates root enumeration&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;flat_medium&lt;/TD&gt;&lt;TD&gt;100&lt;/TD&gt;&lt;TD&gt;50&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;155&lt;/TD&gt;&lt;TD&gt;Forum-thread baseline&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;nested_medium&lt;/TD&gt;&lt;TD&gt;100&lt;/TD&gt;&lt;TD&gt;50&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;160&lt;/TD&gt;&lt;TD&gt;Tests FD recursion&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;wide_datasets&lt;/TD&gt;&lt;TD&gt;100&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;20&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;120&lt;/TD&gt;&lt;TD&gt;Stresses FD count&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;deep_only&lt;/TD&gt;&lt;TD&gt;40&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;41&lt;/TD&gt;&lt;TD&gt;Single FD holds all&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;rc_heavy&lt;/TD&gt;&lt;TD&gt;20&lt;/TD&gt;&lt;TD&gt;20&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;50&lt;/TD&gt;&lt;TD&gt;90&lt;/TD&gt;&lt;TD&gt;Stresses RC enumeration&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;xl&lt;/TD&gt;&lt;TD&gt;500&lt;/TD&gt;&lt;TD&gt;200&lt;/TD&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;TD&gt;30&lt;/TD&gt;&lt;TD&gt;740&lt;/TD&gt;&lt;TD&gt;Stress tier, largest workload&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Total&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;884&lt;/TD&gt;&lt;TD&gt;332&lt;/TD&gt;&lt;TD&gt;37&lt;/TD&gt;&lt;TD&gt;91&lt;/TD&gt;&lt;TD&gt;1,344&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P class=""&gt;&lt;STRONG&gt;FCs&lt;/STRONG&gt; counts all feature classes including those inside feature datasets, so nested profiles roll up. &lt;STRONG&gt;Total&lt;/STRONG&gt; is the sum of FCs, tables, FDs, and RCs for that profile; it excludes domains (which Walk does not enumerate). Every FC and table gets five baseline fields (NAME, VALUE, CATEGORY, CREATED, FOREIGN_KEY) plus three deterministic-random extras. All objects are empty; Walk enumerates schema, not rows.&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Sun, 19 Apr 2026 21:05:31 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697140#M75195</guid>
      <dc:creator>JoshuaBixby</dc:creator>
      <dc:date>2026-04-19T21:05:31Z</dc:date>
    </item>
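For anyone reproducing the comparison, the percent-change figures quoted in this thread are just the relative difference of the two timings. A tiny helper (pct_change is an illustrative name, not part of the gist):

```python
def pct_change(sequential_s: float, threaded_s: float) -> float:
    """Percent change of the threaded timing relative to the sequential one;
    negative values mean the threaded run was faster."""
    return (threaded_s - sequential_s) / sequential_s * 100.0

# e.g. a 62.70s sequential run vs a 22.75s threaded run comes out near -63.7%
```

Keeping the sign convention fixed (threaded relative to sequential) makes the red/green columns directly comparable across machines.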
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697151#M75196</link>
      <description>&lt;P&gt;I'll give this a shot tomorrow; it's very possible that our database schema or network latency is to blame, but I definitely got a speedup with the threaded version when targeting our databases (there are also ~450 attribute rules in each one, which may be causing some weird slowdown).&lt;/P&gt;&lt;P&gt;If you don't mind, I might end up adding this gist to the repo testing area so I can use it for general performance benchmarking.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 00:02:32 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697151#M75196</guid>
      <dc:creator>HaydenWelch</dc:creator>
      <dc:date>2026-04-20T00:02:32Z</dc:date>
    </item>
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697254#M75197</link>
      <description>&lt;P&gt;Feel free to recycle the gist however you please.&amp;nbsp; There is something odd, odd in the sense that I can't explain it yet, with how da.Walk is working with multi-threading.&amp;nbsp; I spent several hours yesterday devising different tests, and I may be finally getting close to isolating why the multi-threaded da.Walk results aren't as robust as one would expect.&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 15:33:30 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697254#M75197</guid>
      <dc:creator>JoshuaBixby</dc:creator>
      <dc:date>2026-04-20T15:33:30Z</dc:date>
    </item>
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697280#M75198</link>
      <description>&lt;P&gt;I just ran ~2hours of tests and ended up with some incredibly puzzling results. You are correct that on average, sequential reads are faster, but there seems to be some odd edge cases where the threaded version is SIGNIFICANTLY faster:&lt;/P&gt;&lt;TABLE border="0" cellspacing="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD height="17"&gt;&lt;STRONG&gt;Dataset&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Dataset Local Sequential&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Dataset Local Threaded&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;% Change&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;empty&lt;/TD&gt;&lt;TD&gt;0.98s&lt;/TD&gt;&lt;TD&gt;1.64s&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#FF0000"&gt;+67.24%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;tiny&lt;/TD&gt;&lt;TD&gt;5.51s&lt;/TD&gt;&lt;TD&gt;3.69s&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#00FF00"&gt;-32.98%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;flat_small&lt;/TD&gt;&lt;TD&gt;3.43s&lt;/TD&gt;&lt;TD&gt;4.04s&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#FF0000"&gt;+17.87%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;flat_medium&lt;/TD&gt;&lt;TD&gt;14.45s&lt;/TD&gt;&lt;TD&gt;14.95s&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#FF0000"&gt;+3.39%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;nested_medium&lt;/TD&gt;&lt;TD&gt;10.74s&lt;/TD&gt;&lt;TD&gt;12.30s&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#FF0000"&gt;+14.52%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;wide_datasets&lt;/TD&gt;&lt;TD&gt;62.70s&lt;/TD&gt;&lt;TD&gt;22.75s&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#00FF00"&gt;-63.71%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;deep_only&lt;/TD&gt;&lt;TD&gt;1.66s&lt;/TD&gt;&lt;TD&gt;2.59s&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#FF0000"&gt;+56.67%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD 
height="17"&gt;rc_heavy&lt;/TD&gt;&lt;TD&gt;5.48s&lt;/TD&gt;&lt;TD&gt;5.14s&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#00FF00"&gt;-6.31%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;xl&lt;/TD&gt;&lt;TD&gt;85.45s&lt;/TD&gt;&lt;TD&gt;66.31s&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#00FF00"&gt;-22.39%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;populated_production&lt;/TD&gt;&lt;TD&gt;14.40s&lt;/TD&gt;&lt;TD&gt;12.34s&lt;/TD&gt;&lt;TD&gt;&lt;FONT color="#00FF00"&gt;-14.30%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="0" cellspacing="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="175.383px" height="47px"&gt;&lt;STRONG&gt;Dataset&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="208px" height="47px"&gt;&lt;STRONG&gt;Dataset Network Sequential&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="197.683px" height="47px"&gt;&lt;STRONG&gt;Dataset Network Threaded&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD width="75.9333px" height="47px"&gt;&lt;STRONG&gt;% Change&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="175.383px" height="25px"&gt;&lt;FONT color="#000000"&gt;empty&lt;/FONT&gt;&lt;/TD&gt;&lt;TD width="208px" height="25px"&gt;4.54s&lt;/TD&gt;&lt;TD width="197.683px" height="25px"&gt;8.20s&lt;/TD&gt;&lt;TD width="75.9333px" height="25px"&gt;&lt;FONT color="#FF0000"&gt;+80.52%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="175.383px" height="25px"&gt;&lt;FONT color="#000000"&gt;tiny&lt;/FONT&gt;&lt;/TD&gt;&lt;TD width="208px" height="25px"&gt;38.57s&lt;/TD&gt;&lt;TD width="197.683px" height="25px"&gt;25.71s&lt;/TD&gt;&lt;TD width="75.9333px" height="25px"&gt;&lt;FONT color="#00FF00"&gt;-33.34%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="175.383px" height="25px"&gt;&lt;FONT color="#000000"&gt;flat_small&lt;/FONT&gt;&lt;/TD&gt;&lt;TD width="208px" height="25px"&gt;25.83s&lt;/TD&gt;&lt;TD width="197.683px" height="25px"&gt;29.46s&lt;/TD&gt;&lt;TD width="75.9333px" 
height="25px"&gt;&lt;FONT color="#FF0000"&gt;+14.04%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="175.383px" height="25px"&gt;&lt;FONT color="#000000"&gt;flat_medium&lt;/FONT&gt;&lt;/TD&gt;&lt;TD width="208px" height="25px"&gt;109.82s&lt;/TD&gt;&lt;TD width="197.683px" height="25px"&gt;116.67s&lt;/TD&gt;&lt;TD width="75.9333px" height="25px"&gt;&lt;FONT color="#FF0000"&gt;+6.24%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="175.383px" height="25px"&gt;&lt;FONT color="#000000"&gt;nested_medium&lt;/FONT&gt;&lt;/TD&gt;&lt;TD width="208px" height="25px"&gt;90.18s&lt;/TD&gt;&lt;TD width="197.683px" height="25px"&gt;97.47s&lt;/TD&gt;&lt;TD width="75.9333px" height="25px"&gt;&lt;FONT color="#FF0000"&gt;+8.09%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="175.383px" height="25px"&gt;&lt;FONT color="#000000"&gt;wide_datasets&lt;/FONT&gt;&lt;/TD&gt;&lt;TD width="208px" height="25px"&gt;446.37s&lt;/TD&gt;&lt;TD width="197.683px" height="25px"&gt;173.41s&lt;/TD&gt;&lt;TD width="75.9333px" height="25px"&gt;&lt;FONT color="#00FF00"&gt;-61.15%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="175.383px" height="25px"&gt;&lt;FONT color="#000000"&gt;deep_only&lt;/FONT&gt;&lt;/TD&gt;&lt;TD width="208px" height="25px"&gt;18.79s&lt;/TD&gt;&lt;TD width="197.683px" height="25px"&gt;23.68s&lt;/TD&gt;&lt;TD width="75.9333px" height="25px"&gt;&lt;FONT color="#FF0000"&gt;+26.03%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="175.383px" height="25px"&gt;&lt;FONT color="#000000"&gt;rc_heavy&lt;/FONT&gt;&lt;/TD&gt;&lt;TD width="208px" height="25px"&gt;44.38s&lt;/TD&gt;&lt;TD width="197.683px" height="25px"&gt;38.43s&lt;/TD&gt;&lt;TD width="75.9333px" height="25px"&gt;&lt;FONT color="#00FF00"&gt;-13.40%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="175.383px" height="25px"&gt;&lt;FONT color="#000000"&gt;xl&lt;/FONT&gt;&lt;/TD&gt;&lt;TD width="208px" height="25px"&gt;677.89s&lt;/TD&gt;&lt;TD width="197.683px" 
height="25px"&gt;491.07s&lt;/TD&gt;&lt;TD width="75.9333px" height="25px"&gt;&lt;FONT color="#00FF00"&gt;-27.56%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="175.383px" height="25px"&gt;&lt;FONT color="#000000"&gt;populated_production&lt;/FONT&gt;&lt;/TD&gt;&lt;TD width="208px" height="25px"&gt;242.95s&lt;/TD&gt;&lt;TD width="197.683px" height="25px"&gt;161.28s&lt;/TD&gt;&lt;TD width="75.9333px" height="25px"&gt;&lt;FONT color="#00FF00"&gt;-33.62%&lt;/FONT&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As you can see, our network is incredibly slow for some reason (the company IT department rolled out some RMM recently that may very well be causing all sorts of issues with IOPS).&lt;/P&gt;&lt;P&gt;All data was gathered using the python timeit module with 10 loops of each run. I added a flag to the Dataset constructor that switches the walk method between the threaded and sequential versions:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pathlib import Path
from timeit import timeit
from arcpie import Dataset

LOCAL = Path().home() / 'collection'
NETWORK = Path(r"S:\Test\collection")

# production_populated gdb was added manually to both collections
local_datasets = {ds.name: ds for ds in LOCAL.glob('*.gdb')}
network_datasets = {ds.name: ds for ds in NETWORK.glob('*.gdb')}

tests = {
    'local threaded': 'Dataset(local_ds, _threaded=True)',
    'local sequential': 'Dataset(local_ds, _threaded=False)',
    'network threaded': 'Dataset(network_ds, _threaded=True)',
    'network sequential': 'Dataset(network_ds, _threaded=False)',
}

results = {}
number = 10

for ds_name, ds_path in local_datasets.items():
    local_ds = local_datasets[ds_name]
    network_ds = network_datasets[ds_name]
    # Warmup
    _ = timeit(stmt='Dataset(ds_path)', number=1, globals=globals())
    
    for name, test in tests.items():
        res = timeit(stmt=test, number=number, globals=globals())
        results[ds_name, name] = res
        print(f'{ds_name} {name}: {res:0.2f}s')&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;EDIT: So I've tried another method that seems to work pretty well too. It is consistent across all environments and is always at least as fast as the fastest method: I just read the a00000004.gdbtable file directly:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from collections import defaultdict
from pathlib import Path

TAG_MAP = {
    'FeatureDataset': b'&amp;lt;DEFeatureDataset',
    'FeatureClass': b'&amp;lt;DEFeatureClassInfo',
    'Table': b'&amp;lt;DETableInfo',
    'RelationshipClass': b'&amp;lt;DERelationshipClassInfo',
}
PATH_TAGS = b'&amp;lt;CatalogPath&amp;gt;', b'&amp;lt;/CatalogPath&amp;gt;'
# Currently NetworkDataset topologies are captured as FeatureClasses
# This can be used to filter them out, but it's not critical
TYPE_TAGS = b'&amp;lt;DatasetType&amp;gt;', b'&amp;lt;/DatasetType&amp;gt;'
TABLE_FILE = 'a00000004.gdbtable'

# Read the raw table info from the a00000004 file
def _extract_types_a00000004(ds: Path | str, dtypes: list[str]) -&amp;gt; dict[str, list[Path]]:
    a4_table = Path(ds) / TABLE_FILE
    data: dict[str, list[Path]] = defaultdict(list)
    with a4_table.open('rb') as a4_file:
        # readlines()[1:] drops the leading header bytes; each catalog
        # item's XML definition sits on its own line after that
        for line in a4_file.readlines()[1:]:
            types_on_line: list[tuple[str, bytes]] = []
            for dtype in dtypes:
                if TAG_MAP[dtype] in line:
                    types_on_line.append((dtype, TAG_MAP[dtype]))
            for dtype, open_tag in types_on_line:
                # Text between the CatalogPath tags, minus the leading backslash
                raw_path = line.split(open_tag)[1].split(PATH_TAGS[0])[1].split(PATH_TAGS[1])[0]
                catalog_path = raw_path.decode()[1:]
                if catalog_path.endswith('.gdb'):
                    continue  # skip the workspace's own entry
                data[dtype].append(Path(ds) / catalog_path)
        return data&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is a very rudimentary implementation that could use some optimization, but if the Walk function is a black box, why not just get the data straight from the source? No need to even parse the XML since each table entry is on one line. Just match the open tag.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here are the timings on a local database:&lt;/P&gt;&lt;TABLE border="0" cellspacing="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD height="17"&gt;&lt;STRONG&gt;Dataset&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Dataset Local Sequential&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Dataset Local Threaded&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Dataset Local Raw&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;empty&lt;/TD&gt;&lt;TD&gt;0.06s&lt;/TD&gt;&lt;TD&gt;0.13s&lt;/TD&gt;&lt;TD&gt;0.02s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;tiny&lt;/TD&gt;&lt;TD&gt;0.16s&lt;/TD&gt;&lt;TD&gt;0.21s&lt;/TD&gt;&lt;TD&gt;0.05s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;flat_small&lt;/TD&gt;&lt;TD&gt;0.25s&lt;/TD&gt;&lt;TD&gt;0.32s&lt;/TD&gt;&lt;TD&gt;0.19s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;flat_medium&lt;/TD&gt;&lt;TD&gt;1.18s&lt;/TD&gt;&lt;TD&gt;1.22s&lt;/TD&gt;&lt;TD&gt;0.97s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;nested_medium&lt;/TD&gt;&lt;TD&gt;0.98s&lt;/TD&gt;&lt;TD&gt;1.02s&lt;/TD&gt;&lt;TD&gt;0.66s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;wide_datasets&lt;/TD&gt;&lt;TD&gt;1.39s&lt;/TD&gt;&lt;TD&gt;1.23s&lt;/TD&gt;&lt;TD&gt;0.65s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;deep_only&lt;/TD&gt;&lt;TD&gt;0.20s&lt;/TD&gt;&lt;TD&gt;0.28s&lt;/TD&gt;&lt;TD&gt;0.08s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;rc_heavy&lt;/TD&gt;&lt;TD&gt;0.58s&lt;/TD&gt;&lt;TD&gt;0.54s&lt;/TD&gt;&lt;TD&gt;0.36s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD 
height="17"&gt;xl&lt;/TD&gt;&lt;TD&gt;7.39s&lt;/TD&gt;&lt;TD&gt;5.50s&lt;/TD&gt;&lt;TD&gt;4.21s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;populated_production&lt;/TD&gt;&lt;TD&gt;3.20s&lt;/TD&gt;&lt;TD&gt;2.35s&lt;/TD&gt;&lt;TD&gt;2.14s&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And with the network databases:&lt;/P&gt;&lt;TABLE border="0" cellspacing="0"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD height="17"&gt;&lt;STRONG&gt;Dataset&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Dataset Network Sequential&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Dataset Network Threaded&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Dataset Network Raw&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;empty&lt;/TD&gt;&lt;TD&gt;0.27s&lt;/TD&gt;&lt;TD&gt;0.54s&lt;/TD&gt;&lt;TD&gt;0.09s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;tiny&lt;/TD&gt;&lt;TD&gt;0.58s&lt;/TD&gt;&lt;TD&gt;1.01s&lt;/TD&gt;&lt;TD&gt;0.27s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;flat_small&lt;/TD&gt;&lt;TD&gt;1.53s&lt;/TD&gt;&lt;TD&gt;1.76s&lt;/TD&gt;&lt;TD&gt;1.13s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;flat_medium&lt;/TD&gt;&lt;TD&gt;6.09s&lt;/TD&gt;&lt;TD&gt;6.79s&lt;/TD&gt;&lt;TD&gt;5.22s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;nested_medium&lt;/TD&gt;&lt;TD&gt;4.84s&lt;/TD&gt;&lt;TD&gt;5.60s&lt;/TD&gt;&lt;TD&gt;3.57s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;wide_datasets&lt;/TD&gt;&lt;TD&gt;28.42s&lt;/TD&gt;&lt;TD&gt;11.34s&lt;/TD&gt;&lt;TD&gt;8.03s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;deep_only&lt;/TD&gt;&lt;TD&gt;2.84s&lt;/TD&gt;&lt;TD&gt;1.88s&lt;/TD&gt;&lt;TD&gt;1.01s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;rc_heavy&lt;/TD&gt;&lt;TD&gt;2.72s&lt;/TD&gt;&lt;TD&gt;2.40s&lt;/TD&gt;&lt;TD&gt;1.72s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD 
height="17"&gt;xl&lt;/TD&gt;&lt;TD&gt;39.19s&lt;/TD&gt;&lt;TD&gt;30.27s&lt;/TD&gt;&lt;TD&gt;24.11s&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD height="17"&gt;populated_production&lt;/TD&gt;&lt;TD&gt;13.36s&lt;/TD&gt;&lt;TD&gt;9.62s&lt;/TD&gt;&lt;TD&gt;8.12s&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
      <pubDate>Mon, 20 Apr 2026 21:44:46 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697280#M75198</guid>
      <dc:creator>HaydenWelch</dc:creator>
      <dc:date>2026-04-20T21:44:46Z</dc:date>
    </item>
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697350#M75202</link>
      <description>&lt;P&gt;I have used some of the publicly available reverse-engineered FGDB specifications to do this exact same kind of thing.&amp;nbsp; I even created a locking mechanism that emulates what some SDK calls do to ensure I am playing nice within the FGDB when reading it files, but that is a different discussion thread.&lt;BR /&gt;&lt;BR /&gt;Looking at your network dataset times, I suspect 2/3 to 3/4 would actually benefit from multi-processing instead of multi-threading.&amp;nbsp; The bottleneck with Walk isn't the FGDB API but something in either Walk itself or internal code that Walk depends upon.&amp;nbsp; It is perfectly fine to have multiple, even many, processes reading the same FGDB at the same time without issue.&amp;nbsp; Even though initializing an arcpy license does take a few seconds, if enumerating a data type in an network share FGDB takes tens of seconds, that 3-second hit from initializing a library isn't that big.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 22:53:38 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697350#M75202</guid>
      <dc:creator>JoshuaBixby</dc:creator>
      <dc:date>2026-04-20T22:53:38Z</dc:date>
    </item>
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697352#M75203</link>
      <description>&lt;P&gt;I ended up doing a hybrid approach. Initially I was just checking the gdb directory and counting the .gdbtable files and switching modes depending on count.&lt;/P&gt;&lt;P&gt;Seems that it all kinda falls apart with nested datasets though (which most of my databases are) so I went down the path of using the GDAL reverse engineering project.&lt;/P&gt;&lt;P&gt;Then I realized that you can literally just read the opening xml tag bytes line by line. Ideally I could parse in the whole XML file, but since really just need the paths (the actual access to the features are done lazily). I can just strip out the paths.&lt;/P&gt;&lt;P&gt;Might not even need to worry about file locking since it loads the whole file in before it parses it.&lt;/P&gt;&lt;P&gt;May end up expanding on this in a submodule and actually properly parse the GDB using Python. It's kinda insane how a hacky 2 second pure Python solution is so much faster than every single official solution I tried.&lt;/P&gt;&lt;P&gt;(Describe is almost 10x slower than all of these options combined, and the List functions are about 2x slower when you're dealing with Datasets)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As to your Multiprocessing point, I try to avoid it since the process tree can play weirdly with exception flows. I've also had issues with multiprocessing kicking users out of their sessions or crashing because the session lock was in another process.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I may still give it a try tomorrow and see if it works, but I think the raw file read is probably the fastest solution since it has virtually no overhead.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sidenote: I messed up by first 2 tables by a factor of 10, I forgot that timeit gives you total time for all runs, not average time.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2026 23:30:52 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697352#M75203</guid>
      <dc:creator>HaydenWelch</dc:creator>
      <dc:date>2026-04-20T23:30:52Z</dc:date>
    </item>
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697442#M75206</link>
      <description>&lt;P&gt;OK, I think I finally understand (or mostly) what is going on with arcpy.da.Walk and multithreading.&amp;nbsp; As you know, I created some synthetic datasets to represent a range of GDB structure, and have been testing those datasets on local SSD, LAN SMB (~1.25 ms), and WAN SMB (~20 ms).&lt;/P&gt;&lt;P&gt;arcpy.da.Walk is an os.walk-shaped generator — one __next__() per directory — and it holds some in-process lock during each __next__() call, releasing between yields.&amp;nbsp; A probe driving two Walk generators from two threads (thread A, thread B) showed the two threads' iterations alternate perfectly in lockstep (B, A, B, A, ...) on a gdb with feature datasets, with ~80% wall-clock overlap.&amp;nbsp; So threads do get scheduled and do share the work, but they share it by taking turns at the lock, not by running truly in parallel.&amp;nbsp; That puts a ceiling on how much threading can help.&amp;nbsp; Total lock-holding time is bounded below by total work, so two threads can't beat one by much.&lt;BR /&gt;&lt;BR /&gt;The practical consequence is that how much threading wins depends on your gdb's directory structure.&amp;nbsp; A flat gdb is one directory with one big yield, so two threads can't interleave at all.&amp;nbsp; This means the second thread just waits until the first is done.&amp;nbsp; A gdb with many feature datasets has many yield points and the two threads can rotate through the lock productively, which is probably where some of your time savings is coming from in your tests. In my testing, ProcessPoolExecutor gave a more predictable ~1.6-1.9× speedup at WAN latency regardless of gdb shape, because separate processes have separate arcpy state and don't share the lock at all. The tradeoff is the ~3s cost of spinning up each worker (fresh arcpy import, license check), which is fine if each gdb walk takes more than a few seconds but eats the benefit on fast walks. 
If you're walking many gdbs over a WAN, the biggest time savings is probably one process per gdb rather than threading within one gdb — that axis scales cleanly and doesn't depend on what any individual gdb looks like.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 14:36:24 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697442#M75206</guid>
      <dc:creator>JoshuaBixby</dc:creator>
      <dc:date>2026-04-21T14:36:24Z</dc:date>
    </item>
    <item>
      <title>Re: A Fun Threading Situation With da.Walk</title>
      <link>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697612#M75209</link>
      <description>&lt;P&gt;That's about what I expected after running it on your test databases. I still ended up just abandoning Walk entirely and switched over to the raw read of the gdb table. I included fallbacks in case that breaks, but I'm already seeing a consistent 100x speedup just parsing the XML GDBTable info in the a00..4.gdbtable file. I think Walk and List* are doing a lot more under the hood than I actually need since the initialization is inherently lazy, so just finding the path is enough for my needs.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2026 13:06:05 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/a-fun-threading-situation-with-da-walk/m-p/1697612#M75209</guid>
      <dc:creator>HaydenWelch</dc:creator>
      <dc:date>2026-04-22T13:06:05Z</dc:date>
    </item>
  </channel>
</rss>

