
Utility Network Bulk Tracing (Looping Records) with Python

Andy_Morgan
Frequent Contributor

After reading through various documentation and searching the Community board, I have yet to find a summary that thoroughly explains - with full examples - what's possible for bulk tracing a UN.

I'm on Enterprise 11.3, UN version 7, Pro 3.3.3.

My goal is to use either Python API or ArcPy for the following:

While looping through each of our water UN's ~77,000 line features...

  • Run an isolation trace on each feature, one at a time, and process the results for each trace. All I need are the "elements", no geometry.
  • For each starting water line segment ObjectID, extract the isolated valve ObjectIDs and isolated line ObjectIDs, then insert them into a SQL table as comma-delimited strings for easy database retrieval (e.g. "3815, 3940, 9914, 2147"). That way I merely run a database query instead of executing an actual trace in the front-end application/script: "...where ObjectID In (3815, 3940, 9914, 2147)". (See the sketch of this insert step after the list.)
  • This script would be run every so often (~ 4 times / year, maybe?)
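
A minimal sketch of the insert step, assuming a hypothetical SQL Server table IsolationResults(StartLineOID, IsolatedValveOIDs, IsolatedLineOIDs) and a pyodbc connection; the table name and connection string are placeholders, not from the original post:

import pyodbc

# Placeholder connection string -- adjust driver/server/database for your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;DATABASE=GISTrace;Trusted_Connection=yes"
)
cursor = conn.cursor()

def save_trace_result(start_oid, valve_oids, line_oids):
    # One row per starting water line, with the isolated OIDs stored comma-delimited.
    cursor.execute(
        "INSERT INTO IsolationResults (StartLineOID, IsolatedValveOIDs, IsolatedLineOIDs) "
        "VALUES (?, ?, ?)",
        start_oid,
        ", ".join(str(o) for o in valve_oids),   # e.g. "3815, 3940, 9914, 2147"
        ", ".join(str(o) for o in line_oids),
    )
    conn.commit()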

 

Benefits of this approach:

  • No worries about dirty areas preventing the UN trace from executing for end users. Tracing ahead of time assures me that nobody will see an error saying the trace cannot run. Our technicians post to Default throughout the day. Even though they are in the habit of validating Default to clear out everything, including the harmless "Feature has been modified" dirty areas, they may forget, or there may be a real error that isn't resolved at any given moment.
  • Lightning-fast results for the front-end application. That's true for a single-level isolation alone, but with this arrangement I could also perform a double-level isolation very quickly, which could be really beneficial during a large main break so the crew knows for sure which valves to close to guarantee water flow is blocked. Double-level isolation may be rarely needed, but it gives you a safety net for identifying critical valves to close even if the GIS data is off.
  • The results can now be used for other asset management scripts/workflows that would not otherwise be feasible if you had to execute a trace for every feature while analyzing the entire system. Running continuously, that could take days, which is unrealistic, while a simple database query for each line segment takes a tiny fraction of that time.

 

What works, what doesn't, where I lack knowledge:

Before going into specifics, my frustration centers on the fact that it's hard to find a method that lets me dynamically define my starting point (as the mid-point of each water main) for each iteration and then retrieve the results in memory...preferably while running against a version with no dirty areas, free from interruptions.

  • ArcPy Trace "arcpy.un.Trace(...)" - currently the only method that works well enough, if not ideally. I reference a starting-points FileGDB (on C:\...) as the template. It has a single point feature. Using an UpdateCursor, I simply set the FeatureGlobalID to the current water main. It successfully completes the trace on a small scale so far, but I have to output the results to a physical JSON file, pull the "elements" properties from it, delete the file, and continue looping. (A sketch of this loop follows below.)

  In ArcPy I cannot seem to reference a version other than SDE.Default. I've tried appending syntax like this "?gdbversion=MyUser@Domain.TraceTesting" (both with and without forward slash before the "?") to the URL for the UN layer, but it doesn't seem to take. 
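
Here's a minimal sketch of that per-feature loop, assuming hypothetical paths for the utility network, the starting-points template, and the output JSON. The Trace keyword arguments (and the domain network/tier/isolating-filter parameters an isolation trace needs) are assumptions and should be verified against the Trace (Utility Network) tool help for Pro 3.3:

import arcpy, json, os

un_path = r"C:\connections\water.sde\WATER.UtilityNetwork"   # placeholder UN path
start_fc = r"C:\TraceInputs.gdb\StartingPoints"              # single-point template from the post
json_out = r"C:\Temp\trace_result.json"

def isolation_elements(line_global_id):
    # Point the single starting point at the current water main.
    with arcpy.da.UpdateCursor(start_fc, ["FEATUREGLOBALID"]) as cur:
        for row in cur:
            row[0] = line_global_id
            cur.updateRow(row)

    # Keyword names assumed; the domain network, tier, and isolating filter barrier
    # an isolation trace requires are omitted here for brevity.
    arcpy.un.Trace(un_path, "ISOLATION", starting_points=start_fc, out_json_file=json_out)

    with open(json_out) as f:
        data = json.load(f)
    os.remove(json_out)
    # The post pulls the "elements" array out of the JSON; exact nesting can vary by release.
    return data.get("traceResults", {}).get("elements", data.get("elements", []))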

  • ArcGIS API for Python (arcgis.features.managers module) - either I cannot get the syntax right, or even if I could, I'm not sure it will handle per-feature input the way the ArcPy trace does (using a FileGDB point). I'm fairly confident my input parameters are fed into the "trace" method correctly:
# UtilNetMgr is the arcgis.features.managers.UtilityNetworkManager instance for the
# service (created earlier in the script); GlobalID is the current water main's GlobalID.
TraceLocations = [{
    "traceLocationType": "startingPoint",
    "globalId": GlobalID,  # example: "{288D22C3-301A-44D1-81BA-E66F094413D9}"
}]

traceConfiguration = {
    "includeContainers": True,
    "includeContent": False,
    "includeStructures": False,
    "includeBarriers": True,
    # ...remaining trace configuration properties trimmed from the post
}

resultTypes = [{
    "type": "elements",
    "includeGeometry": False,
    "includePropagatedValues": False,
    "networkAttributeNames": [],
    "diagramTemplateName": "",
    "resultTypeFields": [],
}]

trace_results = UtilNetMgr.trace(
    locations=TraceLocations,
    trace_type="isolation",
    configuration=traceConfiguration,
    result_types=resultTypes,
)

 

It's supposed to produce a dictionary of the form {"traceResults": {"elements": [...], ...}, "success": <bool>}.

Here's how it looks when my trace completes. With all the variations I've tried, I never see "traceResults" or "elements" returned.

[Screenshot attached: python_UN_trace_results_1.png]
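
If the documented shape does come back, splitting the elements into valve and line ObjectIDs could look like this sketch; the networkSourceId values are placeholders for your device and line source IDs, not values from the post:

VALVE_SOURCE_ID = 9   # placeholder: network source ID of your water device (valve) class
LINE_SOURCE_ID = 15   # placeholder: network source ID of your water line class

elements = (trace_results or {}).get("traceResults", {}).get("elements", [])
valve_oids = [e["objectId"] for e in elements if e.get("networkSourceId") == VALVE_SOURCE_ID]
line_oids = [e["objectId"] for e in elements if e.get("networkSourceId") == LINE_SOURCE_ID]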

 

  • REST API requests.post(service_url, data=payload, headers=headers) - I haven't been able to define a starting point dynamically while looping through my water line features. I can get it to run from the REST endpoint using the Global ID of a water valve (device), but I cannot seem to get this approach to work as explained above. Can I reference local data? I don't want to store starting points in my enterprise geodatabase, since they change all the time with continual edits to our system. (A hedged request sketch is below.)
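
For reference, a sketch of posting directly to the Utility Network service's trace operation, with the starting point built from the current line's GlobalID on each loop iteration. The server/service URL, token handling, and version name are placeholders, and the parameter names should be checked against the trace REST documentation:

import json
import requests

trace_url = ("https://myserver.example.com/server/rest/services/WaterNetwork/"
             "UtilityNetworkServer/trace")   # placeholder service URL

payload = {
    "f": "json",
    "token": token,                              # token obtained elsewhere
    "gdbVersion": "MYUSER.TraceTesting",         # placeholder named version
    "traceType": "isolation",
    "traceLocations": json.dumps([{
        "traceLocationType": "startingPoint",
        "globalId": line_global_id,              # set per feature inside the loop
        "percentAlong": 0.5,                     # mid-point of the line
    }]),
    "traceConfiguration": json.dumps(traceConfiguration),   # same dict as above
    "resultTypes": json.dumps(resultTypes),
}

resp = requests.post(trace_url, data=payload, timeout=300)
resp.raise_for_status()
result = resp.json()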

 

  • BatchTrace (Utility-Data-Management-Support-Tools) isn't a viable solution if you're trying to handle results for each feature. In theory it sounds good, but practically speaking it's highly inefficient and unrealistic. The tracing still takes 15 - 20 seconds per feature, which would mean many days of running.

---------------

Here's my strategy to be most efficient: Instead of having to trace all 77,000 features, what I'll actually trace will be much less - perhaps as little as 1/10 of this total. For each line traced, I'm capturing all the lines being isolated from that run. Therefore, I already know that full group of lines is covered by a certain combination of barriers (valves). So I can then insert all those rows into my SQL table before moving on to a new isolation area, if that makes sense. I really just need to trace one line segment for each isolation area/group. It could entail 2 lines total or it could entail 18 lines, but it cuts down on a lot of processing.

[Attached diagram: image.png]
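
A sketch of that "one trace per isolation group" idea, where run_isolation_trace is a hypothetical wrapper around the trace call and save_trace_result is the SQL insert helper sketched earlier:

processed = set()   # line ObjectIDs already covered by an earlier isolation trace

for oid, global_id in water_lines:          # hypothetical iterable of (OBJECTID, GlobalID)
    if oid in processed:
        continue                            # this line was isolated by a previous run
    isolated_lines, isolated_valves = run_isolation_trace(global_id)
    for line_oid in isolated_lines:
        # Every line in the group shares the same valve combination, so write them all now.
        save_trace_result(line_oid, isolated_valves, isolated_lines)
    processed.update(isolated_lines)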

 

25 Replies
RobertKrisher
Esri Regular Contributor

@Andy_Morgan if you were already using the ArcGIS API for Python then there won't be much of a difference, but if you were calling the Trace GP tool, there should have been a noticeable difference because of the GP overhead.

In terms of parsing the results to differentiate, it makes the code quite a bit more complex (at the current release) but would add less than a second to your overall processing time. If you think the time investment and complexity is worth it, I can point you in the right direction, but it isn't for the faint of heart. We have some items on our roadmap to make identifying the barriers of a trace easier, but at this point you'd need to analyze the network in memory to determine this (a topic that there are usually presentations on at the Developer and Tech summit).

Andy_Morgan
Frequent Contributor

Got it, yes I was using API for Python already. 

Thank you for carrying on with this conversation. I'm hesitant to inquire about the parsing code, since it might cost me a lot of time to follow, and I may have already put together the pseudocode that will accomplish my goal in less than 2 days of execution time. I realized my current code won't produce reliable results because it wasn't factoring in the nested trace groups.

Rather than write everything out, I produced an illustration in the form of a video. Do you think this could work if I can translate the last step into Python recursive looping to walk the system tabularly (ObjectIDs stored in a SQL Server table) to get the full set of isolated contents?

Summary of video: I'm running 2 separate traces. First a Connected trace for all lines, then an Isolation (barriers only) trace. What's important here is that each group of line features between the valves only requires a single Connected trace per group and a single Isolation trace per group. This cuts down on a large portion of traces. By "group" I mean water main segments F1 + F2 + F3, for example.   
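
One way the "walk the system tabularly" step could look, assuming a hypothetical adjacency structure built from the Connected and Isolation (barriers-only) trace results, where each line knows its neighboring lines and whether an operable valve sits between them. It's written iteratively (breadth-first) rather than recursively to avoid Python's recursion limit:

from collections import deque

def walk_isolated_contents(start_oid, adjacency):
    """adjacency: dict of {line_oid: [(neighbor_oid, valve_oid_or_None), ...]},
    loaded from the SQL Server tables; valve_oid is set when an operable valve
    separates the two lines."""
    isolated_lines = {start_oid}
    boundary_valves = set()
    queue = deque([start_oid])
    while queue:
        current = queue.popleft()
        for neighbor, valve in adjacency.get(current, []):
            if valve is not None:
                boundary_valves.add(valve)      # stop here; this valve is a closure point
            elif neighbor not in isolated_lines:
                isolated_lines.add(neighbor)
                queue.append(neighbor)
    return isolated_lines, boundary_valves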

 

RobertKrisher
Esri Regular Contributor

@Andy_Morgan if you want to get the nested trace groups then you'll need to do secondary processing on the results. The downside of tracking which lines you've already isolated is that you can miss out on nested groups. If your deadline is in two days, I'd be prepared to roll out with this limitation and come back to a more precise solution for the next round.

Question: Do features that belong to a nested outage need to be excluded from higher level outages? If the answer is yes, then you're looking at a basic network partitioning scheme.

If you need this information, then you'll want to do the secondary processing on the isolation trace results. However, because those nested groups may have their own nested groups, it's not enough to just look at object IDs; you need to do an analysis of the network graph. This leads you down the path of needing to identify barriers, which, once you get that working, means you can just analyze an entire pressure zone using a single trace.

I'd recommend you run the isolation trace first, get the results, then do all your secondary processing on that result set. That would let you work from a single set of results rather than re-tracing.
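
A heavily hedged sketch of what that secondary processing could look like: build a graph from the trace elements' connectivity, remove the barrier (valve) elements, and treat each remaining connected component as one isolation group. The edge list and barrier set are hypothetical inputs you would have to derive from the trace results yourself:

import networkx as nx

def partition_zone(edges, barrier_valve_ids):
    """edges: iterable of (from_element_id, to_element_id) pairs;
    barrier_valve_ids: set of element ids for the isolating valves."""
    graph = nx.Graph()
    graph.add_edges_from(edges)
    graph.remove_nodes_from(barrier_valve_ids)   # cutting at the valves splits the zone
    # Each connected component that remains is one group of features isolated together.
    return [set(component) for component in nx.connected_components(graph)]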

Andy_Morgan
Frequent Contributor

I don't have a hard deadline, but at the same time I cannot afford to spend dozens of more hours on this research. My plan right now is to continue on with using the Python API to first run Connected traces (1-2 seconds each) and then Isolation traces (get barriers only) on a single segment within each connected trace group to be most efficient. I estimate about 40-50 hours for those steps, then some relatively quick "walking the network" to get isolated contents. These are being stored in separate SQL table columns per start line OID.

To be honest, you've lost me with the concept of "a basic network partitioning scheme." Running a single trace for the entire pressure plane would be great, but analyzing the results sounds extremely complicated. 

I'm checking on whether nested groups need to be excluded from higher-level outages. I'm pretty sure the answer is no, we don't need to worry about excluding them. In that case, my routine should cover the need; it's just not as fast as I wanted it to be.

I'm also going to ask our IT to increase the heap size a little more, which might speed up the tracing just a bit...? Shaving off any time will only help.

Thanks for all the support. This topic is a tricky one to grasp. 

MikeMillerGIS
Esri Frequent Contributor

I would not suggest doing this on the prod UN servers. Copy and paste, or recreate the UN in a mobile geodatabase (MGDB) via the asset package. If you are going to use Enterprise, you should parallelize the traces and run maybe 8 concurrently.
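
A small sketch of that kind of parallelization against Enterprise, where run_isolation_trace is a hypothetical wrapper around whichever trace call you settle on:

from concurrent.futures import ThreadPoolExecutor, as_completed

def trace_all(global_ids, workers=8):
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_isolation_trace, gid): gid for gid in global_ids}
        for future in as_completed(futures):
            gid = futures[future]
            try:
                results[gid] = future.result()
            except Exception as exc:       # keep going if one trace fails (e.g., a dirty area)
                results[gid] = {"error": str(exc)}
    return results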

RobertKrisher
Esri Regular Contributor

@Andy_Morgan if every feature needs to be assigned to a single zone/partition, then the batch trace example can do that. It's getting a little late to change your approach now, but if you're going to be at the Developer Summit in Palm Springs this year I can show you (or you can watch the demo theatre presentation I'm doing on the community sample).
