How do you keep track of data and item dependencies?

10-02-2019 05:51 AM
TerriWaddell1
New Contributor III

Hello,

As our needs have changed over the years, we have far more apps, custom tools/scripts/models, Data Reviewer tests, etc., than we've ever had in the past.  If one field name or value is changed in our SDE database, numerous things downstream may need to be updated to reflect that one small database change. These are a nightmare to keep up with.  While we try not to change anything in our database now, sometimes it is inevitable (shout out to all of you who are preparing for NG-911, I feel your pain!) 

I was wondering how you keep track of all of your agency's data dependencies.  I started out with an Excel document that had the feature class name, field name, and domain name, with another column for the "item" that I might need to update if the field or domain changed.  It's up to more than 2,300 listings, and I haven't even finished plugging in the names of all of my Data Reviewer tests.  Now I'm leaning toward making an Access database that would run in a similar fashion but be easier to search; I've done quite a bit of research, and other than coming across this link for Item Dependencies (Mapping Item Dependencies Across ArcGIS Enterprise with Python and d3), I haven't found any programs that could help, nor any useful advice.  There has to be a better way. 

I'd love to hear how you've been keeping track of all of your data and everything tied to it.

-Terri Waddell

Aiken County Government

3 Replies
SallieVaughn
Occasional Contributor

I'd love to know what you find out about this. There used to be some functionality to export XML schemas from SDE and then import them into Access or Visio, but I haven't tried that in years and don't know the current feasibility. That would at least get you a raw schema that you could then mark up.
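To illustrate the XML schema idea: once you get the schema out of SDE as XML (for example with the Export XML Workspace Document geoprocessing tool), the Python standard library alone can flatten it into a feature-class/field listing you can mark up. The element and attribute names below are heavily simplified stand-ins for illustration, not the actual Esri workspace document format, which is far more verbose and namespaced:

```python
import xml.etree.ElementTree as ET

# Illustrative only: a tiny mock of an exported schema document.
# The real format differs; the point is that plain xml.etree can walk it.
sample_schema = """
<Workspace>
  <FeatureClass name="Roads">
    <Field name="Status" domain="Road_Status"/>
    <Field name="RoadName" domain=""/>
  </FeatureClass>
</Workspace>
"""

def list_fields(schema_xml):
    """Flatten the schema into (feature class, field, domain) tuples."""
    root = ET.fromstring(schema_xml)
    return [(fc.get("name"), f.get("name"), f.get("domain"))
            for fc in root.iter("FeatureClass")
            for f in fc.iter("Field")]

for fc_name, field_name, domain in list_fields(sample_schema):
    print(fc_name, field_name, domain)
```

A listing like this would at least give you the left-hand side (feature class / field / domain) of a dependency table automatically; the "affected items" would still have to be maintained by hand.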

With NG-911 looming in my near future, this is a huge issue for us. I'm basically going to implement a policy of only adding additional columns and never renaming any. The result is huge, unwieldy tables, but the number of staff impacted by that is far smaller than the staff, apps, citizens, workflows, etc., that would be impacted by simply renaming a column.

feralcatcolonist_old
Occasional Contributor II

Maybe not the best way ever, but if you're keeping track of dependencies within the metadata, you could hashtag or keyword them and use the newish metadata module in ArcPy to pull that out. This is a script I wrote to dump out metadata information; it should be a good starting point for playing around with the new module:

import arcpy
import pandas as pd
import xml.etree.ElementTree
from arcpy import metadata as md

metadata_dict = {}
# ArcCatalog metadata editor uses a WYSIWYG style but the trade-off is that those HTML tags will clutter your output
# first lambda function scrubs HTML tags using xml.etree from the standard library
# second lambda function does the same scrub and then returns a character count
# lambdas are needed here just in case you have null values in your data
scrubbed = lambda x: ''.join(xml.etree.ElementTree.fromstring(x).itertext()) if x else x
scrubbed_length = lambda x: len(''.join(xml.etree.ElementTree.fromstring(x).itertext())) if x else x
i = 0

sde_connection = "--YOUR SDE CONNECTION HERE--"

for dirpath, dirnames, filenames in arcpy.da.Walk(sde_connection):
    for filename in filenames:
        # The new Metadata module from arcpy that makes everything possible!
        file_metadata = md.Metadata(f"{dirpath}\\{filename}")
        # Loop dictionary will be a single record row for our metadata dataset
        file_dict = {f"{i}":  [
            dirpath.replace(f"{sde_connection}\\", ""),
            filename,
            file_metadata.title,
            file_metadata.tags,
            file_metadata.summary,
            scrubbed(file_metadata.description),
            scrubbed_length(file_metadata.description),
            file_metadata.credits,
            scrubbed(file_metadata.accessConstraints)
            ]}
        metadata_dict.update(file_dict) #Pushing this record group to the top-level dictionary
        print(f"Completed: {filename}")  # I like seeing outputs when I run a script
        i += 1

# Pandas has a function to create a dataframe from a dictionary
metadata_df = pd.DataFrame.from_dict(
    metadata_dict,
    orient = 'index',
    columns = ["Feature Dataset", "Feature", "Title", "Tags", "Summary", "Description", "Description Length", "Credits", "Access Constraints"]
    )

metadata_df.to_csv(r"C:\Users\jcarmona\Desktop\Metadata_test\metadata.csv")
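If you do go the hashtag/keyword route, the CSV produced above can then be searched for a given tag to find every feature class whose metadata declares a dependency. A minimal stdlib-only sketch (the `#roads_status` tag convention is an assumption for illustration, not anything built into Esri metadata; column names match the DataFrame above):

```python
import csv

def find_tagged_features(csv_path, hashtag):
    """Return the Feature column value for every row whose Tags column
    mentions the given hashtag (case-insensitive)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row["Feature"] for row in csv.DictReader(f)
                if hashtag.lower() in (row["Tags"] or "").lower()]
```

For example, `find_tagged_features("metadata.csv", "#roads_status")` would list every feature class tagged as depending on Roads / Status.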
Likes an array of [cats, gardening, photography]
TerriWaddell1
New Contributor III

I don't have metadata in most of our feature classes (I know, BAD GIS person, BAD!).  I'll keep the idea above in mind for that happy time when I'm finally able to work on our metadata.  In the meantime, I figured I'd share what I've been doing.  It's as clunky as can be, and tedious to keep up with, but I've been making do for now.  ESRI, please come up with some sort of management software we can use that does this for us! 🙂

I use Excel simply because it's been years since I've used Access, and I don't want to spend the time to re-learn it and design the database.  My column headings are "Feature Class or Table", "Field", "Domain", "Affected Item", and "Item Location".  

As an example, I have a feature class called "Roads" that has a field called "Status", which uses the "Road_Status" domain.  For each item in our system that uses the Roads feature class and may be affected by a change to the Status field and/or the Road_Status domain, I have one row entry.  So right now I have 22 separate listings for Roads / Status.  Each listing is one of the following:

- specific MXDs or projects that would need to be updated if there were a change to Roads / Status.  For example, all of our pre-made political maps used by others in our organization have to be updated if there is a change.  I list just one "Political Maps" entry under Affected Item, with the folder where the political map documents are stored under Item Location, because there are hundreds of maps and I have trouble keeping up with this as it is.  In a perfect world I'd have one entry for each map, but that's not going to happen any time soon.

- vendor-specific items that have to be updated if there is a change to Roads/Status.  This might be any 3rd party products you use, ESRI apps, etc.  

- any automated scripts that run (such as data backups or copying data out for use by your police CAD system, things related to your data upload to the EGDMS or MSAG systems, etc.)

- I have a listing to remind me to update the metadata once I finally get to that.  

- specific/special projects that always have to be updated when there is a change, such as our Road Atlas 

- Each Data Reviewer check I have that in any way uses Roads / Status.  This is where my list gets very long, as this particular item is used in 17 different Data Reviewer checks.

- Any procedure documents or other documentation where this appears.

So doing it this way has created an absolutely gargantuan Excel file.  But what I do is sort by whatever changed, and then I get a list of all the things that MAY be affected by a particular change.  The downside is that I've only managed to get NENA/NG-911-related feature classes into this Excel file, since that has been my sole focus for going on two years now.  Eventually I'll get everything else in there. It's also VERY easy to forget to update this file.  
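The sort-and-filter workflow above translates directly into a tiny lookup if anyone wants to script it instead of scrolling a spreadsheet. A minimal sketch in plain Python, where each row maps a (feature class, field, domain) to one affected item; the column names mirror the Excel headings above, and the sample rows and paths are made up for illustration:

```python
# Hypothetical sample rows; in practice these would be read from the
# spreadsheet (e.g. exported to CSV) rather than hard-coded.
dependencies = [
    {"feature": "Roads", "field": "Status", "domain": "Road_Status",
     "affected_item": "Political Maps", "location": r"\\server\maps\political"},
    {"feature": "Roads", "field": "Status", "domain": "Road_Status",
     "affected_item": "CAD export script", "location": r"\\server\scripts"},
    {"feature": "Parcels", "field": "PIN", "domain": "",
     "affected_item": "Tax Atlas", "location": r"\\server\atlas"},
]

def affected_by(feature, field=None, domain=None):
    """Return every item that MAY be affected by a change to this
    feature class, optionally narrowed by field and/or domain name."""
    return [row["affected_item"] for row in dependencies
            if row["feature"] == feature
            and (field is None or row["field"] == field)
            and (domain is None or row["domain"] == domain)]

print(affected_by("Roads", field="Status"))
```

This is exactly the "sort by whatever changed" step, just as a function call, so it could be dropped into a larger script or a simple search tool.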

I would *love* a better/easier way.  I hope this at least gives you some ideas, maybe somebody can take this and run with it and come up with something better.
