Python addin for data inventory and “broken-link” repair.

08-31-2015 03:20 PM
RebeccaStrauch__GISP
MVP Emeritus

Updated

2/7/2017, 1/30/2017 -- added a link to a thread with info on finding/replacing graphic elements. At some point I may add this functionality to the tool, but I have no time for that now. Still, it is worth noting a possible solution:

   Python script for mass find and replace of workspace path? 

https://community.esri.com/message/664513-re-write-broken-source-list-to-text-file    and a simple script, if you do not want to use the addin

12/20/2016 -- my download and install instructions were a bit off. Paul Davidson pointed out that the install was not working. If the file you download is called ChkandFixLinks.esriaddin.zip, you need to unzip it first, then double-click the ChkandFixLinks.esriaddin file. Then it should work. This is a toolbar for ArcCatalog (not ArcMap). I just installed and tested it with 10.5 (I only tested the first 2 buttons to make sure they worked).

10/28/2015 -- new tool "Set Map Data Sources" provided by ArcGISTeam LocalGov. I have not tried it, but it is worth a look.

9/22/2015  4:45 pm (AK time)   --- removed extra quote in line 52 of the "fix" script that caused error. Updated attachment

CheckAndFixLinks.esriaddin

- Download the .esriaddin file. If it downloads as a zip file, unzip it so you see the .esriaddin file.

- For the Python addin, double-click ChkandFixLinks.esriaddin to install it to ArcCatalog.

- If you prefer to see/modify the scripts, rename the .esriaddin to .zip, then unzip it. The toolbar and scripts are available for viewing/editing.

At this time, this is the only download source. I may move it to ArcScript 2.0 (when available) and/or GitHub at some point.

ArcCatalog ToolBox – tested with 10.2.2/10.3.0 -- will create reports of all broken links, with an option to repair all types of connections

Note: there is a known issue accessing mxds saved as 10.3.x from the addin installed on a 10.2.2 machine.

This Toolbar can perform the following on a folder/subfolders (using walk):

  1. list your file geodatabases (FGDB), and approximate size on disk;
  2. inventory your feature classes, all types including: FGDB, covers, grids, etc.;
  3. list broken links (based on machine/user running the tool) in MXDs;
  4. repair individual feature class broken links, including .sde and .ags connections, etc. (based on .csv file input)

The first three (left of the Yield sign icon) create reports only (.txt, .csv, .xls)…they do not modify your mxds in any way, so these may be nice tools to use even if you fix your links in another manner.

The Yield sign is to remind you that the .csv file is required as input to the next tool (after the Yield).

Note: For those that want a bulk drive-letter and/or <servername> change only, I removed this tool for this first release, but the code is there (same as 3a, and tool is shown in Toolbox in zip). I provided this, but suggest skipping the bulk change and using the other repair tool instead. (reason: bulk change uses findAndReplaceWorkspacePaths at the mxd level, and may change path for layers that need to be handled differently)

It is recommended that you create a copy of your mxds and do a few test runs to get familiar with the tools (because these do a “walk”, do NOT place your backup copies in a subfolder of your working folder). I have an option to write the updated mxds to a new _repair folder, but I have this disabled at this time.  See the recommended workflow at the end of the document.
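Because every tool walks the working folder and all of its subfolders, a backup stored anywhere below that folder would be processed (and repaired) along with the originals. A minimal sketch of a pre-flight check, assuming plain folder paths; the function name `is_inside` is hypothetical and is not part of the addin:

```python
import os


def is_inside(candidate, working_folder):
    """Return True if candidate is the working folder or lies anywhere below it."""
    cand = os.path.normcase(os.path.abspath(candidate))
    work = os.path.normcase(os.path.abspath(working_folder))
    # Append a separator so "maps_backup" is not mistaken for a child of "maps".
    return cand == work or cand.startswith(work.rstrip(os.sep) + os.sep)
```

If `is_inside(backup_folder, working_folder)` returns True, move the backup somewhere else before running any of the tools.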

Notes and Cautions:

  • If you see this error: “TypeError: GPToolDialog() takes at most 1 argument (2 given)” it can be ignored.  This is still an arcpy bug (NIM089253)
  • The first three scripts only create lists with .csv, .xls, and/or .txt output files, so other than possibly taking a long time to run (if complex structure or slow system), they will not change any data. Most tools output a comma delimited (.csv), text (.txt), and/or an Excel (.xls) file.   By default, the YYYYMMDD_HHMM is appended to the name (startup timestamp for HHMM) so it is not overwritten on repeated runs, and the file(s) will be written to the folder being analyzed (you will need write-access to working drive to write the files).  You may need to eventually clean up (delete) these output files if you run the script often.
  • The Yield sign is a reminder that for the repair script, you need to make a copy of the Excel/.csv file above, modify it, and save it as a .csv file for input to the next script. When in doubt about a “new source”, do not enter a NewPath and leave NewType = “_review”.  Those old sources will not change.
  • The repair script (to the right of the Yield sign) will change the mxds (if there is a new source path in the .csv), so it is highly recommended that you back up your folders and mxds before running it. Also, I suggest testing a few mxds in a safe location until you are comfortable with what each tool does.
  • ** Keep in mind that broken links are in “the eye of the beholder”, that is, broken based on the machine and user running the script, so when replacing paths, if you can use shared connection files and/or common path names, that will keep the mxds and data un-broken for multiple users.
  • If the mxd is storing “broken” user’s login credentials, running these programs may cause an unsuccessful login attempt and therefore lock the user (depending on your network setup). Keep this in mind so you can unlock the user as appropriate.
  • Some manual editing steps are required to create the input list for the repair broken links tool.
  • If you have multiple users using the same mxd, consider using a common drive mapping and/or connection file. Store the connection files in a common location that is mapped the same for all.
  • The tools also echo messages to the Results tab so you can track progress….but this can also create a very large result output. It is recommended that you “remove” the result output once the script is complete and you are done reviewing it in the Results tab. (Leaving these large result output listings can significantly slow the opening and closing of ArcCatalog)
  • If ArcCatalog closes before the script is complete, the output file will not be written.

Tools (note: all tools use “walk” to include folder and subfolders):

  • 1-List FGDB size on disk - to .csv .xls files (ListFGBsize.py)
    • Input arguments:
      • theWorkspace: drive or folder to walk through
      • outfile: default is GDBLIST, will append YYYYMMDD_HHMM and extensions
    • Field names in output: Name, GDBpath, and ApproxMB.

Default output name is GDBlist with a date-time (YYYYMMDD_HHMM) appended to the basename to keep output unique for repeated script execution.
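For those who want the idea without the addin, here is a minimal sketch of the same approach using only the standard library. It is not the code in ListFGBsize.py; the function names are hypothetical, and it assumes a file geodatabase is any folder whose name ends in .gdb. It is written for Python 3, so the Python 2.7 that ships with ArcGIS 10.x may need small adjustments.

```python
import os
from datetime import datetime


def list_fgdb_sizes(workspace):
    """Return (Name, GDBpath, ApproxMB) for every *.gdb folder under workspace."""
    rows = []
    for root, dirs, _files in os.walk(workspace):
        for d in list(dirs):
            if not d.lower().endswith(".gdb"):
                continue
            gdb_path = os.path.join(root, d)
            total = 0
            for sub_root, _sub_dirs, sub_files in os.walk(gdb_path):
                for f in sub_files:
                    try:
                        total += os.path.getsize(os.path.join(sub_root, f))
                    except OSError:
                        pass  # file locked or removed while walking
            rows.append((d, gdb_path, round(total / (1024.0 * 1024.0), 2)))
            dirs.remove(d)  # do not walk into the geodatabase a second time
    return rows


def stamped_name(basename="GDBlist", ext=".csv", now=None):
    """Append YYYYMMDD_HHMM so repeated runs do not overwrite each other."""
    now = now or datetime.now()
    return "{0}_{1}{2}".format(basename, now.strftime("%Y%m%d_%H%M"), ext)
```

The size is approximate because it only sums the file sizes reported by the operating system at the moment of the walk.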

  • 2-Inventory FC reports, .csv AND text output   (InventoryFC.py)
    • Originally had two separate scripts for the outputs, but combined them since 99% was exactly the same (and could write to two files in one pass)
    • For a given folder, identifies and creates a list of feature classes, including FGDB, covers, and grids
    • Outputs two files
      • .csv (comma delimited) format
        • FType – a class name assigned by me, for example:
          • ArcInfoTable
          • CoverageFeatureClass
          • FeatureClass
          • (add raster sample)
        • FCname
          • Table or FC name if in a GDB
          • Arc (line), point, label, polygon if a coverage
          • (add raster sample)
        • FullPath
          • For file geodatabase, thru .gdb
          • For cover tables, thru “covers” folder
          • For covers, thru coverage name
      • .txt (very basic report format – easier to visualize)
        • A couple of header lines,
          • “List of all GIS data in <folder>  on <MM/DD/YYYY>
            Includes coverages (pts, poly, arc, anno), shapes, and FGDB data.
            -----------------------------------------------------”
        • followed by a list of FGDB/workspace/folder; my feature class tag, as shown above; and the feature class files within them (indented for easy reading)
      • Neither of these files is currently a unique list, and some folders (especially coverages) are repeated…may change this to be unique at some point, but not high priority
  • 3a-Create Unique list of Broken Links (3b adds the option to fix drive letters first)
    • Creates csv, Excel (xls), and optional FGDB table output (although I have not found a use for the FGDB table yet) of unique broken links within all mxds within the folder/subfolders.
      • Option removed for this release…3B OPTION: to repair drive letter changes before running. CAUTION: using this option will use findAndReplaceWorkspacePaths at the mxd level and may not be what you need….make sure you have a back up first.
    • Output formats
      • .csv (comma delimited) format (default)
      • .xls (Excel)
      • .txt report
      • Option: FGDB table
    • Output fields:
      • UniqID – auto incremented number, just to make it easier
      • dataType – a tag I assign to help identify source type, e.g. Fgdb, MapServer_connection, Table_other, etc.
      • newType – “_review”, text to remind you it needs to be reviewed for possible correction
      • brokenPath – self explanatory
      • newPath – self explanatory
  • 4-Repair broken link source (4_DataSourceRepairX.py)
    • Updates source paths of broken links, based on input .csv file
    • Input:
      • Folder to process (will also walk thru subfolders)
      • .csv file with newType and newPath updated
    • Outputs: CAUTION overwrites mxd, so make sure you have a copy in a location that is NOT in or below the folder that you will be processing (script has ability to use SaveAs, but not currently activated in tool)
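To make the 3a output concrete, here is a minimal sketch of how a unique broken-link list could be built. It is not the addin's script. The function names are hypothetical, the dataType tagging is left blank because that classification is specific to the addin, and treating paths that differ only in case as the same source is my assumption. `broken_sources` needs ArcGIS 10.x (it uses `arcpy.mapping.MapDocument` and `arcpy.mapping.ListBrokenDataSources`), so arcpy is imported inside the function and the rest runs without ArcGIS.

```python
import os


def find_mxds(folder):
    """Yield the full path of every .mxd in folder and its subfolders."""
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.lower().endswith(".mxd"):
                yield os.path.join(root, name)


def broken_sources(mxd_path):
    """Return broken data source paths for one mxd (requires ArcGIS 10.x)."""
    import arcpy  # imported here so the other functions work without ArcGIS
    mxd = arcpy.mapping.MapDocument(mxd_path)
    paths = []
    for item in arcpy.mapping.ListBrokenDataSources(mxd):
        # Group layers and some service layers do not expose a dataSource.
        if hasattr(item, "supports") and not item.supports("DATASOURCE"):
            continue
        paths.append(item.dataSource)
    del mxd
    return paths


def unique_rows(paths):
    """Build one output row per distinct broken path, flagged for review."""
    rows, seen = [], set()
    for p in paths:
        key = p.lower()  # assumption: compare Windows paths case-insensitively
        if key in seen:
            continue
        seen.add(key)
        rows.append({"UniqID": len(rows) + 1, "dataType": "",
                     "newType": "_review", "brokenPath": p, "newPath": ""})
    return rows
```

Every row starts with newType set to “_review” and an empty newPath, which matches the fields described above.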

Suggested workflow:

  1. Run scripts #1 and #2 to get a feel for the data in your folder.
  2. If you haven’t already, create backup/copy of the folder you will be working with.
  3. Run #3a (broken list, “without updates”) to find all broken links. 
  4. Review the output .xls and/or .csv.  Suggest making a copy of whichever is easier for you to work in.  They have the same info, just in two different formats. (Suggested new name: RepairBrokenLinks.)
    • Caution: if/when sorting, make sure you have data selected.
    • I suggest you initially sort by datatype and remove all the “Group” and “Event” rows…those are for info only.
    • You may then want to sort by BrokenPath so you can find any pattern that may need to change to the same new source. 
      • For example, John had a source mapped as “d:\” , while Jane had it mapped as “f:\”  --- both show as broken links and you now want it to be mapped as a UNC path.  Add the path to newPath. If it is the same data type, copy the value in dataType to newType … if it changed, change newType as appropriate.
    • For changing SDE, I found creating new connections and saving them to a common location worked best.
    • For changing ArcGIS Services, I create a layer (.lyr) file and save it to a common location. These actually require that the current connection be dropped and the new layer be added (no replace workspace will work).
    • These are current data types the program can handle
      • cover_arc
      • cover_pont
      • cover_poly
      • cover_region
      • cover_tic
      • shape
      • fgdb
      • pgdb
      • sde
      • dbf
      • table_other
      • table_dat
      • txt
      • raster – may be SDE raster layers, or older NGS-TOPO! .tpq rasters
      • raster.jpg
      • raster.bmp
      • raster.gif
      • raster.jpg
      • raster.sid
      • raster.tif
      • service_<your AGS service name>
      • esri.sdc - this is for information only, no repair included in script….these records should be deleted before running fix
      • other – these may be coverages that could not be classified
      • group – this is for information only….these records should be deleted before running fix
      • events_table – this is for information only….these records should be deleted before running fix
      • _unknown  – this is for information only and not sure what these are. For me, listed .mxd name and may be those with .SDE issues….these records should be deleted before running fix
      • newType - “_review” until modified by user
      • brokenPath – broken data source path found in mxd
      • newPath – empty until modified by user
  5. For any broken links that need further review (i.e. not ready to change to a newPath), leave the newType as “_review” and the newPath blank.
  6. Once ready, run #4 to repair the broken links.
  7. Once the repair is complete, run #3a again to get a new list of broken links remaining.
  8. Repeat #3a and #4 as needed.
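
As a sanity check before step 6, it can help to preview which rows of the edited .csv would actually be acted on. The sketch below is not the logic in 4_DataSourceRepairX.py; it only applies the rules described above, using the field names from the 3a output: skip the info-only types, and skip anything still marked “_review” or with a blank newPath. The function name, the `INFO_ONLY` set, and the sample rows are hypothetical.

```python
import csv

# Types described above as information only; they should not reach the fix.
INFO_ONLY = {"group", "events_table", "esri.sdc", "_unknown"}


def rows_to_repair(csv_lines):
    """Return (brokenPath, newType, newPath) for rows that are ready to fix."""
    todo = []
    for row in csv.DictReader(csv_lines):
        data_type = (row.get("dataType") or "").strip().lower()
        new_type = (row.get("newType") or "").strip()
        new_path = (row.get("newPath") or "").strip()
        if data_type in INFO_ONLY:
            continue  # info-only rows are never repaired
        if not new_path or new_type.lower() == "_review":
            continue  # still needs review, leave the old source alone
        todo.append((row["brokenPath"], new_type, new_path))
    return todo


# Made-up sample rows: only the first one is ready to repair.
sample_lines = [
    "UniqID,dataType,newType,brokenPath,newPath",
    r"1,fgdb,fgdb,d:\gis\a.gdb\roads,\\server\gis\a.gdb\roads",
    r"2,shape,_review,f:\old\parcels.shp,",
    r"3,group,_review,Basemap group,",
]
ready = rows_to_repair(sample_lines)
```

With a real file, pass the open file object instead of the list, for example `rows_to_repair(open("RepairBrokenLinks.csv", newline=""))` in Python 3.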
About the Author
Worked with GIS for 30+ years for the Alaska Dept of Fish and Game.