Advanced Data Source Management

12-15-2010 01:34 AM
Status: Open

The worst nightmare for every ArcGIS user is broken data links. ArcMap and ArcCatalog should provide robust, intelligent, and automated data source management that continually maintains the integrity of a project’s data structure and helps the user preserve it as much as possible. This would bring more freedom to the work process and create a more comfortable working environment.

  • Active and permanent source data surveillance – monitoring of current projects and linked data locations throughout the work process. The user is warned before any change that directly affects the project’s integrity (add, delete, move or rename). Approved changes are immediately reflected in the map document (sources are automatically updated).
  • Node-based data source manager – intuitive and interactive management of all externally referenced map data (geo, database, layout), with visualization of their relationships (like Autodesk Flame or The Foundry Nuke).
  • Broken data links manager – reports layers, databases and layout elements with missing data links when an MXD is opened or a LYR is added, and offers automatic or manual repair.
  • Intelligent evaluation of the loss of data connection – identifies whole or partial loss caused by moved or renamed data sources or base folders (absolute/relative mode) or by a change of map document location (relative mode) when the project is opened or previewed.
  • Automatic search for moved or renamed layers or folders – finds the original or similar sources near the missing data, and identifies and compares the best matches according to pre-recorded file, geographic, database and metadata properties.
  • Automatic re-creation of an entire broken data structure – automatically recovers all missing sources from pre-recorded information about the map document and data sources (path, size, date, name…) or from a new data location specified by the user.
  • Layer data source replacement assistant – helps to properly update database-dependent layer properties (symbology, definition query, labels and joins/relates) and provides a smooth transition when a layer source is exchanged, with emphasis on preserving as many layer settings as possible.
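
The "automatic search for moved or renamed layers" item could be approached roughly as follows. This is only a sketch under assumptions, not anything ArcGIS provides: it assumes the file name and size of each source were recorded while the link still worked, and the names `score_candidate` and `find_candidates`, the `recorded` dictionary layout, and the 0.6/0.4 weights are all hypothetical choices made for illustration.

```python
import os
from difflib import SequenceMatcher


def score_candidate(recorded, path):
    """Score a file on disk against pre-recorded properties (0.0 to 1.0).

    `recorded` is a hypothetical dict with the keys "name" and "size",
    captured while the data link was still valid.
    """
    name_score = SequenceMatcher(
        None, recorded["name"].lower(), os.path.basename(path).lower()
    ).ratio()
    size_score = 1.0 if os.path.getsize(path) == recorded["size"] else 0.0
    # Arbitrary illustrative weights: the name counts a little more than the size.
    return 0.6 * name_score + 0.4 * size_score


def find_candidates(recorded, search_root, limit=5):
    """Walk `search_root` and return the best-scoring (score, path) pairs."""
    scored = []
    for folder, _dirs, files in os.walk(search_root):
        for file_name in files:
            full_path = os.path.join(folder, file_name)
            scored.append((score_candidate(recorded, full_path), full_path))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

A call such as `find_candidates({"name": "roads.shp", "size": 10}, search_root)` would return the closest matches first, so they could be offered to the user for confirmation. A real matcher would presumably also compare dates, geometry type, extent, or metadata, as the list item suggests.
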

I'd like to see a reverse search function in ArcCatalog that would work something like this:

You right-click a feature class, raster, etc. in ArcCatalog and choose something like 'Search Projects that use this item'. You could enter the file location you want to search through (e.g. all of the C: drive, just C:\projects, etc.). You could also choose which types of files to search through (e.g. map documents [.mxd], ArcGIS Explorer documents [.nmf], etc.). Then a dialog box would list all of the projects that reference this item.

If no projects were found to be referencing the item, then you could feel secure in renaming it, deleting it, etc.
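
The lookup described above amounts to a reverse index from data source to the documents that use it. Here is a minimal sketch, assuming an inventory of each document's data source paths has already been collected somehow (for example by a script that lists the layers of every document). The `inventory` layout and the function names are hypothetical; paths are compared case-insensitively in Windows style.

```python
import ntpath
from collections import defaultdict


def normalize(path):
    """Normalize a Windows-style path so equivalent spellings compare equal."""
    return ntpath.normcase(ntpath.normpath(path))


def build_reverse_index(inventory):
    """Map each data source path to the set of documents that reference it.

    `inventory` is a hypothetical dict: document path -> list of source paths.
    """
    index = defaultdict(set)
    for document, sources in inventory.items():
        for source in sources:
            index[normalize(source)].add(document)
    return index


def documents_using(index, source):
    """Return a sorted list of documents referencing `source` (may be empty)."""
    return sorted(index.get(normalize(source), ()))
```

An empty list from `documents_using` only means that none of the inventoried documents reference the item; documents outside the searched locations would not be covered.
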

I'm not so sure I want to see ArcGIS bogged down still further (it's already a very heavyweight, slow-to-start program) with active monitoring and searching for missing data sources. We draw our data sources from local drives, from the network, and from ArcSDE servers. The namespace is huge and holds millions of files (and that doesn't even consider internet data sources!). Network traffic is already intense without adding further indexing and searching background processes. And while the reverse lookup ability in the above comment would certainly be nice, I can't fathom how it could reasonably handle cases where some or all of the referencing documents have been moved or archived. Should the data be considered de-referenced at that point or not?

ESRI has made progress in allowing data sources to be programmatically updated via Python tools, but what frustrates me is that not all data sources in a document can be addressed this way. Namely, according to the documentation, joined tables cannot have their sources fixed using the tools. It is a very rare map in our shop that doesn't use joined tables, meaning that we still can't use automated tools to fix map documents in a batch when our SDE server moves onto a new host, or we replace our file servers. If this last gap could be crossed, I think we could make do...
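
To illustrate the batch scenario above (a server or file share moving to a new host), here is a rough sketch of planning the path changes before touching any document. It deliberately does not call any ArcGIS API. The `entries` layout, the `kind` values, and the function names are hypothetical, and joined tables are set aside for manual repair only because the comment above reports that the tools cannot update them.

```python
import ntpath


def remap(path, old_root, new_root):
    """Return `path` moved from `old_root` to `new_root`, or None if it is
    not located under `old_root`. Comparison is case-insensitive."""
    norm_path = ntpath.normpath(path)
    norm_old = ntpath.normpath(old_root)
    if ntpath.normcase(norm_path) == ntpath.normcase(norm_old):
        return ntpath.normpath(new_root)
    # Require a separator after the root so "D:\gisdata2" does not match "D:\gisdata".
    prefix = norm_old.rstrip("\\") + "\\"
    if ntpath.normcase(norm_path).startswith(ntpath.normcase(prefix)):
        return ntpath.join(ntpath.normpath(new_root), norm_path[len(prefix):])
    return None


def plan_updates(entries, old_root, new_root):
    """Split affected entries into automatic and manual repair lists.

    `entries` is a hypothetical list of (document, kind, path) tuples,
    where kind is e.g. "layer", "table" or "join".
    """
    automatic, manual = [], []
    for document, kind, path in entries:
        new_path = remap(path, old_root, new_root)
        if new_path is None:
            continue  # source is not under the old root, nothing to change
        target = manual if kind == "join" else automatic
        target.append((document, path, new_path))
    return automatic, manual
```

The `manual` list is exactly the gap described above: the entries that would still have to be fixed by hand in each document.
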

Since changing a file or folder name, its location, or the gdb type breaks data sources, and since there currently isn't an easy fix (at least not in 9.3.1), providing a way to search for particular data layers (regardless of format, e.g. shp or gdb) would be very helpful. If I have to re-source mxds by hand because of a broken layer, this would at least point me to the mxds I need to open.

Ideally, though, there should be a more global option to update all relevant mxds when a data layer changes, so we don't have to do this manually....

I started work on a similar tool in 9.3.1 using .NET. I actually had it to the point where it would list the distinct data sources in an mxd and all the layers, standalone tables, and joins that used a specific data source. I had just finished coding a one-click change data source tool when I ran into a bug that causes the mxd's symbology to be lost when the data sources are changed (this included joins as well). I just found out this bug will not be fixed until version 10.1.

Anyway, this is functionality that should have been available a very long time ago.  ESRI's products are advertised as an Enterprise Solution and this is an issue that everyone designing in an enterprise environment will run into.  I for one hope that all the functionality listed here is implemented in the very near future.

There are various scripts floating around that do some aspects of this, but it is also pretty easy to create bad data connections via ArcObjects, and many of these solutions are one-offs that have not been thoroughly tested on a wider range of data sources. Having a powerful, supported batch repair tool with some of these advanced features would be an excellent addition to the software.

+1 for being able to preserve symbology, joins/relates, etc. as much as possible when changing data sources. So often, the new data source would work just fine with the existing symbology and joins ... it's a real hassle to have to recreate it all.

Guaranteed, all ArcGIS Desktop users will have a broken data source at some time or another. Server names change, systems get updated.
Even when you go to a folder containing map documents in Catalog and mistakenly click on an mxd, the whole application grinds to a halt!
So there is no option but to open Task Manager and kill the ArcMap process, or go make a cup of tea!

It would be great if you could schedule a background search to run overnight that would then, say, "grey out" all map documents with dodgy data sources.

I don't know that I agree with ALL of the points above, but I know my ArcMap can take about a half hour to open an old document that has references to an SDE server that doesn't exist any more.  Number of layers: about 5.  Once it DOES come up, I'll right-click a layer and set its data connection, which will fix its siblings' connections.  But trying to fix this many old map documents to point to the new data is a headache and almost a waste of time.

I think a clean solution would be to program ArcGIS Desktop so that it gives up looking for data a LOT sooner than it currently does. If it can't find a data source in, like, 7 seconds, stop trying. And any other layers looking for the same data source needn't even try. Does that sound better than a global data source management tool? Just have it give up sooner.
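
The "give up sooner, and don't retry for sibling layers" idea can be sketched like this. It is not how ArcGIS Desktop works internally. The function names are hypothetical, and the code assumes a data source can be probed with a caller-supplied function that honours a timeout (the TCP probe shown is just one possible way to test an SDE-style host).

```python
import socket


def tcp_probe(workspace, timeout_seconds):
    """Example probe: `workspace` is assumed to be a (host, port) tuple.

    Returns True if a TCP connection succeeds within the timeout.
    """
    try:
        with socket.create_connection(workspace, timeout=timeout_seconds):
            return True
    except OSError:
        return False


def check_layers(layers, probe, timeout_seconds=7.0):
    """Probe each distinct workspace at most once and report per-layer status.

    `layers` is a list of (layer_name, workspace) pairs; `probe` is any
    callable taking (workspace, timeout_seconds) and returning a bool.
    """
    reachable = {}  # workspace -> result of its single probe
    status = {}
    for layer_name, workspace in layers:
        if workspace not in reachable:
            reachable[workspace] = probe(workspace, timeout_seconds)
        # Layers sharing a workspace reuse the cached result, reachable or not.
        status[layer_name] = reachable[workspace]
    return status
```

With five layers on one dead server, the worst case is a single timeout (7 seconds here) instead of one long wait per layer.
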