
If you've used any component of ESRI enterprise technology, you've encountered the ubiquitous Global ID unique identifier.

 

Probably by right-clicking on a feature class, selecting "Add Global IDs," and forgetting about it.

 

This post is more of a rant and rave against random GUIDs than anything else. You certainly need a GUID column if you're replicating data or performing disconnected edits.

 

I just absolutely hate them! My problem with the Global ID column is how ESRI implements it. Not only is it unique (as it should be), it's random. I use SDE/SQL data for far more than showing dots on a map. We link to that data with other applications, such as those that manage the non-GIS data that is attached to that dot on the map. In some applications I have dozens of relationships and tables hanging off of that "GIS table". All accomplished with crafty use of the Global ID column as a way to relate non-GIS data to the GIS data. Sure, I could use something else, but then I wouldn't have such an attention-getting blog post title.

 

My problem with the Global ID column is its randomness. Imagine a feature class with 10,000 (I have a few with 500,000) points. That's 10,000 random GUIDs. Now link to that table using that column, or sort it, or filter it. Throw a clustered (or non-clustered) index on the Global ID column. Those random GUIDs sure slow things down, don't they?
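Don't take my word for it. Here's a minimal sketch you can run in a scratch database (the table and column names are made up for the demo) that shows how badly random GUIDs fragment a clustered index compared to sequential ones:

CREATE TABLE dbo.GUID_RANDOM (
    ID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY CLUSTERED,
    Filler CHAR(100) NOT NULL
);
CREATE TABLE dbo.GUID_SEQ (
    ID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY CLUSTERED,
    Filler CHAR(100) NOT NULL
);
GO
/****** load 10,000 rows into each table; the defaults generate the GUIDs ******/
INSERT dbo.GUID_RANDOM (Filler) SELECT TOP (10000) 'x' FROM sys.all_columns a CROSS JOIN sys.all_columns b;
INSERT dbo.GUID_SEQ (Filler) SELECT TOP (10000) 'x' FROM sys.all_columns a CROSS JOIN sys.all_columns b;
GO
/****** compare fragmentation: the NEWID() table will be far worse ******/
SELECT OBJECT_NAME(object_id) AS table_name, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED')
WHERE object_id IN (OBJECT_ID('dbo.GUID_RANDOM'), OBJECT_ID('dbo.GUID_SEQ'));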

 

Let's see how this works. When you do that right-clicking and forgetting, SDE adds a default constraint to the Global ID field:

 

ALTER TABLE [dbo].[TEST] ADD  DEFAULT ('{00000000-0000-0000-0000-000000000000}') FOR [GlobalID]

 

Every new feature added to that feature class calls a stored procedure, which takes a SQL NEWID() value (a GUID) and converts it to a string:

 

CREATE PROCEDURE [dbo].[next_globalid]
@guid NVARCHAR(38) OUTPUT
AS SET NOCOUNT ON
BEGIN
SELECT @guid =  '{' + CAST (NEWID() AS NVARCHAR(38)) + '}'
END
GO

 

But what if we could make the GUID sequential and not random, but still unique? Such a thing exists, and a search of the internet will reveal a very heated debate, with opinions firmly rooted for or against sequential GUIDs. Not for here....

 

Trying to alter the SP to use SQL's NEWSEQUENTIALID() doesn't work...

 

CREATE PROCEDURE [dbo].[next_globalid]
@guid NVARCHAR(38) OUTPUT
AS SET NOCOUNT ON
BEGIN
SELECT @guid =  '{' + CAST (NEWSEQUENTIALID() AS NVARCHAR(38)) + '}'
END
GO
The newsequentialid() built-in function can only be used in a DEFAULT expression for a column of type 'uniqueidentifier' in a CREATE TABLE or ALTER TABLE statement. It cannot be combined with other operators to form a complex scalar expression.
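As the error says, NEWSEQUENTIALID() is only legal as a bare column default on a uniqueidentifier column, which (as it turns out) is exactly the loophole the workaround below exploits. A minimal legal use, on a throwaway demo table:

CREATE TABLE dbo.SEQ_DEMO (
    ID UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(),
    Label NVARCHAR(50) NULL
);
GO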

So here's my workaround: put a trigger between the delta and the base table and bypass the ESRI SP that writes the random GUID. We can do this with an INSTEAD OF INSERT trigger and by changing the default on the Global ID column.
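One wrinkle before we start: the default constraint name you'll see below (DF__TEST__GlobalID__65FFA925) is system-generated and will be different on your table. A quick catalog lookup to find yours (a sketch):

SELECT dc.name AS constraint_name, dc.definition
FROM sys.default_constraints AS dc
INNER JOIN sys.columns AS c
    ON c.object_id = dc.parent_object_id
    AND c.column_id = dc.parent_column_id
WHERE dc.parent_object_id = OBJECT_ID('dbo.TEST')
    AND c.name = 'GlobalID';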

 

First:

 

ALTER TABLE [dbo].[TEST] DROP CONSTRAINT [DF__TEST__GlobalID__65FFA925]
GO
ALTER TABLE [dbo].[TEST] ADD  CONSTRAINT [DF__TEST__GlobalID__65FFA925]  DEFAULT (newsequentialid()) FOR [GlobalID]
GO
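This only works because the Global ID column is a true uniqueidentifier under the hood. If you want to confirm that on your own table before altering anything, a sketch using the system catalog:

SELECT c.name AS column_name, t.name AS data_type
FROM sys.columns AS c
INNER JOIN sys.types AS t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.TEST')
    AND c.name = 'GlobalID';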

 

Then:

 

CREATE TRIGGER [dbo].[TEST_GLOBAL_ID]
ON [dbo].[TEST]
INSTEAD OF INSERT NOT FOR REPLICATION
AS BEGIN
SET NOCOUNT ON;

/****** GlobalID is deliberately left out of the column list ******/
/****** so the NEWSEQUENTIALID() default fills it in ******/
INSERT [TEST] (
OBJECTID, SOMEFIELD, SHAPE, LON, LAT, QuadName, Watershed, County, State, PARKDISTRICT, ELEVATION, StreamName, RiverOrder
)
SELECT
OBJECTID, SOMEFIELD, SHAPE, LON, LAT, QuadName, Watershed, County, State, PARKDISTRICT, ELEVATION, StreamName, RiverOrder
FROM inserted
;
END
GO

Add a couple of new points and run:

 

SELECT 
[GlobalID]
FROM [FISH].[dbo].[TEST]

 

 

GlobalID
C10EB116-8B14-45AB-924D-0C734E5AB5B6
61AE23FA-F02D-45C1-991D-571B77592014
0695789D-35A7-4BE4-B5F6-5EAF68D2A50B
5A20B628-6048-4D48-8380-AC005A0E70EC
CF52E6DE-5F60-456E-9DEF-C006D9BBD348
58F80A07-F8A8-4D62-BBB3-D012EA781F0C
5E7B9C91-2891-E411-B57C-E41F134196DA
BE30E498-2891-E411-B57C-E41F134196DA
BF30E498-2891-E411-B57C-E41F134196DA
C030E498-2891-E411-B57C-E41F134196DA
C130E498-2891-E411-B57C-E41F134196DA
38C5A6F2-60FF-4C39-BF37-F7AFCBDFDE90

 

Note how my new points (rows 8 through 12 in the results, counting the GlobalID header as row 1) contain sequential IDs. I'm using an INSTEAD OF trigger rather than an AFTER trigger here because I want this to occur BEFORE the row hits the base table.

 

Unfortunately, this "alteration" is not very robust: it can't be used if your data is versioned without moving edits to base, or if it's not versioned at all (which is what you need when using Collector for ArcGIS). Also not discussed here is my very compelling reason for using the Global ID field in some of the non-GIS applications that use these data. Let's chalk this up to the "tinkering" category. ESRI, I'm sure, has a very good reason for implementing the Global ID default the way they do, and this post surely will foster some heated debate and comments.

 

This, combined with the AFTER INSERT/UPDATE trigger presented in Hey Neighbor? What's Your Value?, will result in a nice, neat feature class where almost all of the attributes are automatically populated and your Global IDs are ordered.

 

 

 

This is a personal blog and does not recommend, endorse, or support the methods described above. Alteration of data using SQL outside of the ESRI software stack, of course, is not supported and should not be applied to a production database without a thorough understanding and disaster recovery plan.

In Rasters, Pixels and Code Oh My! I extended an automation trigger to populate an attribute field based on its location within a raster pixel. But I still want more data automation. I want it all!


Not only do we want to know some administrative attributes of a new data point (what county and state is it in? That's a spatial intersect) and how high it is; in an era of not-so-precise data points, we also want to know what is nearest to this point.

 

In this particular application, a water quality monitoring dataset, I also want to know the name of the stream this point occurs on, as well as the stream order. I'm using NHD Stream Lines which have been run through Arc Hydro to add Strahler Stream Order as an attribute.

 

Using a Nearest Neighbor Query I can accomplish that, albeit with a little more complexity, because now I'm asking SQL to look for the "nearest" feature, not a simple intersect of two geography objects. The nearest-neighbor logic is in the two CROSS APPLY blocks at the end of the trigger below.

 

 

 

ALTER TRIGGER [dbo].[TEST_GEOGRAPHY]
ON [dbo].[TEST]
/****** fire on inserts and updates  ******/
/****** disable trigger when SQL replication or mirroring is enabled  ******/
AFTER INSERT, UPDATE NOT FOR REPLICATION
AS
BEGIN
  SET NOCOUNT ON;

  UPDATE p SET
/****** hypothetically we could enter the lat/lon as text and create a geography object ******/
    SHAPE = CASE WHEN i.SHAPE IS NOT NULL
      THEN p.SHAPE ELSE Geography::STPointFromText('POINT('
        + CAST(p.LON AS VARCHAR(20)) + ' '
        + CAST(p.LAT AS VARCHAR(20)) + ')', 4269) END,
/****** usual case: the point is created with ARC, so cast the LAT/LON as text ******/
/****** from the geography object ******/
    LON = CASE WHEN p.SHAPE IS NULL THEN p.LON ELSE p.SHAPE.Long END,
    LAT = CASE WHEN p.SHAPE IS NULL THEN p.LAT ELSE p.SHAPE.Lat END,
    QuadName = COALESCE(b.name, p.QuadName),
    Watershed = COALESCE(c.HUC_12_Name, p.Watershed),
    County = COALESCE(d.Name, p.County),
    State = COALESCE(e.Name, p.State),
    PARKDISTRICT = COALESCE(f.District, p.PARKDISTRICT),
    ELEVATION = (SELECT pdata.getValueByLoc(1, p.SHAPE.Long, p.SHAPE.Lat) FROM [dbo].[DEM10MP]),
    StreamName = COALESCE(g.GNIS_Name, p.StreamName),
    RiverOrder = COALESCE(h.RiverOrder, p.RiverOrder)
  FROM TEST AS p
/****** allow update of lat/lon on update ******/
  INNER JOIN inserted AS i
    ON i.globalid = p.globalid
  LEFT OUTER JOIN USGS_24K_TOPOMAP_BOUNDARIES AS b
    ON b.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN WATERSHEDS AS c
    ON c.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN GRSM_COUNTIES AS d
    ON d.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN GRSM_States AS e
    ON e.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN GRSM_SUB_RANGER_DISTRICTS AS f
    ON f.Shape.STIntersects(i.Shape) = 1
/****** Nearest Neighbor ******/
  CROSS APPLY (SELECT TOP 1 GNIS_Name, shape
    FROM dbo.NHDFLOWLINE
/****** force spatial index hint ******/
    WITH (INDEX([S208_idx]))
    WHERE NHDFLOWLINE.Shape.STDistance(i.Shape) IS NOT NULL
    ORDER BY NHDFLOWLINE.Shape.STDistance(i.Shape) ASC) AS g
  CROSS APPLY (SELECT TOP 1 RiverOrder, shape
    FROM dbo.NHDFLOWLINE
/****** force spatial index hint ******/
    WITH (INDEX([S208_idx]))
    WHERE NHDFLOWLINE.Shape.STDistance(i.Shape) IS NOT NULL
    ORDER BY NHDFLOWLINE.Shape.STDistance(i.Shape) ASC) AS h
  ;
END
GO

 

Note the use of an index hint here. In SQL Server 2008, the spatial index is not reliably chosen by the optimizer for this kind of query and, in fact, the query doesn't perform without a lot of tweaking. Here we're telling it to find the name of the nearest stream and its order, but to use the spatial index on the NHD Flowline feature class to optimize the search. Without the index hint this query takes forever!
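The same pattern works as an ad-hoc query outside the trigger. Here's a sketch that finds the nearest flowline to a test point (the table and index names are the ones used in this post; the coordinates are made up):

DECLARE @pt GEOGRAPHY = Geography::STPointFromText('POINT(-83.5 35.6)', 4269);

SELECT TOP 1
    f.GNIS_Name,
    f.Shape.STDistance(@pt) AS distance_meters
FROM dbo.NHDFLOWLINE AS f WITH (INDEX([S208_idx]))
WHERE f.Shape.STDistance(@pt) IS NOT NULL
ORDER BY f.Shape.STDistance(@pt) ASC;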

 

If the stream name is null (as many are), the CROSS APPLY and ORDER BY stop the search at the nearest stream, rather than continuing on to the nearest named stream.
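If you'd rather skip the unnamed reaches entirely and attribute the nearest named stream, one option (a sketch; note this changes the behavior described above) is to filter the nulls out inside the CROSS APPLY:

CROSS APPLY (SELECT TOP 1 GNIS_Name, shape
    FROM dbo.NHDFLOWLINE
    WITH (INDEX([S208_idx]))
    WHERE NHDFLOWLINE.Shape.STDistance(i.Shape) IS NOT NULL
/****** only consider streams that actually have a name ******/
    AND NHDFLOWLINE.GNIS_Name IS NOT NULL
    ORDER BY NHDFLOWLINE.Shape.STDistance(i.Shape) ASC) AS g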

 

I've found this works best when I enforce some data entry rules: add a new point only using snapping (to the NHD layer in the map interface). Otherwise there is the danger that someone could erroneously add a new point that falls "somewhat" between two streams, and the wrong name and order could get attributed to the point.

 

In this particular application, we are using it in conjunction with the USGS HEM tools, which handle the snapping, assignment, and population of reachcode on the point data, so we're not too worried about the wrong-point-in-the-wrong-place scenario.

 

This is a personal blog and does not recommend, endorse, or support the methods described above. Alteration of data using SQL outside of the ESRI software stack, of course, is not supported and should not be applied to a production database without a thorough understanding and disaster recovery plan.

In Repetitive Attribute Values Automated with SQL  we added some automation to repetitive data entry using feature classes that are using the Microsoft Geography storage type.

 

But what about other repetitive data points? Back to our "Data Standard": much of the data we collect also needs to be attributed with discrete or continuous values obtained from a raster, such as elevation. Sure, I could bang out some Python or C# code to handle this, but I skipped that class...

 

This assumes that you have a raster Digital Elevation Model in SDE that has FIRST been configured using ST_RASTER (ArcGIS Help 10.1) and then copied over into an ST_PIXEL table with SRID 4269.

 

In this example, I have a 10-m resolution DEM named DEM10MP. In my naming convention, the "P" indicates that the table is of ST_PIXEL type.
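Before wiring it into a trigger, you can sanity-check the raster query on its own. A sketch using the same getValueByLoc call that appears in the trigger below (band 1, then longitude and latitude; the coordinates here are hypothetical):

SELECT pdata.getValueByLoc(1, -83.5, 35.6) AS elevation
FROM [dbo].[DEM10MP];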

 

Extending the trigger presented in Repetitive Attribute Values Automated with SQL, I simply add a Get Value By Location query using the LAT/LON values from the inserted or updated Geography column. The entire trigger looks like this; the ELEVATION assignment at the end of the SET list is the elevation query.

 

ALTER TRIGGER [dbo].[TEST_GEOGRAPHY]
ON [dbo].[TEST]
/****** fire on inserts and updates  ******/
/****** disable trigger when SQL replication or mirroring is enabled  ******/
AFTER INSERT, UPDATE NOT FOR REPLICATION
AS
BEGIN
  SET NOCOUNT ON;

  UPDATE p SET
/****** hypothetically we could enter the lat/lon as text and create a geography object ******/
    SHAPE = CASE WHEN i.SHAPE IS NOT NULL
      THEN p.SHAPE ELSE Geography::STPointFromText('POINT('
        + CAST(p.LON AS VARCHAR(20)) + ' '
        + CAST(p.LAT AS VARCHAR(20)) + ')', 4269) END,
/****** usual case: the point is created with ARC, so cast the LAT/LON as text ******/
/****** from the geography object ******/
    LON = CASE WHEN p.SHAPE IS NULL THEN p.LON ELSE p.SHAPE.Long END,
    LAT = CASE WHEN p.SHAPE IS NULL THEN p.LAT ELSE p.SHAPE.Lat END,
    QuadName = COALESCE(b.name, p.QuadName),
    Watershed = COALESCE(c.HUC_12_Name, p.Watershed),
    County = COALESCE(d.Name, p.County),
    State = COALESCE(e.Name, p.State),
    PARKDISTRICT = COALESCE(f.District, p.PARKDISTRICT),
/****** Get Value By Location: band 1 of the DEM at this point's lon/lat ******/
    ELEVATION = (SELECT pdata.getValueByLoc(1, p.SHAPE.Long, p.SHAPE.Lat) FROM [dbo].[DEM10MP])
  FROM TEST AS p
/****** allow update of lat/lon on update ******/
  INNER JOIN inserted AS i
    ON i.globalid = p.globalid
  LEFT OUTER JOIN USGS_24K_TOPOMAP_BOUNDARIES AS b
    ON b.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN WATERSHEDS AS c
    ON c.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN GRSM_COUNTIES AS d
    ON d.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN GRSM_States AS e
    ON e.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN GRSM_SUB_RANGER_DISTRICTS AS f
    ON f.Shape.STIntersects(i.Shape) = 1
  ;
END
GO

 

Now, for every point I add to this feature class or edit, all but one field is automatically populated or updated.

 

Every time.

 

No user input.

 


 

This is a personal blog and does not recommend, endorse, or support the methods described above. Alteration of data using SQL outside of the ESRI software stack, of course, is not supported and should not be applied to a production database without a thorough understanding and disaster recovery plan.

Have you ever been exceedingly annoyed by attribute domain values that aren't alphabetized or in some specific order? One could argue that it's your fault for not creating the domain values in the right order in the first place, but...

 

There IS a way you can update the domain values to render to end users in any way you want!

 

Let's say you have a domain titled "District". To see what it looks like in XML:

 

SELECT DEFINITION FROM [dbo].[GDB_ITEMS]
WHERE NAME = 'District'

 

And click on the value in the definition field in the results.

 

 

<GPCodedValueDomain2 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:typens="http://www.esri.com/schemas/ArcGIS/10.0" xsi:type="typens:GPCodedValueDomain2">
  <DomainName>District</DomainName>
  <FieldType>esriFieldTypeString</FieldType>
  <MergePolicy>esriMPTDefaultValue</MergePolicy>
  <SplitPolicy>esriSPTDefaultValue</SplitPolicy>
  <Description>Desc</Description>
  <Owner>DBO</Owner>
  <CodedValues xsi:type="typens:ArrayOfCodedValue">
    <CodedValue xsi:type="typens:CodedValue">
      <Name>North District</Name>
      <Code xsi:type="xs:string">ND</Code>
    </CodedValue>
    <CodedValue xsi:type="typens:CodedValue">
      <Name>South District</Name>
      <Code xsi:type="xs:string">SD</Code>
    </CodedValue>
    <CodedValue xsi:type="typens:CodedValue">
      <Name>Cades Cove District</Name>
      <Code xsi:type="xs:string">CC</Code>
    </CodedValue>
  </CodedValues>
</GPCodedValueDomain2>
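If the value in your results grid doesn't render as clickable XML (some geodatabase versions store DEFINITION as plain text), casting it does the trick:

SELECT CAST(DEFINITION AS XML) AS DEFINITION
FROM [dbo].[GDB_ITEMS]
WHERE NAME = 'District'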

What I want to see is the values and codes sorted alphabetically, which they are not. First, a little Excel magic: using the Domain to Table tool, I've exported the domain out to an Excel spreadsheet, then performed my sort.

 


 

Then some field calculations. Here I'm using plain text for the XML element and cell references for the CODE and VALUE.

 

="<CodedValue xsi:type=""typens:CodedValue""><Name>"&B2&"</Name><Code xsi:type=""xs:string"">"&A2&"</Code></CodedValue>"

 

Note how the formula is all one line with no carriage returns. Otherwise when you paste the result, quotes will be added where you don't want them!

 


 

Now update the XML definition by pasting the Excel formula results to replace the XML elements (ONLY!):

 

UPDATE [dbo].[GDB_ITEMS]
SET DEFINITION = 
'
<GPCodedValueDomain2 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:typens="http://www.esri.com/schemas/ArcGIS/10.0" xsi:type="typens:GPCodedValueDomain2">
  <DomainName>District</DomainName>
  <FieldType>esriFieldTypeString</FieldType>
  <MergePolicy>esriMPTDefaultValue</MergePolicy>
  <SplitPolicy>esriSPTDefaultValue</SplitPolicy>
  <Description>Desc</Description>
  <Owner>DBO</Owner>
  <CodedValues xsi:type="typens:ArrayOfCodedValue">
<CodedValue xsi:type="typens:CodedValue"><Name>Cades Cove District</Name><Code xsi:type="xs:string">CC</Code></CodedValue>
<CodedValue xsi:type="typens:CodedValue"><Name>North District</Name><Code xsi:type="xs:string">ND</Code></CodedValue>
<CodedValue xsi:type="typens:CodedValue"><Name>South District</Name><Code xsi:type="xs:string">SD</Code></CodedValue>
  </CodedValues>
</GPCodedValueDomain2>
'
WHERE NAME = 'District'
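To verify the edit took (and as a handy way to eyeball large domains without clicking through XML), you can shred the coded values back out with XQuery. A sketch, assuming the XML structure shown above:

SELECT cv.value('(Name/text())[1]', 'NVARCHAR(255)') AS Name,
       cv.value('(Code/text())[1]', 'NVARCHAR(255)') AS Code
FROM [dbo].[GDB_ITEMS]
CROSS APPLY (SELECT CAST(DEFINITION AS XML) AS d) AS x
CROSS APPLY x.d.nodes('/GPCodedValueDomain2/CodedValues/CodedValue') AS n(cv)
WHERE NAME = 'District'
ORDER BY Name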

I have found this very handy for large domains that have dozens or hundreds of domain values that I either wanted sorted alphabetically or where I needed to make some sort of global update, such as converting the text in "Description" to camel case.

 

Combining Excel with an advanced text editor such as Notepad++ with the TextFX tools plugin can result in some very efficient management of attribute domains!

 

This is a personal blog and does not recommend, endorse, or support the methods described above. Alteration of data using SQL outside of the ESRI software stack, of course, is not supported and should not be applied to a production database without a thorough understanding and disaster recovery plan.

Most (all, actually) of my GIS data has what I call the "Mickey Mouse" attribute value requirement: each feature class has a couple of attribute fields that contain boring information such as administrative boundaries (what state and county this point occurs in). In the past I've been very proud of the elaborate attribute domains I've created so data editors can choose which county, state, or USGS topo quad their data point occurs in.

 

No one uses them.

 

Data entry has a time and a labor cost, and like most of us, we need to focus on what's important (what happened at this point) and don't have a lot of time for administrative data entry.

 

Until a report is due that requires partitioning the data by state, county, or watershed.

 

In Using SQL to Auto-Populate XY Values I described using a trigger to automate the population of GPS coordinates using the well-known text pulled from the geography column. There's no reason why we can't extend that to handle a few more data maintenance tasks.

 

In this example, I have the following administrative boundary polygon feature classes, all configured with the "Geography" storage type and SRID 4269, in the same SDE database.

  1. GRSM_COUNTIES: County name is populated in the County field;
  2. GRSM_States: State name is populated in the State field;
  3. GRSM_Sub_Ranger_Districts: Ranger District name is populated in the PARKDISTRICT field;
  4. USGS_24K_TOPOMAP_BOUNDARIES: USGS Quad Name is populated in the QuadName field;
  5. WATERSHEDS: HUC 12 Watershed Name is populated in the Watershed field.

Our data policy is that each new data point will be attributed with the "Name" value of each of the 5 polygons it occurs in. Imagine the enthusiasm of data editors when told they have to choose one of many values from a pick list, 5 times, for each new point they add or edit!

 

Extending Using SQL to Auto-Populate XY Values, we can spatially "join" the new or updated point with each of the 5 polygon feature classes it occurs in, grab the required value from each polygon, and write it to the correct attribute field.

 

ALTER TRIGGER [dbo].[TEST_GEOGRAPHY]
ON [dbo].[TEST]
/****** fire on inserts and updates  ******/
/****** disable trigger when SQL replication or mirroring is enabled  ******/
AFTER INSERT, UPDATE NOT FOR REPLICATION
AS
BEGIN
  SET NOCOUNT ON;

  UPDATE p SET
/****** hypothetically we could enter the lat/lon as text and create a geography object ******/
    SHAPE = CASE WHEN i.SHAPE IS NOT NULL
      THEN p.SHAPE ELSE Geography::STPointFromText('POINT('
        + CAST(p.LON AS VARCHAR(20)) + ' '
        + CAST(p.LAT AS VARCHAR(20)) + ')', 4269) END,
/****** usual case: the point is created with ARC, so cast the LAT/LON as text ******/
/****** from the geography object ******/
    LON = CASE WHEN p.SHAPE IS NULL THEN p.LON ELSE p.SHAPE.Long END,
    LAT = CASE WHEN p.SHAPE IS NULL THEN p.LAT ELSE p.SHAPE.Lat END,
    QuadName = COALESCE(b.name, p.QuadName),
    Watershed = COALESCE(c.HUC_12_Name, p.Watershed),
    County = COALESCE(d.Name, p.County),
    State = COALESCE(e.Name, p.State),
    PARKDISTRICT = COALESCE(f.District, p.PARKDISTRICT)
  FROM TEST AS p
/****** allow update of lat/lon on update ******/
  INNER JOIN inserted AS i
    ON i.globalid = p.globalid
  LEFT OUTER JOIN USGS_24K_TOPOMAP_BOUNDARIES AS b
    ON b.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN WATERSHEDS AS c
    ON c.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN GRSM_COUNTIES AS d
    ON d.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN GRSM_States AS e
    ON e.Shape.STIntersects(i.Shape) = 1
  LEFT OUTER JOIN GRSM_SUB_RANGER_DISTRICTS AS f
    ON f.Shape.STIntersects(i.Shape) = 1
  ;
END
GO

 

There is no limit to the number of joins you can use to automate spatial queries that update attribute values. Just remember that all feature classes participating in the joins need to use the same storage type.
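A quick way to check that every feature class in the join uses the same storage type (CLR types like geography and geometry report their type name in the system catalog):

SELECT OBJECT_NAME(c.object_id) AS table_name, t.name AS storage_type
FROM sys.columns AS c
INNER JOIN sys.types AS t ON t.user_type_id = c.user_type_id
WHERE c.name = 'Shape'
    AND OBJECT_NAME(c.object_id) IN ('TEST', 'USGS_24K_TOPOMAP_BOUNDARIES', 'WATERSHEDS',
        'GRSM_COUNTIES', 'GRSM_States', 'GRSM_SUB_RANGER_DISTRICTS');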

 

But what if you want to apply this to existing data? After creating the trigger, you could apply a one-time query to update existing values. This example uses the Geometry storage type:

 

 

USE GRSM
UPDATE GRSM_TRAILS
SET CONG_DIST = CONGRESSIONALDISTRICTS.NAMELSAD
FROM GRSM_TRAILS
INNER JOIN CONGRESSIONALDISTRICTS
/****** a trail belongs to the district it shares a line segment with ******/
ON CONGRESSIONALDISTRICTS.Shape.STIntersection(GRSM_TRAILS.Shape).STDimension() = 1;
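For the geography-based TEST table used in the trigger above, the equivalent one-time backfill might look like this (a sketch; it only touches rows where County hasn't been populated yet):

UPDATE p
SET County = d.Name
FROM dbo.TEST AS p
INNER JOIN GRSM_COUNTIES AS d
    ON d.Shape.STIntersects(p.Shape) = 1
WHERE p.County IS NULL;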

 

 

This is a personal blog and does not recommend, endorse, or support the methods described above. Alteration of data using SQL outside of the ESRI software stack, of course, is not supported and should not be applied to a production database without a thorough understanding and disaster recovery plan.