
Best practices with using domains?

02-08-2024 05:20 AM
MDB_GIS
Frequent Contributor

Is there any reason not to use the fully spelled-out attribute value in the 'Code' column as well as the 'Description' column? I have been using numeric codes for a while now, but they are sometimes a bit annoying to deal with, specifically whenever I export the data. I'm contemplating converting the field to a text field and using the fully spelled-out value for both columns. Are there any real reasons not to do this?

Accepted Solutions
MErikReedAugusta
MVP Regular Contributor

We have sort of a hybrid between the two approaches.

I abhor the numeric-code-with-text-alias approach to Domains.  I get why it was used in the past, but as others have said, data storage and throughput capacities are at a point these days where that approach has less and less utility, unless you're working with absurdly gargantuan datasets and long, complex calculations.

I interface with non-GIS people all the time, which often means spreadsheets are also going to get involved, and there's too much chance for error with the number/text combo approach.
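
To illustrate the spreadsheet concern, here is a minimal sketch of what an export step has to do when the stored codes are numeric: every hand-off needs a lookup to swap codes for descriptions, and any value missing from the lookup slips through as a bare number. The domain contents, field names, and the `export_with_descriptions` helper are all hypothetical, and plain Python stands in for whatever export tool is actually in use.

```python
import csv
import io

# Hypothetical numeric-coded domain: stored code -> description.
CONDITION_DOMAIN = {1: "Good", 2: "Fair", 3: "Clogged > 20%"}

# Hypothetical attribute rows as they sit in the table (codes, not descriptions).
rows = [
    {"asset_id": "IN-001", "condition": 3},
    {"asset_id": "IN-002", "condition": 1},
]

def export_with_descriptions(rows, field, domain):
    """Return CSV text with the coded field replaced by its description."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Codes that are missing from the domain are written unchanged.
        writer.writerow({**row, field: domain.get(row[field], row[field])})
    return buffer.getvalue()

csv_text = export_with_descriptions(rows, "condition", CONDITION_DOMAIN)
print(csv_text)
```

With text codes that are already readable, this translation step (and the chance of forgetting it) goes away.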

 

On the flip side, though, there are plenty of values that are poor practice to allow into the raw data of your attribute table.  For example, in our stormwater data, we had a condition domain that had as an option "Clogged > 20%".

That greater-than sign breaks all kinds of operations and generally just makes life needlessly difficult.  So now that greater-than sign exists only in the Description/Alias.  The raw code beneath just says "Clogged".  (It was the only clogged value we had, otherwise it would've been "Clogged_20")

 

Other characters I'd recommend avoiding in your raw code value, if you can afford to do so:

  • Ampersand (&)
  • Percent sign (%)
  • Less than & Greater than (<), (>)
  • Parentheses (()), Brackets ([]), or Curly Braces ({})
  • Double quotes (") and the angled/curly variants thereof
  • Single quotes (') and the angled/curly variants thereof
  • Spaces (these actually aren't that much of an issue, but some operations might be ever-so-slightly easier without them.  The list of affected operations is pretty small, so take this one with a grain of salt.  We do typically replace our spaces with underscores for consistency & clarity, though.)
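
One way to apply the list above is to derive the raw code from the description automatically. The following is only a sketch under assumptions: `make_domain_code` is a hypothetical helper, not part of any Esri tooling, and it goes further than the list by keeping only ASCII letters and digits and joining them with underscores.

```python
import re
import unicodedata

def make_domain_code(description: str) -> str:
    """Derive a plain-ASCII code from a human-readable description (hypothetical helper)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", description)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of characters other than letters and digits with one underscore.
    code = re.sub(r"[^A-Za-z0-9]+", "_", ascii_text)
    # Trim underscores left over at either end.
    return code.strip("_")

print(make_domain_code("Clogged > 20%"))                    # Clogged_20
print(make_domain_code("No action required at this time"))  # No_action_required_at_this_time
```

The description keeps the original wording, so nothing is lost for people reading the attribute table.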

I also tend to avoid special characters that aren't on a standard keyboard, like the degree sign (°).  They're not likely to cause problems with most operations, but in the rare case that someone's running a field calculation or something similar where they need the raw value, it just makes it a little bit easier for them to write & run that code.  I also used to do this for Unicode/plain text reasons in Python, but that's not really an issue anymore, since Python 3 strings are Unicode by default.

It also helps avoid confusion in cases where there are multiple similar characters that could be easily confused, like the degree sign (°) compared to the masculine ordinal (º).  They're different characters with different purposes and code points, but to the average end user, they look identical in most fonts.

That's a situation ripe for errors.
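
A short sketch of why the look-alike problem bites: the two characters compare as unequal, so a query or script written with one silently misses values stored with the other. This uses only Python's standard `unicodedata` module; the `describe` helper is just for illustration.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

degree = "\u00b0"   # the degree sign
ordinal = "\u00ba"  # the masculine ordinal indicator

print(describe(degree))   # U+00B0 DEGREE SIGN
print(describe(ordinal))  # U+00BA MASCULINE ORDINAL INDICATOR
print(degree == ordinal)  # False: equality checks and lookups treat them as different
```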

------------------------------
M Reed
"The pessimist may be right oftener than the optimist, but the optimist has more fun, and neither can stop the march of events anyhow." — Robert A. Heinlein, in Time Enough for Love


14 Replies
ZachBodenner
MVP Regular Contributor

I honestly get very turned around when the domain code is different from what you see in the attribute cell. I can imagine a case where you have qualitative data that's ordered in some way (Likert-scale style) where one value actually does fit in as "1" and so on down the list, but I nearly always have the code and description match.

Happy mapping,
- Zach
clt_cabq
Frequent Contributor

Once upon a time, storage and data processing limitations meant that codes, either numeric or shortened/abbreviated values, were needed, but these days there is almost no reason to implement this kind of approach. I do start to get anxious about long, cumbersome descriptive terms, and sometimes processing a short, uninterrupted string is easier than working with text that includes spaces, hyphens, and similar characters. You still need to consider how you process and display the data: for instance, if there is some sort of ordering inherent in your data, you may want codes that reflect it so that you can sort or rank the data appropriately. By and large, though, you should make your codes human readable.

ZachBodenner
MVP Regular Contributor

I'll add another thought regarding anxiousness about long descriptions and special characters: I recently had a natural resources professional ask for a new domain for tree species names that included the botanical name as typically noted, so, for example, the full Description of the domain would be "Red Oak (Quercus rubra)". My first thought was to have the Code of the domain be just the common tree name ("Red Oak" in that case) and then have the Description be the full botanical name. Ultimately I decided against it because, even though botanical names can get cumbersome, especially with spelling, I foresaw the NR staff entering tree species names via the field calculator, and if they didn't use the options within that tool to ensure domain usage, there could easily be mis-entered data. So, I went against my personal better judgement and actually used the whole botanical name for both code and description (plus, sometimes trees of different species share the same common name, but that's a whole other issue!).
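
The mis-entry risk described here can be checked after the fact by comparing stored values against the domain's codes. This is a minimal sketch in plain Python: the domain contents, the sample values, and the `find_invalid` helper are illustrative assumptions, and in practice the values would be read from the feature class rather than a hard-coded list.

```python
# Hypothetical coded value domain where code and description are identical.
TREE_DOMAIN = {
    "Red Oak (Quercus rubra)": "Red Oak (Quercus rubra)",
    "Sugar Maple (Acer saccharum)": "Sugar Maple (Acer saccharum)",
}

def find_invalid(values, domain):
    """Return the distinct values that are not valid codes in the domain."""
    return sorted({value for value in values if value not in domain})

# Values as they might look after a free-typed field calculation.
entered = [
    "Red Oak (Quercus rubra)",
    "Red Oak (Quercus ruba)",  # misspelled botanical name
    "Sugar Maple (Acer saccharum)",
]

print(find_invalid(entered, TREE_DOMAIN))  # ['Red Oak (Quercus ruba)']
```

The longer the code, the more ways there are to mistype it, which is the trade-off being weighed in this post.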

MDB_GIS
Frequent Contributor

Very thorough and helpful answer. Thank you!

fklotz
Frequent Contributor

@MErikReedAugusta Thanks for this thoughtful answer. I may have just been lucky so far, or I haven't run the kinds of processes where special characters in domain codes matter, but I don't recall running into any issues using special characters in Esri coded value domains in file geodatabases, SQL Server based enterprise geodatabases, or ArcGIS Online. We don't often use special characters, but we have not always removed them, and I may not be doing the things that caused issues for you. I'm working for a municipality that has a standard in their enterprise GIS databases to have the code == value in their coded value domains, so I'm very curious whether you could provide any specifics on this statement you made: "That greater-than sign breaks all kinds of operations and generally just makes life needlessly difficult." I'd like to use that information to limit any future issues we may have and maybe inform the municipality so the standard can be modified.

Farley
MErikReedAugusta
MVP Regular Contributor

I haven't encountered too many issues in SQL that I can remember, but they do occur with ampersand and less-than / left-angle-bracket in places like labels & text, for one.  See my recent ideas post for more on those.

The single quote and percent sign both have special meaning in SQL (the single quote delimits string literals, and the percent sign is a wildcard in LIKE patterns), so they could potentially cause you unnecessary headaches in your queries.
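
As a rough demonstration of those two characters, the sketch below uses Python's built-in `sqlite3` module as a stand-in database. The `inlets` table and its values are made up, and a geodatabase's SQL dialect may differ in details, so treat this as an illustration of the general behaviour rather than of any Esri product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inlets (id INTEGER, condition TEXT)")
conn.executemany(
    "INSERT INTO inlets VALUES (?, ?)",
    [(1, "Clogged > 20%"), (2, "Clogged > 200 mm"), (3, "Owner's report")],
)

# Single quote: binding the value as a parameter avoids hand-escaping it
# inside a quoted SQL string literal.
exact = conn.execute(
    "SELECT id FROM inlets WHERE condition = ?", ("Owner's report",)
).fetchall()

# Percent sign: inside LIKE it is a wildcard, so this also matches row 2.
loose = conn.execute(
    "SELECT id FROM inlets WHERE condition LIKE ? ORDER BY id", ("Clogged > 20%",)
).fetchall()

# Declaring an escape character makes the percent sign literal again.
strict = conn.execute(
    "SELECT id FROM inlets WHERE condition LIKE ? ESCAPE '\\' ORDER BY id",
    ("Clogged > 20\\%",),
).fetchall()

print(exact)   # [(3,)]
print(loose)   # [(1,), (2,)]
print(strict)  # [(1,)]
```

None of this is unworkable, but each case is one more thing to remember when writing queries against the raw values.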

Other places you could theoretically encounter issues are automations & integrations outside of the Arc walled garden, because many of those characters are traditionally reserved characters in various coding languages.  The quotes will be especially problematic, since any value that contains these characters can be presumed to be a text value, and nearly all coding languages use single and/or double quotes to indicate a text string.

I don't remember the specific problem we were having with "Clogged > 20%", but it could theoretically have been the greater-than or the percent sign that triggered it.  I do remember having to always account for it when I scripted things, for some reason, though.  It got tedious, and gained us nothing all that valuable, so we changed it.

jcarlson
MVP Esteemed Contributor

Agreed, codes sound nice at the outset but become a standard GIS joke. "What is a '13' in this layer, again?"

Like @ZachBodenner 's hypothetical, we do have some surveys where the question responses are things like "strongly disagree" and that sort of thing, but which we want coded to a numeric value for scoring purposes. But otherwise, I like to have the codes more descriptive if I can.
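
The survey case is one where numeric codes earn their keep, because the stored value can be used directly in arithmetic. A minimal sketch, assuming a hypothetical five-point domain and a handful of made-up responses:

```python
from statistics import mean

# Hypothetical Likert-style domain: numeric code -> displayed description.
LIKERT_DOMAIN = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly agree",
}

# Stored codes for one survey question (made-up responses).
responses = [5, 4, 2, 4]

score = mean(responses)                             # codes can be averaged directly
labels = [LIKERT_DOMAIN[code] for code in responses]  # descriptions for display

print(score)   # 3.75
print(labels)  # ['Strongly agree', 'Agree', 'Disagree', 'Agree']
```

With text codes, the same scoring would need a reverse lookup from text to number before any calculation.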

I will say, though, that we do not use the same exact value on both sides of the domain. Sometimes I have a displayed value like "no action required at this time". When I'm working with that same layer elsewhere, particularly when I'm using some expression or coding language like Arcade or Python, long values with spaces in them get very tedious and sometimes problematic to work with.

I try my best to keep the codes concise, but obvious in their meaning, so that they can be used as-is if needed. This will often be a common abbreviation, or a short version of the label. "not_needed", "processed", things like that.

- Josh Carlson
Kendall County GIS
ZachBodenner
MVP Regular Contributor

That's a good point @jcarlson . I think the upshot is that, as with all data construction things, the end-user needs to be considered first and foremost. Are you the only one who manipulates this data? If so, what works best in your mind? Will you get turned around by codes like the ones described in jcarlson's example, or do they make sense to you? Will anyone else be doing edits, especially in batch? How important is very consistent data entry to you? There's a lot to consider.
