Is there any reason not to use the fully qualified attribute in the 'Code' column as well as the 'Description' column? I have been using numeric codes for a while now but they are sometimes a bit annoying to deal with. Specifically if I ever export the data. I'm contemplating just converting the field to a text field and using the fully spelled out attribute for both columns. Are there any real reasons not to do this?
We have sort of a hybrid between the two approaches.
I abhor the numeric-code-with-text-alias approach to Domains. I get why it was used in the past, but as others have said: Data storage and throughput capacities are at a point these days where that has increasingly weakened utility, unless you're working with absurdly gargantuan datasets and long, complex calculations.
I interface with non-GIS people all the time, which often means spreadsheets are also going to get involved, and there's too much chance for error with the number/text combo approach.
On the flip side, though, there are plenty of values that are poor practice to allow into the raw data of your attribute table. For example, in our stormwater data, we had a condition domain that had as an option "Clogged > 20%".
That greater-than sign breaks all kinds of operations and generally just makes life needlessly difficult. So now that greater-than sign exists only in the Description/Alias. The raw code beneath just says "Clogged". (It was the only clogged value we had, otherwise it would've been "Clogged_20")
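For anyone who wants to derive a safe raw code from an existing description, here is a minimal Python sketch. The function name `make_code` and the rule it applies (collapse anything that isn't a letter or digit into a single underscore) are my own assumptions, not anything built into ArcGIS, so adjust the pattern to whatever your tools tolerate.

```python
import re


def make_code(description: str) -> str:
    """Turn a domain description into a code with only letters, digits, and underscores."""
    # Replace each run of non-alphanumeric characters (spaces, >, %, etc.) with one underscore.
    code = re.sub(r"[^A-Za-z0-9]+", "_", description)
    # Drop any underscore left dangling at either end.
    return code.strip("_")


print(make_code("Clogged > 20%"))  # Clogged_20
```

The description keeps the greater-than sign for display; only the stored code is cleaned up.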
Other characters recommended to avoid in your raw code value, if you can afford to do so:
I also tend to avoid special characters that aren't on a standard keyboard, like the degree sign (°). They're not likely to cause problems with most operations, but in the rare case that someone's running a field calculation or something similar where they need the raw value, it just makes it a little bit easier for them to write & run that code. I also used to do this for Unicode/plain text reasons in Python, but that's not really an issue anymore, since Python 3 switched to a baseline assumption of Unicode.
It also helps avoid confusion in cases where there are multiple similar characters that could be easily confused, like the degree sign (°) compared to the masculine ordinal (º). They're different characters with different purposes and code points, but to the average end user, they look identical in most fonts.
That's a situation ripe for errors.
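If you suspect look-alike characters have crept into your codes, Python's standard `unicodedata` module can tell them apart. This is just a sketch; the helper name `describe_non_ascii` is hypothetical, and the characters are written as escapes so nothing depends on how your editor renders them.

```python
import unicodedata


def describe_non_ascii(value: str) -> list:
    """Return (character, code point, Unicode name) for every non-ASCII character in value."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in value
        if ord(ch) > 127
    ]


degree = "\u00b0"   # degree sign
ordinal = "\u00ba"  # masculine ordinal indicator

# The two look alike on screen but are not equal as strings.
print(degree == ordinal)
print(describe_non_ascii("45" + degree))
print(describe_non_ascii("45" + ordinal))
```

Running something like this over every code in a domain is a quick way to flag values that need a second look.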
I honestly get very turned around when the domain code is different from what you see in the attribute cell. I can imagine a case where you have qualitative data that's ordered in some way (Likert-scale-style) where one value actually does fit in as "1" and on down the list, but I nearly always have the code and description match.
Once upon a time, storage and data processing limitations meant that codes, either numeric or shortened/abbreviated values, were needed, but these days there is almost no reason to implement this kind of approach. That said, I start to get anxious about long, cumbersome descriptive terms, and sometimes processing a short, uninterrupted string of values is easier than working with text that includes spaces and characters like hyphens. Consider how you process and display the data: for instance, if there is some sort of ordering inherent in your data, you may want codes that reflect this so that you can sort or rank the data appropriately. By and large, though, you should make your codes mostly human readable.
I'll add on another thought in regards to anxiousness about long descriptions and special characters: I recently had a natural resources professional ask for a new domain for tree species names that included the botanical name as typically noted, so for example, the full Description of the domain would be "Red Oak (Quercus rubra)". My first thought was to have the Code of the domain be just the common tree name (Red Oak in that case) and then have the Description be the full botanical name. Ultimately I decided against it because even though botanical names can get cumbersome, especially with spelling, I foresaw the NR staff entering tree species names via the field calculator, and if they didn't use the options within that tool to ensure domain usage, there could easily be mis-entered data. So, I went against my personal better judgement and actually used the whole botanical name for both code and description (plus, sometimes a tree's common name can be the same even if they're different species, but that's a whole other issue!).
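One way to catch that kind of mis-entered data after the fact is to compare the values in the field against the domain's codes. Here's a rough sketch in plain Python; the function name, the sample values, and the misspellings are all made up for illustration, and in practice you'd pull the values and the domain from your own geodatabase.

```python
def find_invalid_values(values, domain_codes):
    """Return (row index, value) for each value that is not one of the domain's codes."""
    allowed = set(domain_codes)
    return [(i, v) for i, v in enumerate(values) if v not in allowed]


# Hypothetical domain where code and description are identical.
species_domain = {"Red Oak (Quercus rubra)": "Red Oak (Quercus rubra)"}

# Hypothetical values typed in through a field calculation.
entered = [
    "Red Oak (Quercus rubra)",  # matches the domain
    "Red Oak (Quercus ruba)",   # misspelled botanical name
    "Red Oak",                  # common name only
]

for index, value in find_invalid_values(entered, species_domain):
    print(f"row {index}: {value!r} is not in the domain")
```

The longer the code, the more this sort of check earns its keep, since there are more ways to mistype it.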
Very thorough and helpful answer. Thank you!
Agreed, codes sound nice at the outset but become a standard GIS joke. "What is a '13' in this layer, again?"
Like @ZachBodenner's hypothetical, we do have some surveys where the question responses are things like "strongly disagree" and that sort of thing, but which we want coded to a numeric value for scoring purposes. But otherwise, I like to have the codes more descriptive if I can.
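To show why numeric codes still make sense for that survey case, here's a small sketch. Only "strongly disagree" comes from our actual surveys as described above; the other labels, the 1 to 5 scale, and the sample responses are assumptions for the example.

```python
# Hypothetical Likert domain: numeric code -> displayed description.
likert_domain = {
    1: "strongly disagree",
    2: "disagree",
    3: "neutral",
    4: "agree",
    5: "strongly agree",
}

# Hypothetical stored codes for one survey question.
responses = [1, 4, 5, 2]

# Because the codes are numbers, scoring needs no translation step.
mean_score = sum(responses) / len(responses)

# The descriptions are still available for display.
labels = [likert_domain[code] for code in responses]

print(mean_score)
print(labels)
```

Here the code carries real meaning (the order and the score), which is the one case where I'm happy for it to differ from the label.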
I will say, though, that we do not use the same exact value on both sides of the domain. Sometimes I have a displayed value like "no action required at this time". When I'm working with that same layer elsewhere, particularly when I'm using some expression or coding language like Arcade or Python, long values with spaces in them get very tedious and sometimes problematic to work with.
I try my best to keep the codes concise, but obvious in their meaning, so that they can be used as-is if needed. This will often be a common abbreviation, or a short version of the label. "not_needed", "processed", things like that.
That's a good point @jcarlson . I think the upshot is that, as with all data construction things, the end-user needs to be considered first and foremost. Are you the only one who manipulates this data? If so, what works best in your mind? Will you get turned around by codes like the ones described in jcarlson's example, or does that make sense for your mind? Will anyone else be doing edits, especially in batch? How important is very consistent data entry to you? There's a lot to consider.
We adopted the practice of keeping both sides the same because when the feature class was exported, it only held the code numbers, which were pretty much useless without the user having a comparable lookup table to join with it.
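For anyone stuck on the receiving end of a codes-only export, the lookup-table join looks roughly like this in Python. Everything here is illustrative: the field name `condition`, the code values, and the descriptions are assumptions, and the export is an in-memory CSV string rather than a real file.

```python
import csv
import io

# Hypothetical lookup table: stored code -> description.
condition_lookup = {"1": "Good", "2": "Clogged > 20%"}

# Hypothetical export that kept only the numeric codes.
exported_csv = "id,condition\n101,2\n102,1\n103,7\n"


def decode_rows(csv_text, field, lookup):
    """Add a '<field>_desc' column by looking each code up; flag codes with no match."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row[field]
        row[field + "_desc"] = lookup.get(code, "UNKNOWN CODE " + code)
        rows.append(row)
    return rows


decoded = decode_rows(exported_csv, "condition", condition_lookup)
for row in decoded:
    print(row["id"], row["condition"], row["condition_desc"])
```

If the lookup table doesn't travel with the export, every row ends up like the third one, which is exactly the problem with codes-only data.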
This was precisely the issue I ran into. It doesn't happen often because I try to stay away from shapefiles, but it is unavoidable with some of our departments, and it drives me nuts having to decipher the codes.