Best Practice for Text Field Length in Feature Classes

03-26-2020 02:12 PM
AaronKoelker
Occasional Contributor III

I have a request to add a text field with a large field length to a dataset (5,000, maybe 6,000 characters o_o). This would be on a feature class stored in an enterprise geodatabase, published as an editable web service on ArcGIS Server for use in the Collector app. The idea is that this field would contain keywords: each unique word found in an attached text document for that feature. That would hopefully create a simple way to search document content from ArcGIS Desktop or an ArcGIS Online web app, without storing the entire document text in the attribute table, while still keeping it available via the related attachment if needed.
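Roughly what I have in mind, as a minimal sketch (the path, the field name, and the text_for_feature helper that would read the attached document are all placeholders):

```python
import re
import arcpy

fc = r"C:\data\parks.gdb\SignInventory"  # hypothetical path

# Adding the wide text field is the easy part; whether it's wise is the question.
arcpy.management.AddField(fc, "DOC_KEYWORDS", "TEXT", field_length=5000)

def keywords_from_text(text, max_len=5000):
    """Collapse a document into one string of unique, lowercased words."""
    words = sorted(set(re.findall(r"[A-Za-z']+", text.lower())))
    return " ".join(words)[:max_len]  # truncate to fit the field

# text_for_feature() would pull plain text out of the feature's attachment;
# that part is hand-waved here.
with arcpy.da.UpdateCursor(fc, ["OID@", "DOC_KEYWORDS"]) as cur:
    for row in cur:
        row[1] = keywords_from_text(text_for_feature(row[0]))
        cur.updateRow(row)
```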

My question is: is this a terrible idea, and if so, why? My gut says that nothing good will come of a text field with that many characters, seeing as the default is only 255, but I've never actually tried it or heard of anyone doing something like that. I know it will be unreadable in the attribute table, but it's only for querying. There's just that nagging voice in the back of my head saying to keep field lengths as short as possible, no matter what. I have no idea what it will do to the file size and performance of the dataset. Not every record will have 5,000 characters in that field (it's more likely to be 1,000-2,000 on average), but some of them will. I know a text field in a fgdb feature class can technically be something absurd like two billion characters long, but just because we can doesn't mean we should, lol.

So while my gut says no, I don't really have an alternative solution. Just curious if anyone has already gone down this path before and can save me some grief ahead.

-Aaron
3 Replies
BenTurrell
Occasional Contributor III

Hey Aaron Koelker,

This is a very interesting question, and I haven't been able to find much documentation on this sort of thing. Some things to consider:

* The more text there is to search, the longer your searches will take

* Some databases limit how long a field can be if it is to be indexed (see the sketch after this list)

* The size of the database will increase a lot!
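On the indexing point, it's worth testing whether the underlying DBMS will even accept an attribute index on a very wide text field; SQL Server, for instance, has historically capped index keys at around 900 bytes. A quick sketch (the feature class path and field names are hypothetical):

```python
import arcpy

fc = r"C:\data\parks.gdb\SignInventory"  # hypothetical path

# An index on a modest tags field should be fine...
arcpy.management.AddIndex(fc, ["TAGS"], "tags_idx")

# ...but on a 5000-character field the DBMS may refuse the index outright,
# or build one your queries never use. Try it on a copy of the data first.
try:
    arcpy.management.AddIndex(fc, ["DOC_KEYWORDS"], "keywords_idx")
except arcpy.ExecuteError:
    print(arcpy.GetMessages(2))  # surface the underlying DBMS error
```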

I would suggest stepping back from the problem and looking at it from a higher level. It sounds like you need to be able to search for a document, and there are plenty of tools outside of ArcGIS built for searching content. Personally, I would use ArcGIS to search by location, date, or the person who uploaded the document. You could add a new field called tags and let people specify key words, similar to how data can be searched in ArcGIS Enterprise.

Just some ideas, happy to discuss further!

Thanks,

Ben



AaronKoelker
Occasional Contributor III

Hey Ben,

Thanks for the insight. The increase in database size is what I'm most afraid of; I just have no idea how large the increase will be, or at what point I'll start running into problems. Longer search times are never ideal, but they're acceptable if the alternative is no easy search function at all.

To clarify, it's not so much that they need to find the documents; it's more that they want to use the documents to find the related spatial points, if that makes sense.

For better context, the dataset is an inventory of all the different signs and exhibits found in state parks. Everything from trailheads and information kiosks to descriptive text for museum exhibits and art pieces is included. They are capturing a lot of type and category information in other fields within the table, but the collection is so broad in subject matter that types and categories can only account for so much. For example, they want to be able to say, "OK, do we have anything out there that mentions John D Rockefeller?" (as a random, specific example), then run a search and find all the signs whose related documents mention him. Those results might have a sign type like "Museum Exhibit" and a topic of "Cultural - Historic Figure," and could have a title like "American Entrepreneurs," but nothing as specific as a "John D Rockefeller" category. Yet there could be a dozen signs across the state park system that mention him.
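So the query side of what I'm imagining is just a wildcard search on that one field, something like this (sketched with arcpy; the path and the SIGN_TITLE field are made up, and the same WHERE clause could drive a definition query or a web map filter):

```python
import arcpy

fc = r"C:\data\parks.gdb\SignInventory"  # hypothetical path

# Case-insensitive substring search over the keywords field. Note that a
# leading-wildcard LIKE can't use a normal attribute index, so this will
# scan every row's keyword string.
where = "UPPER(DOC_KEYWORDS) LIKE '%ROCKEFELLER%'"

with arcpy.da.SearchCursor(fc, ["OID@", "SIGN_TITLE"], where_clause=where) as cur:
    for oid, title in cur:
        print(oid, title)
```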

-Aaron

JoshuaBixby
MVP Esteemed Contributor

I agree with most of what Ben states but not all.  For larger text fields in databases, it is common to use VARCHAR/NVARCHAR, and the file geodatabase does something similar.  So, an empty text field with a maximum length of 2000 doesn't take up any more space than an empty text field with a maximum length of 500.  The issue becomes the content itself.  If most of the rows will have 1000-2000 characters, that will take up some space.
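This is easy to verify empirically with a throwaway file geodatabase; declared width alone should make essentially no difference on disk (a rough sketch, with made-up paths):

```python
import os
import arcpy

def gdb_size(gdb):
    """Total size on disk of a file geodatabase folder, in bytes."""
    return sum(
        os.path.getsize(os.path.join(root, f))
        for root, _, files in os.walk(gdb)
        for f in files
    )

for width in (500, 5000):
    name = "width_{}.gdb".format(width)
    arcpy.management.CreateFileGDB(r"C:\temp", name)  # hypothetical folder
    gdb = os.path.join(r"C:\temp", name)
    arcpy.management.CreateTable(gdb, "t")
    table = os.path.join(gdb, "t")
    arcpy.management.AddField(table, "TXT", "TEXT", field_length=width)
    with arcpy.da.InsertCursor(table, ["TXT"]) as cur:
        for _ in range(10000):
            cur.insertRow([None])  # empty values: declared width shouldn't matter
    print(name, gdb_size(gdb))
```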

I think one of the big risks that hasn't been mentioned is the impact of large VARCHAR/NVARCHAR fields on interoperability.  Certain data formats don't support variable-length text fields, e.g., shapefiles, where text fields are fixed-width (and capped at 254 characters anyway), so exporting data to these formats pads every row to the full declared width and can really explode file sizes.  Also, some data formats do goofy things with table formatting even when they do support variable-length text fields, e.g., a column width may be set to the maximum field length, making browsing the table a hassle.

I think all enterprise-grade DBMSs offer some form of full-text search for stored documents like PDFs, etc.  The implementations vary between DBMSs, so I am not sure how they play into the enterprise geodatabase model.
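In SQL Server, for example, the full-text predicate looks like the sketch below; the connection string, table, and column names are invented, and it assumes a full-text index has already been created on the column, which is a separate setup task:

```python
import pyodbc

# Hypothetical connection to the database behind the enterprise geodatabase.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=parks;Trusted_Connection=yes"
)

# CONTAINS uses the full-text index instead of scanning the column,
# so it stays fast even against large stored document text.
sql = """
    SELECT objectid, sign_title
    FROM sign_inventory
    WHERE CONTAINS(doc_text, 'Rockefeller')
"""
for row in conn.cursor().execute(sql):
    print(row.objectid, row.sign_title)
```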

I am not saying don't do it, just think through it, which you already seem to be doing.