How or when should you use a related table over having the data in the geodatabase?

217
7
11-12-2019 03:25 PM
AnthonyMosinski1
New Contributor

I asked this question in another forum and received some feedback but wanted to ask it here as well to possibly get additional information. 

 

How or when should you use a related table over having the data in the geodatabase?

 

For example, in my office we are collecting data, using collector, each year on species (Turkey) counts per location. Currently, we have a point shapefile created with an attribute table similar to the one below. 

 

Site2015Count2015Density2016Count2016Density2017Count2017Density2018Count2018DensityAcres
1540.027190.010150.008100.0051984
2800.006820.006900.007920.00713815
3220.016300.022330.024360.0261381
4800.015840.015840.015810.0155456
5710.001690.001630.001600.00179651

 

Our question is, should we have the data in a related table or add 2 new columns each year? We see advantages and disadvantages of both. If we switch to a related table symbolizing by certain years is not easily done, we do this often, but we do not have to continually add a column each year. However, we do need to add a new year to the domain list each year if using the related table. If we keep the data in the geodatabase it will allow us to symbolize by each year without exporting and making copies of the layer with queries to view each year. 

 

I guess the question is, is there a standard or best practice saying that data should be in a related table when x occurs. Or is it more of a preference thing at this point since GIS can handle larger datasets now?

 

Also, if this should be posted in another forum I would be happy to post it there as well. 

 

Thank you.

0 Kudos
7 Replies
MichaelVolz
Esteemed Contributor

The ability to use this data in a web application could influence your decision if that is a future use of this data.  The data, if using a relate, would need to reside in the same geodatabase in order for the relate to work in a service that is fed to the web app.

0 Kudos
AnthonyMosinski1
New Contributor

The data is and will be fed up to a collector application to allow biologists to collect future years data. Would that cause an issue if a related table is used?

0 Kudos
DanaNolan
Regular Contributor

Having repeating fields, like your counts, is always a sign that you need another related table, if you ask a database designer. But, it depends on how you are using your data now and how you might use it in the future. I don't see the difficulty in symbolizing the data either way, but how would you compare years' data the way it is organized now? In a database, you would have to keep changing your formula to get the e.g. average of all years because of the changing number of fields. But if this data is being analyzed in a spreadsheet, this would be easily handled.  

Database standards come from big transactional databases where quality requires less redundancy and data errors may be legion. Removing redundancy whenever possible is important to reduce work and improve data quality by removing the need to keep resynching files. So if you have a good handle on your data quality and workflow in Collector, that is more important than the standards.

0 Kudos
AnthonyMosinski1
New Contributor

I agree I think database designers and biologists have different mindsets for data management. But which way is correct or best practice will forever be the debate and struggle. Do you follow database practices or workflow practices? And how do you convince database designers to allow workflow over database, or the other way? 

Currently, the data in a geodatabase is very easy to symbolize by each year quickly. Simply go into the symbology and change to year you desire. However, if you have a related table, you have to query the year in the related table, select the features, then export that selection to a shapefile and symbolize. But if the data changes that exported shapefile is incorrect and another must be made.

The only analysis on the data currently, is the density measure for each year so not much is done outside the database and in excel or R. But I do understand what you are saying if you want an overall average per site. But would it not be the same issue in a related table? You would have to change your formula each year to include the new years information. 

0 Kudos
DanaNolan
Regular Contributor

If you wanted average over years, you would simply ask for the average over each field (e.g. density). Agreed, if every year you only wanted the last three years' values, that would take a bit more work, perhaps a simple model or Python tool. If you want to symbolize from a related table you just classify by year and omit any years you don't want to see. There is also a Pivot Table capability in ArcMap now that would work with the related data but not the repeated fields.

I don't think the difference in your approach is due to any one's  biology background ( I have a little but am mostly coming from a nontransactional database collection/management background). It sounds more like you are working with already summarized and clean data. Summarized data is not what database designers work with.

0 Kudos
AnthonyMosinski1
New Contributor

Can you further explain what you mean by summarized and clean data? And what you mean by summarized data is not what database designer work with? 

If I am going to discuss this with our staff and fight one way or the other this could be a good point to share (in favor of the geodatabase over the related table). 

0 Kudos
DanaNolan
Regular Contributor

Your annual summaries are simply not raw data, which is where database design can improve quality. What you have really is barely a database; it is a data store, yes, but it is not where you collect or QC data. It is more of a report in table form.180

0 Kudos