GIS GIGO (Garbage In Garbage Out): 30 checks for data errors
Nathan Heazlewood of Eagle Technologies wrote a very useful essay about “garbage in, garbage out” in relation to geospatial data. In it, he not only ties this oft-heard phrase to the importance of GIS data quality, but also details the checks that GIS analysts should work through when assessing a data set. I would argue that this checklist is also useful for educators, and for students as they document their own work, for two reasons: (1) paying attention to data quality is more important now than ever (as I described recently in this blog), and (2) with the advent of Web GIS, everyone working in GIS is a potential data producer.
The list of 30 items is grouped under checks for positional accuracy, topological logic, geometric considerations, projections and coordinate systems, and attribute and data structure checks. Especially helpful are Nathan’s diagrams showing tables with null values in attribute fields that should not be null, values outside permitted ranges, and orphan records in related tables.
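As a rough illustration of the three attribute checks just mentioned (unexpected nulls, out-of-range values, and orphan records), here is a minimal Python sketch. The tables, field names, and the 0 to 100 permitted range are hypothetical and are not taken from Nathan's essay; a real workflow would more likely run equivalent checks inside the GIS software or database.

```python
# Hypothetical sample data: a parcels table and a related owners table.
parcels = [
    {"parcel_id": 1, "owner_id": 10, "slope_pct": 12.5},
    {"parcel_id": 2, "owner_id": None, "slope_pct": 140.0},
    {"parcel_id": 3, "owner_id": 99, "slope_pct": 3.0},
]
owners = [
    {"owner_id": 10, "name": "A. Smith"},
    {"owner_id": 11, "name": "B. Jones"},
]


def find_nulls(rows, field, key):
    """Return the keys of rows where a required field is missing or None."""
    return [r[key] for r in rows if r.get(field) is None]


def find_out_of_range(rows, field, low, high, key):
    """Return the keys of rows whose non-null value falls outside [low, high]."""
    return [
        r[key]
        for r in rows
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]


def find_orphans(child_rows, foreign_key, parent_rows, primary_key, key):
    """Return the keys of child rows that point at a parent that does not exist."""
    parent_keys = {p[primary_key] for p in parent_rows}
    return [
        r[key]
        for r in child_rows
        if r.get(foreign_key) is not None and r[foreign_key] not in parent_keys
    ]


null_owner_ids = find_nulls(parcels, "owner_id", "parcel_id")
bad_slopes = find_out_of_range(parcels, "slope_pct", 0, 100, "parcel_id")
orphans = find_orphans(parcels, "owner_id", owners, "owner_id", "parcel_id")

print(null_owner_ids)  # [2]
print(bad_slopes)      # [2]
print(orphans)         # [3]
```

Each function returns the identifiers of the offending rows rather than a simple pass or fail, which makes it easier to go back and fix the records.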
Nathan includes many considerations that are not often discussed but can lead to enormous problems, such as the different date standards and formats used around the world, from year-month-day to day-month-year to month-day-year (which Nathan dubs the “super dumb American date format”). Another is one I can identify with, because it was a significant challenge during a GIS workshop I taught in Turkey: the numbers in my data set used a comma as the thousands separator, so one hundred thousand appeared as 100,000, but the software in the university lab, given its location, was naturally configured to write one hundred thousand as 100.000.
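Both problems are easy to demonstrate. The sketch below, using only the Python standard library, shows how one date string can be valid under more than one convention, and how the same digits yield different numbers depending on which separator convention the reader assumes. The function names and sample values are hypothetical, chosen only for illustration.

```python
from datetime import datetime

# The three date orders mentioned above, written as strptime patterns.
DATE_FORMATS = {
    "year-month-day": "%Y-%m-%d",
    "day-month-year": "%d-%m-%Y",
    "month-day-year": "%m-%d-%Y",
}


def parse_date_candidates(text):
    """Return every interpretation of text that is a valid calendar date."""
    candidates = {}
    for label, pattern in DATE_FORMATS.items():
        try:
            candidates[label] = datetime.strptime(text, pattern).date()
        except ValueError:
            pass  # Not valid under this convention.
    return candidates


def parse_number(text, decimal_sep=".", thousands_sep=","):
    """Parse a number written with explicitly stated separators."""
    cleaned = text.replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


ambiguous = parse_date_candidates("03-04-2021")
unambiguous = parse_date_candidates("2021-04-03")

print(sorted(ambiguous))    # ['day-month-year', 'month-day-year']
print(sorted(unambiguous))  # ['year-month-day']

# Comma as thousands separator, period as decimal separator (the defaults here).
print(parse_number("100,000"))  # 100000.0
# Period as thousands separator, comma as decimal separator.
print(parse_number("100.000", decimal_sep=",", thousands_sep="."))  # 100000.0
# The same text read with the wrong convention silently becomes one hundred.
print(parse_number("100.000"))  # 100.0
```

A date that returns more than one candidate cannot be resolved from the value alone, so the convention has to come from the data set's metadata. The same is true of the number separators, which is why the sketch asks for them explicitly instead of guessing.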
How might you be able to use this data error checklist in your own instruction? What checks would you consider adding to this list when you are teaching GIS?