ArcGIS Pro Projects Perfected: Finding Data

05-12-2023 11:53 AM
KevinPriestley
Esri Contributor

[Image: descending computer code in multiple colors.]

Sometimes, answers to data quality questions are readily available.  We can discuss trustworthy data providers, offer ideas for data processing or cleanup workflows, describe metadata, or reference industry-standard datasets.  However, when we think more deeply about the question of where to find good data, we often find it difficult to answer.

The question "Where do I go to find good data?" is hard to answer because it depends on an entirely different set of questions related to our project: what do we want to do with the data, how do we want to present it, to whom will we present it, when do we need to finish it, and so on.  In short, the answer depends on project considerations: we need to map out a vision, scope, budget, audience, and timeline for our project before we dive into the data.

Let us assume we have answers to project-level questions and turn our attention to principles that guide our search for data, especially how to use metadata and choose data that meet our needs.

Data as material

ArcGIS Pro is a toolbox that gives us an amazing array of tools to perform operations on and with data.  In other words, if our GIS is the tool, then data is the material we work with.  The geoprocessing, editing, and cartographic design tools we have in our GIS let us create, modify, modulate, and represent data in unique ways that allow us to tell a story about spatial phenomena.  The goal is to use these data to inform decisions we make for ourselves, our organizations, and the broader world.

[Image: clip art depicting a hammer on top of a wrench.]

At the same time, not all data is created equal.  Sometimes, the dataset we need does not exist.  At other times it might be the wrong format or out of date.  Maybe it contains the wrong attributes or covers a different location than we need.  We might find a dataset from an excellent data provider, and later discover that it contains use limitations that prevent us from using it to answer our spatial question.

In this sense, data can be thought of as an analog to the materials a carpenter works with: as the wood and fasteners and glue that together comprise an object such as a table.

One mistake that we make as GIS analysts is to gather any material that could be relevant to a given project without identifying why we need it, how it will be used, and what purpose it serves.  It would be as if a carpenter ran to the hardware store to buy wood and glue and nails and screws of all different sizes, types, and shapes without understanding what their client wanted.

Project Considerations

[Image: an audience seated at a conference.]

First and foremost, when we search for data we have to understand the vision, scope, budget, audience, and timelines involved for our project.  What will we deliver?  Who will we deliver it to?  Do we have financial constraints that limit the data we can buy?  Do we have short deadlines that limit the amount of processing or editing we can perform?  Should we maintain our deliverables after the project ends and, if so, how?  By finding answers to these kinds of questions, we set ourselves up to streamline data collection and clean-up processes.

Now, because there are so many data providers across sectors and industries, this post will not dive into specific data providers.  I assume that you know where to find the data you need to work with. 

Instead, we will use the following scenario: we have identified two datasets we could use in our project.  They come from providers we trust, but we still need to know which one to use.

Check the Metadata

No matter what the data is, we should always look at the metadata record first.

A good metadata record will tell us a ton of information: when the dataset was created, when it was last updated, what its spatial reference is, how accurate it is and at what level of detail, whether it has legally binding use limitations, and what attributes it has.

Using metadata records helps to narrow down our search for data and understand how the data will contribute to our project.
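To make the metadata check concrete, here is a minimal Python sketch.  It assumes the metadata record has already been read into a plain dictionary; the field names, the checklist, and the example values are all hypothetical and only mirror the questions listed above.  How the record is actually read from a dataset is outside this sketch.

```python
from datetime import date

# Hypothetical checklist based on the questions a metadata record should answer.
REQUIRED_FIELDS = [
    "created",
    "last_updated",
    "spatial_reference",
    "accuracy",
    "use_limitations",
    "attributes",
]


def missing_metadata(record):
    """Return the checklist fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def years_since_update(record, today):
    """Return the number of whole years since the record was last updated."""
    updated = record["last_updated"]
    years = today.year - updated.year
    # Subtract a year if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (updated.month, updated.day):
        years -= 1
    return years


# Made-up record, for illustration only.
example_record = {
    "created": date(2015, 3, 1),
    "last_updated": date(2021, 6, 15),
    "spatial_reference": "EPSG:4326",
    "accuracy": "",  # left blank by the (imaginary) provider
    "use_limitations": "Non-commercial use only",
    "attributes": ["NAME", "ROAD_CLASS", "SPEED_LIMIT"],
}

print("Missing fields:", missing_metadata(example_record))
print("Years since update:", years_since_update(example_record, date(2023, 5, 12)))
```

A gap in the record, like the blank accuracy field here, does not rule a dataset out, but it tells us what we would need to ask the provider before relying on it.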

Let’s say the metadata on both of our datasets has good attribution and we think it would be possible to use either one.  What do we do next?

Use the Tools

ArcGIS Online and ArcGIS Pro have a ton of amazing tools that can give us a more granular understanding of our data itself.

One of my personal favorites is Data Engineering in ArcGIS Pro.  Data Engineering gives us the power to generate summary statistics for attribute fields, create charts to visualize the distribution of our data, and update map symbology based on fields we select.  It also lets us perform a series of geoprocessing operations to prepare data, such as eliminating unnecessary fields or creating and calculating new fields.

With Data Engineering and geoprocessing tools, we can find out whether our datasets have different feature counts, whether they have missing or null values in their attribute tables, and whether their summary statistics differ greatly.  For example, if two road network datasets purport to describe the same geographic area, the sums of their feature lengths should be roughly the same.
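The same kind of comparison can be sketched in plain Python.  This is not how Data Engineering works internally; it is only an illustration of the checks described above, assuming each dataset's attribute table has been exported to a list of dictionaries.  The `length_m` field, the road names, and the lengths are made up.

```python
def summarize(features, length_field="length_m"):
    """Return the feature count, null counts per field, and total length."""
    null_counts = {}
    total_length = 0.0
    for feature in features:
        for field, value in feature.items():
            if value is None:
                null_counts[field] = null_counts.get(field, 0) + 1
        # Treat a missing or null length as zero so the sum still works.
        total_length += feature.get(length_field) or 0.0
    return {
        "count": len(features),
        "nulls": null_counts,
        "total_length": total_length,
    }


def percent_difference(a, b):
    """Return the gap between two totals as a percentage of the larger one."""
    larger = max(a, b)
    if larger == 0:
        return 0.0
    return abs(a - b) * 100 / larger


# Two made-up road datasets that claim to cover the same area.
roads_a = [
    {"name": "Main St", "length_m": 1200.0},
    {"name": None, "length_m": 800.0},
]
roads_b = [
    {"name": "Main St", "length_m": 1150.0},
    {"name": "Oak Ave", "length_m": 450.0},
    {"name": "Elm Rd", "length_m": 300.0},
]

summary_a = summarize(roads_a)
summary_b = summarize(roads_b)
gap = percent_difference(summary_a["total_length"], summary_b["total_length"])

print("Dataset A:", summary_a)
print("Dataset B:", summary_b)
print("Difference in total length (%):", gap)
```

How large a difference is acceptable is a project decision, not something the numbers can settle for us.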

No matter what the tools tell us, it is up to us to make the decision.  If our datasets remain comparable after using tools like Data Engineering, how do we choose?

Diagram the Decision

In searching for data, we often make tradeoffs.  Sometimes we need to trade accuracy for availability.  Sometimes we need to trade age for attributes.  Maybe we can find the exact dataset we need, but it is tabular rather than spatial. 

Data considerations that guide our search for data include the type of data we need, age, accuracy, attributes, source, and level of detail.  We also need to know availability (e.g., use limitations) and whether it covers our area of interest.

That is a lot of information to juggle, especially when comparing more than two datasets! 

Personally, I am a visual learner.  It is probably why I am drawn to cartography: maps allow me to understand places without using words.  Writing my options down to make a choice, as in a pros and cons list, does not always help me arrive at good decisions quickly.  Instead, I find that making simple, color-coded visuals helps me a lot.

Here is a grid I used to compare datasets for a recent project:

[Image: A simple table that depicts data considerations, project needs, and datasets.  Under project needs, the creator added what they need for the project.  The dataset comparisons are color coded to make selecting a dataset for a project easier.]

The Data Considerations column helps me identify important characteristics of my data that I can find in a metadata record.  My Project Needs column helps document the conditions I need each dataset to meet.  And the remaining columns help me determine whether the datasets I am considering will work for my project.  A green fill color indicates the dataset meets my needs, red shows it does not, and yellow shows that it does partially.  Dataset 3, for example, was recent, freely available, and had the right attributes.  But it did not have the level of detail or accuracy of the three other datasets.
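For readers who prefer a script to a drawing, a grid like this can also be expressed in a few lines of Python.  This is only a sketch: the numeric scores assigned to each color, the dataset names, and the ratings are assumptions made for illustration, not the values from the grid in the image.

```python
# Assumed scores for each fill color; adjust to suit the project.
SCORES = {"green": 2, "yellow": 1, "red": 0}


def rank_datasets(grid):
    """Return (name, total score) pairs sorted from best to worst."""
    totals = {
        name: sum(SCORES[color] for color in ratings.values())
        for name, ratings in grid.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def blockers(grid, must_have):
    """Return datasets rated red on a consideration we cannot compromise on."""
    return sorted(
        name
        for name, ratings in grid.items()
        if any(ratings[consideration] == "red" for consideration in must_have)
    )


# Made-up ratings for two hypothetical datasets.
grid = {
    "Dataset A": {
        "age": "green",
        "accuracy": "yellow",
        "attributes": "green",
        "availability": "green",
    },
    "Dataset B": {
        "age": "yellow",
        "accuracy": "green",
        "attributes": "red",
        "availability": "green",
    },
}

print(rank_datasets(grid))
print(blockers(grid, ["attributes"]))
```

A total score is a blunt instrument, which is why the sketch also flags any dataset that fails a must-have consideration.  The color-coded grid remains the quicker way to see tradeoffs at a glance.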

Parting Words

Earlier, I wrote that not all datasets are created equal.  It’s also true that not every dataset is perfect.  Understanding imperfections in our data is critical for GIS analysts. 

None of the datasets in the grid above allowed me to perform the work I needed to do, exactly as I envisioned it.  To complete my project, I decided to use imperfect data.

My choice to use one imperfect dataset instead of another affected the results of my work, and the way people perceive the spatial phenomenon that my work describes.

A lot of people look at GIS, spatial data, and data science as prescriptive remedies to all kinds of questions about modern society.  And while GIS can—and should—absolutely inform decisions about a whole host of real issues and problems, the products we create are only as good as their constituent elements: the data and the tools used to process it. 

A carpenter could use screws or glue or nails to affix a tabletop to its legs, but the results of that choice will have a significant impact on the stability of the table.

A GIS analyst can edit or analyze a dataset in all kinds of ways, with good or bad data, and no matter what, will get results.  GIS is simply a tool that acts on and works with material: data.

So it goes with GIS: garbage in, garbage out.

For further training on finding and using GIS data in a project, check out Esri's Preparing Data for GIS Applications course: https://bit.ly/3NQHMQB.
About the Author
Instructor, Esri.  Background in Urban and Regional Planning.