Analyze This!

Hello, this is Sigmund Frodo…er…Jim W., Geodata Analyst at ESRI Support Services in Charlotte, NC, with some thoughts that might help to ‘enlighten’ you about the inner life of your relational database management system (RDBMS). Just as some humans visit a psychiatrist to help them understand the inner workings of their minds, databases have been known to benefit from the help of a qualified professional practitioner who can give meaning to the vast number of disassociated bits and bytes that swirl around deep within their digital brains. Regular ‘analysis’ of a database can be a good thing, helping to maintain the equilibrium necessary for quickly answering the ‘BIG’ queries of life, such as, “Where can I find a good burger?”

Now, geodatabases have been known to have a ‘spatial’ complex, and periodically analyzing them will keep them grounded in reality, thereby improving overall performance. But first, we digress with a brief interlude into the realm of statistics…

Analyzing a database involves collecting statistics that help us get a handle on the nature of the data contained within it. These statistical facts about objects such as tables, columns, and indexes are stored internally within the RDBMS’s data dictionary tables. They help the database optimizer determine the optimal path to the data, ensuring the fastest response time for queries while minimizing the cost of database resources. For example, a count of how many rows are contained in each table may help the optimizer decide whether or not to use an index, or how best to join two tables, when selecting the best execution plan for a given SQL statement.
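You can watch this mechanism in miniature with SQLite, whose ANALYZE command writes statistics into an internal dictionary table called sqlite_stat1 that its query planner then consults. A minimal sketch (the parcels table and zone column are made-up names for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, zone TEXT)")
con.execute("CREATE INDEX idx_zone ON parcels (zone)")
con.executemany(
    "INSERT INTO parcels (zone) VALUES (?)",
    [("residential",)] * 900 + [("commercial",)] * 100,
)

# Gather statistics into the internal sqlite_stat1 dictionary table.
con.execute("ANALYZE")

# Each row holds a table name, an index name, and a stat string such as
# "1000 500" (total rows, average rows per distinct index value).
for table, index, stat in con.execute("SELECT tbl, idx, stat FROM sqlite_stat1"):
    print(table, index, stat)
```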

What kinds of statistics are gathered? Analyzing a table can retrieve and store metadata such as the number of rows in the table, the average row length in bytes, the average column length, the minimum and maximum values contained in each column, and the number of null values in a column. Other statistics describe the data by counting the number of distinct values contained in a column (known as its cardinality), as well as by constructing histograms that give an idea of how the data is distributed (whether it is spread evenly throughout its range or clumped together, with a large number of rows containing similar values). Clinical terms such as platykurtosis, leptokurtosis, and skewness come fondly to mind, but that’s another story for another time…
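To make cardinality and distribution concrete, here is a small sketch that computes both directly with SQL, assuming a hypothetical parcels table with a zone column like the one above:

```python
import sqlite3

con = sqlite3.connect("gis.db")  # hypothetical database file

# Cardinality: the number of distinct values in a column.
(cardinality,) = con.execute(
    "SELECT COUNT(DISTINCT zone) FROM parcels"
).fetchone()

# A value-frequency histogram: shows whether rows are spread evenly
# across values or clumped around a few popular ones (skewed data).
histogram = con.execute(
    "SELECT zone, COUNT(*) AS freq FROM parcels GROUP BY zone ORDER BY freq DESC"
).fetchall()

print("distinct zones:", cardinality)
for value, freq in histogram:
    print(value, freq)
```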

So, you may ask, "Just how are statistics collected?"  It can be as simple as pointing and clicking on an individual feature class in ArcCatalog and then analyzing it. You could also use an ArcSDE command line tool called sdetable, or the Analyze geoprocessing tool available in ArcToolbox. Or, you might even want to set up an automated scheduled task that uses the statistics-gathering tools provided by your specific RDBMS.
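If you prefer to script it, the Analyze geoprocessing tool can be called from Python through arcpy (available with ArcGIS 10 and later). A minimal sketch, where the connection file path and feature class name are hypothetical:

```python
import arcpy

# Point the workspace at an enterprise geodatabase connection file
# (a hypothetical path; substitute your own .sde file).
arcpy.env.workspace = r"C:\connections\gisdb.sde"

# Update statistics on the business, feature, and delta (adds/deletes)
# tables of a single feature class.
arcpy.Analyze_management("parcels", ["BUSINESS", "FEATURE", "ADDS", "DELETES"])
```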

For a more in-depth look at analyzing a geodatabase and collecting statistics, most of what you’ll need to get started is located at: About updating geodatabase statistics.

Just remember, since data can be dynamic and ever-changing, it’s a good idea to analyze frequently in order to pick up any changes that occur in the database. In a healthy geodatabase, where numerous edits may occur on a daily basis, it can be wise to schedule frequent analysis sessions, along the lines of the sketch below. Regular analysis can help improve the display time of versioned feature classes, as well as speed up other editing processes where fast query response times keep you from waiting. And best of all, there’s no charge!
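As a sketch of what such a scheduled session might look like (again with a hypothetical connection file), the following script walks every feature class in a workspace and analyzes each one; saved as a .py file, it could be run nightly by the Windows Task Scheduler or cron:

```python
import arcpy

# Hypothetical connection file; substitute your own.
arcpy.env.workspace = r"C:\connections\gisdb.sde"

# Analyze every feature class the workspace exposes so that statistics
# keep pace with daily edits.
for fc in arcpy.ListFeatureClasses():
    arcpy.Analyze_management(fc, ["BUSINESS", "FEATURE", "ADDS", "DELETES"])
    print("Analyzed {0}".format(fc))
```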

- Jim W., Support Analyst - Geodata Unit, ESRI Support Services, Charlotte, NC.