Census Block Groups

07-03-2013 03:07 PM
StevenApell
New Contributor
I have Census Bureau data by block group for 1980, 2000, and 2010, and I have geocoded 100 addresses. I am trying to extract data from a buffer zone with a 0.5-mile radius around each point. There are about 100 points in different parts of the country.
1. How does one calculate the average of the median income or median housing value within a 0.5-mile radius of a point/address when that radius might cover as many as four block groups?
We have explored using a weighted-average join but found that it is not a mathematically solid approach. Someone suggested using centroids to calculate the averages, but I am unsure which tool would work best.
4 Replies
RichardFairhurst
MVP Honored Contributor
I have Census Bureau data by block group for 1980, 2000, and 2010, and I have geocoded 100 addresses. I am trying to extract data from a buffer zone with a 0.5-mile radius around each point. There are about 100 points in different parts of the country.
1. How does one calculate the average of the median income or median housing value within a 0.5-mile radius of a point/address when that radius might cover as many as four block groups?
We have explored using a weighted-average join but found that it is not a mathematically solid approach. Someone suggested using centroids to calculate the averages, but I am unsure which tool would work best.


Aggregating data is easy and reliable, but to be honest there is no reliable way to disaggregate data without examining details that give you more information than the aggregations provide, such as aerials, and making judgments about which values you think are most representative of the specific location you are plotting, based on more specific sampling. All averages are a shot in the dark when you get down to a single location. If you use an average of some kind, you could use the straight mean, the median (the middle value, or the average of the two middle values), the weighted mean, the weighted median, or you could simply ignore the radius and use the single block group your location falls within (assuming none falls precisely on a boundary). But none of these will necessarily be the most reasonable when you look at the location in detail. So make a choice that you are willing to invest the time in, and then stick with that methodology for consistency. Case-by-case examination will always take the most time and be the least reproducible without storing additional data points for others to follow.
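
To make the candidate statistics concrete, here is a minimal plain-Python sketch of each option over hypothetical (value, overlap-area) pairs; every number is invented for illustration, not taken from the thread:

def mean(values):
    # Straight mean: every intersected block group counts equally.
    return sum(values) / len(values)

def weighted_mean(pairs):
    # Area-weighted mean: each value counts in proportion to its overlap area.
    total_area = sum(area for _, area in pairs)
    return sum(value * area for value, area in pairs) / total_area

def median(values):
    # Middle value, or the average of the two middle values.
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2.0

def weighted_median(pairs):
    # The value at which half of the total overlap area is accumulated.
    s = sorted(pairs)  # sorts by value
    half = sum(area for _, area in s) / 2.0
    running = 0.0
    for value, area in s:
        running += area
        if running >= half:
            return value

# A hypothetical buffer intersecting four block groups (area in sq. miles):
pairs = [(42000, 0.30), (55000, 0.10), (61000, 0.05), (38000, 0.02)]
values = [v for v, _ in pairs]
print(mean(values))            # 49000.0
print(median(values))          # 48500.0
print(weighted_mean(pairs))    # 46617.02..., pulled toward the 42000 group
print(weighted_median(pairs))  # 42000 -- the dominant block group wins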
StevenApell
New Contributor
Aggregating data is easy and reliable, but to be honest there is no reliable way to disaggregate data without examining details that give you more information than the aggregations provide, such as aerials, and making judgments about which values you think are most representative of the specific location you are plotting, based on more specific sampling. All averages are a shot in the dark when you get down to a single location. If you use an average of some kind, you could use the straight mean, the median (the middle value, or the average of the two middle values), the weighted mean, the weighted median, or you could simply ignore the radius and use the single block group your location falls within (assuming none falls precisely on a boundary). But none of these will necessarily be the most reasonable when you look at the location in detail. So make a choice that you are willing to invest the time in, and then stick with that methodology for consistency. Case-by-case examination will always take the most time and be the least reproducible without storing additional data points for others to follow.


OK, thank you ... so what tools and process in ArcGIS should I use to combine the data and average it? None of my 0.5-mile buffers will fall entirely within one block group. Each will cover portions of neighboring block groups, so I have to find a way to join the abutting block groups and then find an average for the buffer zone.
RichardFairhurst
MVP Honored Contributor
OK, thank you ... so what tools and process in ArcGIS should I use to combine the data and average it? None of my 0.5-mile buffers will fall entirely within one block group. Each will cover portions of neighboring block groups, so I have to find a way to join the abutting block groups and then find an average for the buffer zone.


You could use Intersect on your buffers against the block groups. This should retain the total area of each original block group in its area field, which is important. If your buffers overlap each other, you will then need to Dissolve the intersect output using every field except the ObjectID as a unique case value, to make sure no block group is divided into two or more pieces. If you want a weighted average of your block groups, you will need to create fields that multiply each statistic you are after by the area of the portion intersected within its block group, if the statistic represents a block group average. If the original statistic represented a block group total or sum, then that value should be multiplied by the area of the portion intersected from the block group and then divided by the total block group area captured by the Intersect. Then Dissolve a second time using just the buffer IDs as the unique case, summing all of the other statistics (both the original statistics from the block groups, for a standard mean, and the weighted statistics, for a weighted mean). Include a summary statistic that generates a count as well, just in case, although Dissolve may generate its own count value.
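
For concreteness, here is a hedged ArcPy sketch of that Intersect / weight / double-Dissolve sequence. The workspace, layer names (buffers, blockgroups), and field names (BUFFER_ID, MED_INC for an average-type statistic, POP for a sum-type statistic, BG_AREA for the full block group area) are all placeholder assumptions, not anything given in the thread:

import arcpy

# All paths, layer names, and field names below are assumptions.
arcpy.env.workspace = "C:/data/census.gdb"
arcpy.env.overwriteOutput = True

# 1. Intersect the 0.5-mile buffers with the block groups. BG_AREA is
#    assumed to hold each block group's full area, calculated before
#    the Intersect so it survives the overlay.
arcpy.Intersect_analysis(["buffers", "blockgroups"], "bg_pieces", "ALL")

# 2. (If the buffers overlap one another, Dissolve "bg_pieces" here on
#    every attribute field except the ObjectID, so no block group is
#    split into multiple pieces within the same buffer.)

# 3. Weight each statistic by area (both area values must share units).
#    Average-type statistic (median income): weight by the piece's own
#    area; the summed field is divided by the buffer's area later.
arcpy.AddField_management("bg_pieces", "W_MED_INC", "DOUBLE")
arcpy.CalculateField_management("bg_pieces", "W_MED_INC",
                                "!MED_INC! * !shape.area!", "PYTHON_9.3")
#    Sum-type statistic (population): apportion it instead.
arcpy.AddField_management("bg_pieces", "W_POP", "DOUBLE")
arcpy.CalculateField_management("bg_pieces", "W_POP",
                                "!POP! * !shape.area! / !BG_AREA!",
                                "PYTHON_9.3")

# 4. Dissolve a second time on the buffer ID, summing the raw and
#    weighted statistics and counting the intersected block groups.
arcpy.Dissolve_management("bg_pieces", "buffer_stats", "BUFFER_ID",
                          [["MED_INC", "SUM"], ["W_MED_INC", "SUM"],
                           ["W_POP", "SUM"], ["MED_INC", "COUNT"]])

# Straight mean of the average-type statistic: SUM_MED_INC / COUNT_MED_INC.
# Weighted mean: SUM_W_MED_INC / (dissolved buffer's Shape_Area).
# Apportioned sum-type statistic: SUM_W_POP, with no further division.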

Now, if a statistic originally represented an average, maximum, or minimum, divide its summary by just the Dissolve count; if it was an average, minimum, or maximum weighted by area, divide its summary by the total area of the newly dissolved buffer. If a summary value was originally a block group total or sum, divide by the Dissolve count if it was not weighted, and do no division if it was weighted by area. The bottom line is that original block group averages, minimums, and maximums have to be handled differently from original block group sums or totals (whether or not I have worked through the math correctly in the steps I have described).
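
To make those division rules concrete, here is a small worked example in plain Python, with invented numbers: one buffer intersecting three block groups, areas in square miles, and the buffer fully covered by the pieces:

# (median_income, population, piece_area, full_block_group_area)
pieces = [
    (42000, 1200, 0.30, 0.60),
    (55000,  800, 0.10, 0.50),
    (61000,  500, 0.05, 0.40),
]
count = len(pieces)
buffer_area = sum(p[2] for p in pieces)  # area of the dissolved buffer

# Average-type statistic (median income):
straight_mean = sum(p[0] for p in pieces) / count               # / count
weighted_mean = sum(p[0] * p[2] for p in pieces) / buffer_area  # / area

# Sum-type statistic (population):
mean_of_sums = sum(p[1] for p in pieces) / count                # / count
apportioned  = sum(p[1] * p[2] / p[3] for p in pieces)          # no division

print(straight_mean)  # 52666.67
print(weighted_mean)  # (12600 + 5500 + 3050) / 0.45 = 47000.0
print(mean_of_sums)   # 833.33
print(apportioned)    # 600 + 160 + 62.5 = 822.5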

Getting a median or weighted median is trickier, and frankly I would prefer that you examine the results of the mean and weighted mean first before going into those methods. The median, or middle value, will probably be less reliable if the areas covered by the buffer are not all nearly equal, or if one area dominates all of the others (in which case the statistics of the largest area are better). Either way, the median simply chooses one block group's statistic (or the average of two) and ignores all the other block groups.
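
A tiny self-contained illustration of that last point, again with invented numbers: when one block group supplies most of the buffer's area, the weighted median collapses to that block group's value and the others contribute nothing:

pairs = sorted([(38000, 0.02), (42000, 0.40), (55000, 0.03)])  # (value, area)
half = sum(a for _, a in pairs) / 2.0
running = 0.0
for value, area in pairs:
    running += area
    if running >= half:
        print(value)  # 42000 -- the dominant block group's value, alone
        break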

All of the above operates in the absence of any other information about the block groups. If you had aerials, parcel valuation data, or land use data, adjustments could be made to account for the absence of population or jobs in open-space areas, higher or lower valuations of properties and structures, vacant lands, commercial/industrial uses, residential density distributions, etc. If those sources are available, they should be sampled to determine how well your analysis correlates with the results expected from them.
StevenApell
New Contributor
You could use Intersect on your buffers against the block groups. This should retain the total area of each original block group in its area field, which is important. If your buffers overlap each other, you will then need to Dissolve the intersect output using every field except the ObjectID as a unique case value, to make sure no block group is divided into two or more pieces. If you want a weighted average of your block groups, you will need to create fields that multiply each statistic you are after by the area of the portion intersected within its block group, if the statistic represents a block group average. If the original statistic represented a block group total or sum, then that value should be multiplied by the area of the portion intersected from the block group and then divided by the total block group area captured by the Intersect. Then Dissolve a second time using just the buffer IDs as the unique case, summing all of the other statistics (both the original statistics from the block groups, for a standard mean, and the weighted statistics, for a weighted mean). Include a summary statistic that generates a count as well, just in case, although Dissolve may generate its own count value.

Now, if a statistic originally represented an average, maximum, or minimum, divide its summary by just the Dissolve count; if it was an average, minimum, or maximum weighted by area, divide its summary by the total area of the newly dissolved buffer. If a summary value was originally a block group total or sum, divide by the Dissolve count if it was not weighted, and do no division if it was weighted by area. The bottom line is that original block group averages, minimums, and maximums have to be handled differently from original block group sums or totals (whether or not I have worked through the math correctly in the steps I have described).

Getting a median or weighted median is trickier, and frankly I would prefer that you examine the results of the mean and weighted mean first before going into those methods. The median, or middle value, will probably be less reliable if the areas covered by the buffer are not all nearly equal, or if one area dominates all of the others (in which case the statistics of the largest area are better). Either way, the median simply chooses one block group's statistic (or the average of two) and ignores all the other block groups.

All of the above operates in the absence of any other information about the block groups. If you had aerials, parcel valuation data, or land use data, adjustments could be made to account for the absence of population or jobs in open-space areas, higher or lower valuations of properties and structures, vacant lands, commercial/industrial uses, residential density distributions, etc. If those sources are available, they should be sampled to determine how well your analysis correlates with the results expected from them.


http://resources.arcgis.com/en/help/main/10.1/index.html#//00310000002m000000