04-06-2022 06:38 PM

I need help, please, in determining a sample size for a random selection of polygons. I need to select a set of mosques for a mortality survey, but I don't have a ready-made list from which to select them. I've subdivided my study area into 10,000 polygons of 4 square kilometers each. My team will walk a random sample of those polygons and identify every mosque inside them through a foot survey. That should give us a representative list of mosques from which to randomly select the mosque sample. I'm stuck, however, in figuring out how many of those 10,000 polygons to walk. I think that I should be applying Cochran's formula to this problem. It looks like this:

n = Z^2 × p × (1 − p) / M^2

Z is the z-score that sets the "confidence level". The usual convention in the social sciences is 1.96, for a 95% level. You choose that if you want your sample estimate to land within your chosen margin of error 19 times out of 20.

p is the "population proportion", the estimated proportion of the population that has the attribute of interest. In a marketing survey it would be something like "what proportion of consumers prefer brand X?"

Question 1: What should that attribute be in this context? I can't identify any. Unless someone can enlighten me, I'm going to punt with a 50% proportion, "maximum variability", which is the worst case if you want to keep your sample size down because it yields the largest n.

M is the "margin of error" (the half-width of the confidence interval) within which you want your estimate to fall. You express it as an absolute value, in decimal format: "I want to know what percentage of consumers prefer brand X, give or take five percentage points (absolute, not relative)."
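To make the arithmetic concrete, here is a minimal Python sketch of the calculation, assuming the standard form of Cochran's formula, n = Z^2 × p × (1 − p) / M^2, with the placeholder values discussed above (Z = 1.96, p = 0.5, M = 0.05). The finite population correction for the 10,000 polygons is an addition of mine, not something the problem statement requires, and the function names are hypothetical.

```python
import math


def cochran_sample_size(z: float, p: float, m: float) -> float:
    """Uncorrected Cochran sample size for estimating a proportion."""
    return (z ** 2) * p * (1 - p) / (m ** 2)


def finite_population_correction(n0: float, population: int) -> float:
    """Shrink n0 when the population is finite (assumption: N = 10,000 polygons)."""
    return n0 / (1 + (n0 - 1) / population)


# Placeholder inputs: 95% confidence, maximum variability, +/- 5 points.
n0 = cochran_sample_size(z=1.96, p=0.5, m=0.05)
n = finite_population_correction(n0, population=10_000)

print(math.ceil(n0))  # 385 polygons before the correction
print(math.ceil(n))   # 370 polygons after the correction
```

Note that these numbers answer "how many polygons do I need to estimate a polygon-level proportion", which may not be the quantity that actually matters for the mosque list.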

Question 2: What should I use for my M value? A margin around what? I'm not measuring any attribute of the sample blocks, just listing their mosques. My only wish for them is that, as a group, they contain a large, representative set of mosques.

Am I just trying to fit a square peg in a round hole? Should I be taking a different approach to the question of how many blocks to survey? The only alternative I can think of is trial and error: I'd mock up a list of mosques with a set of assumed mortality rates, sprinkle the imaginary mosques around my 10,000 polygons, select a sample of n of those polygons, select a random sample of the mosques within the n sampled polygons, and calculate the mortality rate of the sampled mosques. I'd do that repeatedly for a given sample size and see whether my estimated mortality rate consistently came out close to my original assumed overall mortality rate.
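A rough sketch of that trial-and-error idea in Python might look like the following. Everything numeric here is a made-up assumption for illustration only: 3,000 imaginary mosques, mortality rates drawn uniformly between 0.005 and 0.015, mosques scattered uniformly at random over the polygons, and 30 mosques drawn per run. The function name `simulate` and its parameters are hypothetical.

```python
import random
import statistics


def simulate(n_polygons_sampled, n_mosques_sampled, reps=200,
             n_polygons=10_000, n_mosques=3_000, seed=0):
    """Repeat the two-stage draw and return (true_rate, list of estimates)."""
    rng = random.Random(seed)

    # Imaginary mosques: (polygon id, assumed mortality rate). All assumed.
    mosques = [(rng.randrange(n_polygons), rng.uniform(0.005, 0.015))
               for _ in range(n_mosques)]
    true_rate = statistics.mean(rate for _, rate in mosques)

    # Index the rates by polygon so a "foot survey" is a dictionary lookup.
    by_polygon = {}
    for polygon, rate in mosques:
        by_polygon.setdefault(polygon, []).append(rate)

    estimates = []
    for _ in range(reps):
        # Stage 1: walk a simple random sample of polygons, list every mosque.
        walked = rng.sample(range(n_polygons), n_polygons_sampled)
        listed = [r for p in walked for r in by_polygon.get(p, [])]
        if not listed:
            continue  # no mosques found in this run; skip it
        # Stage 2: simple random sample of mosques from the list.
        chosen = rng.sample(listed, min(n_mosques_sampled, len(listed)))
        estimates.append(statistics.mean(chosen))
    return true_rate, estimates


true_rate, estimates = simulate(n_polygons_sampled=370, n_mosques_sampled=30)
print(f"assumed overall rate: {true_rate:.5f}")
print(f"mean of estimates:    {statistics.mean(estimates):.5f}")
print(f"spread (std dev):     {statistics.stdev(estimates):.5f}")
```

Rerunning this with different values of `n_polygons_sampled` would show how the spread of the estimates changes with the number of polygons walked. A more realistic version would cluster the mosques spatially rather than scattering them uniformly, since clustering is what makes the polygon count matter.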

Thanks for any guidance you can offer.

