Sampling is used widely in business, manufacturing, science, social research and elsewhere. Why? In order to represent a much larger set, such as the population of a state, the totality of items manufactured, the quality of the corn grown in large area, etc. It may be too expensive or even impossible to check every one of millions of items or persons; so we examine a few. How few? If we ask only one person how many cups of coffee they drink in one day, we learn next to nothing about the rate of coffee consumption in NJ. But if we ask five, fifty, or 500 randomly selected folks the same question, we begin to get an estimate of the answer with an increasing sense of accuracy.
So, in order to see the effect of increasing sample size n on the accuracy of the estimate, we use Excel to simulate random observations, =5+10*RAND(), and plot the frequency of the average (for n=1, 2, ...5) on a comparative graph, like this.
Generate
1000
sets of five-tuples in columns
A-E (the example is for 500 sets of observations). Column G (AVERAGE($A1:A1)) is identical
to A (the average of one number equals the number itself), col. H=AVERAGE($A1:B1),
..., K=AVERAGE($A1:E1). These are averages of random samples of a single
observation, 2 observations, etc. Show the respective frequency
distributions (N-R) with bins in M. Add a combined
graph of the frequencies. (the column graph is for n=1; show it similarly or via
a line graph.
Using "Landscape" orientation, print three (or four) single pages to display
(a)
the graph,
(b) columns A-R (as in the example), and
(c) the formulas in columns (A-R)
[Press CTRL + ` (grave accent)];
use appropriate zoom, e.g. 50% to help fit these columns; if necessary,
widen columns so the equations show entirely.
You may need 2 pages for Part (c)-- no need to show all 1,000 samples, however
(see example)
(d) Explain, briefly, what the comparative graph tells us about sampling and the
relative precision of the procedure.
![]() |
|
| A manufacturing process produces computer
chips using a totally new method. As is often the case in such production
situations, not all of the product is usable. A proportion p
of it turns out to be defective. Samples of 20 chips are tested every hour
to ascertain that the process is stable. We simulate 600 samples of 20 chips
each with p=.4 (value in cell A1). The formula to
provide 0's and 1's in the correct proportion is =
IF(RAND()<$A$1,1,0). The above figure shows a representative portion of the
spread-sheet which is similar to EXCELcise 5, above.
The sampling distribution (column Z and this graph)
shows the variation in sample proportion (p-hat) due to chance
alone. |
|