Consider the following hypothetical setup:
X denotes the counties in a city: X1, X2, ..., Xi.
Y denotes the hospitals in the city. For county X1 we have hospitals Y11, Y12, ..., Y1j, and similarly for the other counties.
Z denotes the number of nurses in these hospitals. For county X1 we have Z11 for hospital Y11, Z12 for hospital Y12, ..., Z1j for hospital Y1j, and so on.
We take a sample of hospitals, so not every hospital is included. In the sample:
x denotes the sampled counties: x1, x2, ..., x(i).
y denotes the sampled hospitals. For x1 we may have y1, y2, ..., y(u), and so on.
z denotes the number of nurses in the sampled hospitals. For x1 we have z1, z2, ..., z(u).
I want to use the sample data to estimate the total number of nurses in the target population (the city). Using sample means as a quick-and-dirty approach, I can think of two options:
In the first approach, I simply assume that the overall mean number of nurses per hospital in the sample represents the population mean. That leads to the following estimate of Z:
Z = (mean of z) * Y, where Y here is the total number of hospitals in the city
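A minimal sketch of the first approach, assuming the sampled nurse counts are pooled into one list regardless of county. The function name `estimate_overall_mean` and the numbers below are hypothetical, chosen only for illustration.

```python
def estimate_overall_mean(sampled_nurses, total_hospitals):
    """Approach 1: mean nurses per sampled hospital times total hospitals (Y)."""
    if not sampled_nurses:
        raise ValueError("need at least one sampled hospital")
    mean_z = sum(sampled_nurses) / len(sampled_nurses)  # mean of z
    return mean_z * total_hospitals                     # (mean of z) * Y

# Hypothetical data: 5 sampled hospitals, 20 hospitals in the whole city.
sample = [10, 12, 8, 30, 40]
print(estimate_overall_mean(sample, 20))  # 400.0
```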
In the second approach, I do the same calculation within each county and then sum the county estimates. So for county X1:
Z1 = (z1 + z2 + ... + z(u)) / (number of sampled hospitals in county x1) * (number of hospitals in county X1)
After that: Z = Z1 + Z2 + ... + Zi
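The per-county calculation can be sketched as follows, assuming each county is stored as a pair (number of hospitals in the county, list of nurse counts from its sampled hospitals). The name `estimate_by_county`, the dictionary layout, and the numbers are hypothetical.

```python
def estimate_by_county(counties):
    """Approach 2: each county's sample mean times its hospital count, summed.

    `counties` maps a county name to (hospitals in that county,
    nurse counts from that county's sampled hospitals).
    """
    total = 0.0
    for name, (n_hospitals, sampled) in counties.items():
        if not sampled:
            raise ValueError(f"county {name} has no sampled hospitals")
        county_mean = sum(sampled) / len(sampled)
        total += county_mean * n_hospitals  # Zk for this county
    return total                            # Z = Z1 + Z2 + ... + Zi

# Hypothetical data: two counties with 10 hospitals each.
counties = {
    "X1": (10, [10, 12, 8]),  # 3 of 10 hospitals sampled
    "X2": (10, [30, 40]),     # 2 of 10 hospitals sampled
}
print(estimate_by_county(counties))  # 450.0
```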
It turns out that the two methods give different results most of the time. I wonder which is the better estimate.
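One way to see where the difference comes from is to run both calculations on the same hypothetical data. If every county is sampled at the same rate (sampled hospitals / hospitals in the county is the same for all counties), then each county's scaling factor equals the city-wide one, and the second approach reduces algebraically to the first. When the rates differ, counties that are over-represented in the sample pull the pooled mean toward their own mean, so the two estimates diverge. The data layout and function names below are assumptions for illustration only.

```python
def overall_mean_estimate(counties):
    """Approach 1: pool all sampled hospitals, scale by the city's hospital count."""
    all_sampled = [z for _, sampled in counties.values() for z in sampled]
    total_hospitals = sum(n for n, _ in counties.values())
    return sum(all_sampled) / len(all_sampled) * total_hospitals

def by_county_estimate(counties):
    """Approach 2: scale each county's sample mean by that county's hospital count."""
    return sum(sum(s) / len(s) * n for n, s in counties.values())

# Hypothetical data: county -> (hospitals in county, sampled nurse counts).
# Unequal sampling rates: 3 of 10 hospitals in X1, 2 of 10 in X2.
unequal = {"X1": (10, [10, 12, 8]), "X2": (10, [30, 40])}
# Equal sampling rates: 2 of 10 hospitals in each county.
equal = {"X1": (10, [10, 12]), "X2": (10, [30, 40])}

for label, data in [("unequal", unequal), ("equal", equal)]:
    print(label, overall_mean_estimate(data), by_county_estimate(data))
# unequal 400.0 450.0
# equal 460.0 460.0
```

In the unequal case, X1 (with smaller hospitals) supplies 3 of the 5 sampled hospitals but only half of the city's hospitals, which drags the pooled mean down.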