02-11-2013 11:13 AM
Medians and Surveymeans:
I am trying to determine the median workload of health workers across settings. When I run SURVEYMEANS requesting the median (I use the BY statement, since the median does not work with the DOMAIN statement), the median I get differs from the one I get when I calculate it by hand.
For example, SURVEYMEANS reports a median of 18.3 for the values 6, 9, 13, 14, 14, 15, 15, 19, 19, 19, 25, 25, 27, 39, 45, 54, 74, 101, 180, whereas I calculate a median of 19 (not terribly far off, but why 18.3?). Likewise, SURVEYMEANS gives a median of 41 for the values 2, 9, 13, 25, 36, 40, 44, 44, 55, 70, 75, 90, 91, whereas I calculate 44.
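As a quick check of the hand calculations, Python's standard `statistics.median` (the textbook middle value when the count is odd) gives the same answers for both lists. This only confirms the arithmetic; it is not what SURVEYMEANS computes.

```python
import statistics

# The two value lists quoted above, already in ascending order.
set_a = [6, 9, 13, 14, 14, 15, 15, 19, 19, 19, 25, 25, 27, 39, 45, 54, 74, 101, 180]
set_b = [2, 9, 13, 25, 36, 40, 44, 44, 55, 70, 75, 90, 91]

# Odd counts, so the median is simply the middle observation.
median_a = statistics.median(set_a)  # 19 values -> the 10th value
median_b = statistics.median(set_b)  # 13 values -> the 7th value
print(median_a, median_b)  # 19 44
```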
The data are weighted and clustered, but the weights are all the same within each stratum (setting).
Any thoughts on why the median values differ?
02-11-2013 11:33 AM
There are many methods for estimating quantiles such as the median. The method SURVEYMEANS uses is described in its documentation, and it probably differs from yours.
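One quantile definition that reproduces both numbers in the original question is: build the weighted empirical CDF over the distinct sorted values, then interpolate linearly between the two values whose cumulative proportions bracket 0.5. The sketch below assumes this matches the documented SURVEYMEANS rule (check the procedure's documentation for the exact definition); `interpolated_quantile` is a hypothetical helper, not SAS code, and its handling of p below the first step is a simplification.

```python
from itertools import groupby


def interpolated_quantile(values, weights, p=0.5):
    """Quantile from a weighted empirical CDF, interpolating linearly.

    Hypothetical helper (not SAS code): with distinct sorted values y_j and
    cumulative weight proportions F_j, find F_j <= p < F_{j+1} and return
    y_j + (p - F_j) / (F_{j+1} - F_j) * (y_{j+1} - y_j).
    """
    total = float(sum(weights))
    pairs = sorted(zip(values, weights))
    # Collapse tied values into (distinct value, cumulative proportion).
    points = []
    cum = 0.0
    for y, group in groupby(pairs, key=lambda vw: vw[0]):
        cum += sum(w for _, w in group)
        points.append((y, cum / total))
    # Simplification: below the first step, return the smallest value.
    if p < points[0][1]:
        return points[0][0]
    for (y_lo, f_lo), (y_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= p < f_hi:
            return y_lo + (p - f_lo) / (f_hi - f_lo) * (y_hi - y_lo)
    return points[-1][0]


set_a = [6, 9, 13, 14, 14, 15, 15, 19, 19, 19, 25, 25, 27, 39, 45, 54, 74, 101, 180]
set_b = [2, 9, 13, 25, 36, 40, 44, 44, 55, 70, 75, 90, 91]

# Equal weights, as within each setting in the original post.
print(round(interpolated_quantile(set_a, [1] * len(set_a)), 1))  # 18.3
print(round(interpolated_quantile(set_b, [1] * len(set_b)), 1))  # 41.0
```

For the first list, 15 has cumulative proportion 7/19 and 19 has 10/19, so the estimate is 15 + (0.5 - 7/19)/(3/19) × 4 ≈ 18.33. For the second, 40 sits at 6/13 and 44 at 8/13, giving 40 + (0.5 - 6/13)/(2/13) × 4 = 41. Because the weights are equal within each setting, the weighted CDF is identical to the unweighted one, so the gap comes from the interpolation rule, not from the weights.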
02-12-2013 09:59 AM
I would like to follow up on your post with an additional thought and a question. When working with a subset, the appropriate way to get summary statistics is through domain analysis. I, too, am interested in medians for subgroups and have not been able to figure out how to get them with domain analysis. BY-group analysis is not correct, and in my data the hand-calculated and BY-group medians are in fact very different.
Any suggestions on how to get medians using domain analysis in PROC SURVEYMEANS (or another PROC)?
02-12-2013 04:34 PM
You are correct: the BY statement will not produce correct estimates, or at least it will produce different estimates, of the means and variances.
I am not completely sure how the weight for the participants at each site is calculated, but it looks unusual.
02-18-2013 02:22 PM
One way to obtain domain statistics, including percentiles such as the median, from the PROC SURVEYxxx procedures is to create new survey weights from the original ones and run the procedures with these new weights.
For observations within a given level of a domain, the new weight equals the original survey weight; for observations at any other level, the new weight equals a small positive number close to zero (for example, 0.000000001). Create such a weight for each level of the domain, then run the PROC SURVEYxxx procedure once per level with the corresponding new weight. If you do this often, you can "macro-ize" the creation of these weights and merge them back into the original data set.
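The weight construction described above can be sketched as follows. This is a Python stand-in for what would normally be a SAS DATA step or macro; the helper `add_domain_weights`, the variables `setting`, `wt`, and `workload`, and the data values are all hypothetical. Each new column (for example `wt_clinic`) would then be supplied to the procedure's WEIGHT statement in a separate run. Records outside the level stay in the data with a near-zero weight, so the procedure still sees every stratum and cluster; the quick check below only looks at a weighted mean, not at variance estimation, which the procedure handles itself.

```python
EPS = 1e-9  # small positive value standing in for a zero weight


def add_domain_weights(records, domain_key, weight_key, eps=EPS):
    """Return (new_records, levels) with one extra weight column per level.

    Column f"{weight_key}_{level}" holds the original weight for records in
    that level and eps for every other record, so no record is dropped.
    """
    levels = sorted({r[domain_key] for r in records})
    new_records = []
    for r in records:
        row = dict(r)  # copy so the original records are left unchanged
        for level in levels:
            row[f"{weight_key}_{level}"] = r[weight_key] if r[domain_key] == level else eps
        new_records.append(row)
    return new_records, levels


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: 'setting' is the domain, 'wt' the original survey weight.
records = [
    {"setting": "clinic", "wt": 2.0, "workload": 14},
    {"setting": "clinic", "wt": 2.0, "workload": 19},
    {"setting": "clinic", "wt": 2.0, "workload": 25},
    {"setting": "hospital", "wt": 5.0, "workload": 40},
    {"setting": "hospital", "wt": 5.0, "workload": 44},
]

data, levels = add_domain_weights(records, "setting", "wt")
workload = [r["workload"] for r in data]
for level in levels:
    w = [r[f"wt_{level}"] for r in data]
    # Essentially the domain-only mean, while all five records stay in the data.
    print(level, round(weighted_mean(workload, w), 6))
```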
The justification for this method is found in the following reference:
Graubard BI, Korn EL. Survey inference for subpopulations. American Journal of Epidemiology 1996;144:102-106.