cotterness
Calcite | Level 5

Medians and Surveymeans:

I am trying to determine the median workload of health workers across settings. I run SURVEYMEANS requesting the median (I use the BY statement, since the median does not work with the DOMAIN statement), but the median value I get is different from the one I get when I calculate the median by hand.

For example, SURVEYMEANS says the median is 18.3 for the set 6, 9, 13, 14, 14, 15, 15, 19, 19, 19, 25, 25, 27, 39, 45, 54, 74, 101, 180. By hand I get a median of 19 (not terribly far off, but why 18.3?). Similarly, SURVEYMEANS gives me a median of 41 for these values: 2, 9, 13, 25, 36, 40, 44, 44, 55, 70, 75, 90, 91. I calculate a median of 44.

The data are weighted and clustered, but the weights within a stratum (setting) are all the same.

Any thoughts on why the median values differ??

PGStats
Opal | Level 21

There are many methods for estimating quantiles such as the median. The one SURVEYMEANS uses is documented here:

SAS/STAT(R) 9.3 User's Guide

It probably differs from yours.

PG
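One way to see where 18.3 and 41 come from: sort the distinct values, compute the weighted empirical CDF, and interpolate linearly between the two adjacent distinct values whose CDF brackets p = 0.5. Below is a minimal Python sketch of that definition. It is my reading of an interpolation-type rule, not SAS code, and the function names are hypothetical. It assumes equal weights, since the question says weights are constant within a setting. A hand-calculated median simply takes the middle order statistic, which is why the two answers differ.

```python
import statistics
from itertools import groupby


def weighted_cdf(values, weights):
    """Return the sorted distinct values and the weighted empirical CDF at each one."""
    pairs = sorted(zip(values, weights))
    total = float(sum(weights))
    xs, cdf = [], []
    running = 0.0
    # groupby collects tied values so each distinct value gets one CDF step
    for x, group in groupby(pairs, key=lambda pair: pair[0]):
        running += sum(w for _, w in group)
        xs.append(x)
        cdf.append(running / total)
    return xs, cdf


def interpolated_quantile(values, weights, p=0.5):
    """Quantile from linear interpolation of the weighted empirical CDF.

    Assumption: this mirrors the interpolation rule described in the
    SURVEYMEANS documentation; it is an illustration, not SAS's implementation.
    """
    xs, cdf = weighted_cdf(values, weights)
    if p < cdf[0]:
        return xs[0]
    for k in range(len(xs) - 1):
        # Find the two adjacent distinct values whose CDF brackets p
        if cdf[k] <= p < cdf[k + 1]:
            fraction = (p - cdf[k]) / (cdf[k + 1] - cdf[k])
            return xs[k] + fraction * (xs[k + 1] - xs[k])
    return xs[-1]


workload_a = [6, 9, 13, 14, 14, 15, 15, 19, 19, 19, 25, 25, 27, 39, 45, 54, 74, 101, 180]
workload_b = [2, 9, 13, 25, 36, 40, 44, 44, 55, 70, 75, 90, 91]

for data in (workload_a, workload_b):
    equal_weights = [1.0] * len(data)  # weights are constant within a setting
    print(statistics.median(data), round(interpolated_quantile(data, equal_weights), 2))
# prints:
# 19 18.33
# 44 41.0
```

For the first set, 15 and 19 have CDF values 7/19 and 10/19, so the interpolated median is 15 + (2.5/3) × 4 ≈ 18.33; for the second, 40 and 44 have CDF values 6/13 and 8/13, giving 40 + 0.25 × 4 = 41. Both match what SURVEYMEANS reported, while `statistics.median` gives the hand-calculated 19 and 44.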

arositch
Calcite | Level 5

I would like to follow up on your post with an additional thought and a question. When analyzing a subset, the appropriate way to get summary statistics is through domain analysis. I, too, am interested in median values in subgroups and have not been able to figure out how to get them using domain analysis. By-group analysis is not correct; in fact, in my data, hand-calculated and by-group medians are very different.

Any suggestions on how to get median values using domain analysis in PROC SURVEYMEANS (or another PROC)?

cotterness
Calcite | Level 5

You are correct: the BY statement will not produce correct estimates of the mean and variance measures, or at least it will produce different estimates.

I am not completely sure how the weight for the participants in each site is calculated, but it is weird.

Good luck!


1zmm
Quartz | Level 8

One way that you can obtain domain statistics including percentiles like medians from the PROC SURVEYxxx procedures is to create new survey weights from the original survey weights, and run these procedures using these new survey weights.

For observations within a specified level of a domain, the new survey weights equal the original survey weights, but for observations at another level of that domain, the new survey weights equal a small positive number close to zero (for example, 0.000000001).  For each level of a domain, create similar new survey weights.  Then, run the PROC SURVEYxxx procedures across all the levels of a domain using the corresponding new survey weights.  If you do this often, you can "macro-ize" the creation of these new survey weights and merge them to the original data set.
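The weight construction described above can be sketched in Python to show the logic of the data step (the thread's procedures are SAS; this is only an illustration). The column names `setting` and `wt`, the toy records, and the helper `add_domain_weights` are all hypothetical; the epsilon value is the one suggested above.

```python
EPSILON = 1e-9  # small positive stand-in for "zero" weight outside the domain level


def add_domain_weights(rows, domain_var, weight_var, epsilon=EPSILON):
    """Return copies of rows with one new weight column per level of domain_var.

    Inside a level the new weight equals the original weight; outside it, the
    new weight is epsilon, so every observation stays in the data set.
    """
    levels = sorted({row[domain_var] for row in rows})
    new_names = [f"{weight_var}_{level}" for level in levels]
    new_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the original records are unchanged
        for level, name in zip(levels, new_names):
            new_row[name] = row[weight_var] if row[domain_var] == level else epsilon
        new_rows.append(new_row)
    return new_rows, new_names


# Hypothetical records: a setting (domain), a survey weight, and a workload value
rows = [
    {"setting": "clinic", "wt": 10.0, "workload": 19},
    {"setting": "hospital", "wt": 4.0, "workload": 44},
    {"setting": "clinic", "wt": 10.0, "workload": 25},
]
new_rows, weight_names = add_domain_weights(rows, "setting", "wt")
print(weight_names)              # ['wt_clinic', 'wt_hospital']
print(new_rows[1]["wt_clinic"])  # 1e-09
```

Each new column would then be used in turn as the WEIGHT variable in a separate PROC SURVEYxxx run, with the rest of the design specification unchanged.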

The justification for this method is found in the following reference:

   Graubard BI, Korn EL.  Survey inference for subpopulations.  American Journal of Epidemiology 1996;144:102-106.
