10-17-2013 11:14 AM
Hi, I can see how to report data in PROC TABULATE as a proportion of a column total, and how to report a mean with its confidence limits. I'd like to combine the two, reporting column percentages together with the confidence limits of those percentages. Can anyone help me do that?
I'm using survey data, and have weights to approximate the population. A simple example of the data I'm looking at might be:
Fruit    Volume   Survey weight
Orange   10       6,242
Apple    7        3,995
Orange   3        6,890
Apple    2        9,039
Banana   9        6,979
Apple    9        3,713
Banana   8        2,686
Orange   10       9,119
Orange   1        2,358
Orange   7        5,612
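To try the suggestions in this thread, the example data could be read into SAS with a DATA step like the sketch below. The dataset name `fruitdata` and the variable name `surveyweight` are made up for illustration. The PROC MEANS step is only a check: with a WEIGHT statement, SUM is the weighted sum of VOLUME, which should match the "Pop sum" column of the summary table.

```sas
/* Hypothetical dataset name FRUITDATA; the values are copied from the table above. */
data fruitdata;
   /* :comma8. reads values such as 6,242 as the number 6242 */
   input fruit $ volume surveyweight :comma8.;
   datalines;
Orange 10 6,242
Apple 7 3,995
Orange 3 6,890
Apple 2 9,039
Banana 9 6,979
Apple 9 3,713
Banana 8 2,686
Orange 10 9,119
Orange 1 2,358
Orange 7 5,612
;
run;

/* With a WEIGHT statement, SUM is the weighted sum of VOLUME,
   i.e. the "Pop sum" column of the summary table. */
proc means data=fruitdata sum;
   class fruit;
   var volume;
   weight surveyweight;
run;
```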
Summing volume × survey weight by fruit gives this summary table:
Fruit    Pop sum   Col pct   95% CL lower (col pct)   95% CL upper (col pct)
Banana   84,299    22%       ??                       ??
Apple    79,460    21%       ??                       ??
Orange   215,922   57%       ??                       ??
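For reference, the Pop sum and Col pct columns follow from the data as below, where $w_i$ is respondent $i$'s survey weight and $v_i$ their volume (notation introduced here, not from the original post). Both the numerator and the denominator of Col pct are estimated totals, which is why its confidence interval is a ratio problem rather than a plain proportion.

```latex
\text{Pop sum}_k = \sum_{i:\ \text{fruit}_i = k} w_i v_i,
\qquad
\text{Col pct}_k = \frac{\text{Pop sum}_k}{\sum_{j} \text{Pop sum}_j}

\text{Pop sum}_{\text{Banana}} = 9 \times 6{,}979 + 8 \times 2{,}686 = 84{,}299,
\qquad
\text{Col pct}_{\text{Banana}} = \frac{84{,}299}{84{,}299 + 79{,}460 + 215{,}922}
  = \frac{84{,}299}{379{,}681} \approx 22.2\%
```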
Can anyone help me with the col pct confidence interval?
Thanks in advance
10-17-2013 12:39 PM
Things to worry about:
Confidence intervals on percentages are bounded by 0% and 100%, so near the extremes the usual normal-approximation (Wald) interval can spill outside that range; exact or score-based methods are safer there.
It looks like your categories are exhaustive, so any one percentage is fixed by the others (100 minus the sum of the rest). The percentages are therefore not independent, and neither are their confidence intervals.
Given all of that, I would look at Example 38.4, Binomial Proportions, in the PROC FREQ documentation of SAS/STAT. For your data, something like:
proc freq data=yourdata;
   /* BINOMIAL reports the first level of FRUIT unless LEVEL= picks another */
   tables fruit / binomial (ac wilson exact) alpha=0.05;
run;
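For reference, the Wilson score interval requested by the `wilson` option has a form that always stays inside [0, 1], which is the point about bounded intervals. With $\hat p$ the sample proportion, $n$ the number of observations and $z$ the standard normal quantile (1.96 for 95%), the interval is as below. This is the textbook formula for a simple random sample of $n$ independent observations; it does not use the survey weights, which is one reason to look at PROC SURVEYFREQ.

```latex
\frac{\hat p + \dfrac{z^2}{2n}}{1 + \dfrac{z^2}{n}}
\;\pm\;
\frac{z}{1 + \dfrac{z^2}{n}}
\sqrt{\frac{\hat p\,(1 - \hat p)}{n} + \frac{z^2}{4n^2}}
```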
I don't understand the volume variable.
If the situation is more complex, and you know the survey weights, target population size, clusters and sampling frames, then PROC SURVEYFREQ is what you really ought to consider.
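A minimal PROC SURVEYFREQ sketch, assuming the example data are in a dataset called `fruitdata` (a hypothetical name) with the survey weight in `surveyweight`. The CL option on the TABLES statement asks for confidence limits on the weighted percentages. Note that these are percentages of weighted respondents, not of weighted volume, so they answer a slightly different question from the one originally asked.

```sas
/* Assumes dataset FRUITDATA (hypothetical) with variables FRUIT and SURVEYWEIGHT. */
proc surveyfreq data=fruitdata;
   /* CL: confidence limits for the weighted percentages of each fruit */
   tables fruit / cl;
   /* Sampling weights; add STRATA and CLUSTER statements if the design is known */
   weight surveyweight;
run;
```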
10-18-2013 09:20 AM
Thanks for that. The 'volume' in the example is the volume of fruit eaten: the first individual eats 10 oranges and represents 6,242 individuals in the overall population (accounting for 62,420 of the 215,922 oranges eaten in our overall population).
I'd thought of creating a new variable, fruitweight=volume*surveyweight, but as I understand it SAS would then estimate the variance from this combined variable rather than from its constituent parts, so I'd only get an approximation. That may be close enough, though...
Any further thoughts?
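One way to get volume-based column percentages with design-based confidence limits, without folding volume into the weight, is to treat each percentage as a ratio of two estimated totals, (weighted volume of one fruit) / (weighted total volume), and let PROC SURVEYMEANS estimate it through its RATIO statement, which uses Taylor-series linearization of numerator and denominator together. A sketch, assuming a dataset `fruitdata` (hypothetical name) with variables `fruit`, `volume` and `surveyweight`; the `*_vol` variables are also made up here. The exact columns printed in the Ratio Analysis table may vary by SAS/STAT release, so check the procedure documentation.

```sas
/* Assumes dataset FRUITDATA (hypothetical) with FRUIT, VOLUME, SURVEYWEIGHT. */
data fruitvol;
   set fruitdata;
   /* Volume of each fruit eaten by this respondent (0 if a different fruit);
      (fruit = 'Banana') evaluates to 1 or 0 */
   banana_vol = volume * (fruit = 'Banana');
   apple_vol  = volume * (fruit = 'Apple');
   orange_vol = volume * (fruit = 'Orange');
run;

/* Each ratio = weighted total of one fruit's volume / weighted total volume,
   i.e. the column percentage expressed as a proportion. */
proc surveymeans data=fruitvol mean stderr clm;
   var banana_vol apple_vol orange_vol volume;
   ratio banana_vol apple_vol orange_vol / volume;
   weight surveyweight;
run;
```

The ratios come out as proportions (about 0.222 for bananas with the example data, matching 84,299 / 379,681), so multiply the estimates and limits by 100 to report percentages.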