Confidence interval for column percentages with weighted data

Hi, I can see how to report data in PROC TABULATE as a proportion of a column total, and how to report a mean with its confidence limits. I'd like to combine the two: report the column percentages together with confidence limits for those percentages. Can anyone help me do that?

I'm using survey data with weights that scale the sample up to the population. A simple example of the data I'm looking at:

Fruit     Volume   Survey weight
Orange        10           6,242
Apple          7           3,995
Orange         3           6,890
Apple          2           9,039
Banana         9           6,979
Apple          9           3,713
Banana         8           2,686
Orange        10           9,119
Orange         1           2,358
Orange         7           5,612

Summing volume × survey weight by fruit gives this summary table:

Fruit     Pop sum   Col pct   95% CI lower   95% CI upper
Banana     84,299       22%             ??             ??
Apple      79,460       21%             ??             ??
Orange    215,922       57%             ??             ??
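For anyone following along, the point estimates in the table can be reproduced with a sketch like the one below. It reads the example rows into a data set named YOURDATA (the placeholder name used later in the thread), with the thousands separators removed from the weights, and builds the volume-times-weight product that the follow-up post calls FRUITWEIGHT. It only computes the pop sums and column percentages, not their confidence limits.

```sas
/* Example rows from the post; thousands separators removed from the weights */
data yourdata;
   input fruit $ volume surveyweight;
   fruitweight = volume * surveyweight;   /* volume represented in the population */
   datalines;
Orange 10 6242
Apple 7 3995
Orange 3 6890
Apple 2 9039
Banana 9 6979
Apple 9 3713
Banana 8 2686
Orange 10 9119
Orange 1 2358
Orange 7 5612
;
run;

/* Population sum of volume and its column percentage for each fruit */
proc sql;
   create table colpct as
   select fruit,
          sum(fruitweight) as popsum format=comma12.,
          calculated popsum / (select sum(fruitweight) from yourdata)
             as colpct format=percent8.1
   from yourdata
   group by fruit;
quit;

proc print data=colpct noobs;
run;
```

The grand total of volume × weight is 379,681, so the column percentages come out as 22.2% (Banana), 20.9% (Apple) and 56.9% (Orange), which round to the values in the table.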

Can anyone help me calculate confidence intervals for the column percentages?

Thanks in advance


Re: Confidence interval for column percentages with weighted data

Things to worry about:

Confidence intervals for percentages are bounded below by 0% and above by 100%, which matters for estimates near those extremes, so exact methods are needed.

It looks like your categories are exhaustive, so at least one category is an exact linear combination of the others (100 minus the sum of all the others). The percentages are therefore not independent of each other.

Given all of that, I would look at Example 38.4, Binomial Proportions, in the FREQ procedure chapter of the SAS/STAT documentation. For your data, something like:

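proc freq data=yourdata;
     /* Binomial proportion for one level of FRUIT (the first level by default),
        with Agresti-Coull, Wilson and exact (Clopper-Pearson) limits */
     tables fruit / binomial(ac wilson exact) alpha=0.05;
     /* PROC FREQ treats WEIGHT values as frequency counts */
     weight surveyweight;
run;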

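For reference, the WILSON option in the code above requests the Wilson score interval. For a proportion $\hat p$ estimated from $n$ independent observations, with $z = z_{1-\alpha/2}$, it is given below. Unlike the simple Wald interval $\hat p \pm z\sqrt{\hat p(1-\hat p)/n}$, it always stays within $[0, 1]$. This is a sketch that assumes simple random sampling; with survey weights, $n$ is not simply the sum of the weights.

$$
\tilde{p} = \frac{\hat{p} + \dfrac{z^2}{2n}}{1 + \dfrac{z^2}{n}},
\qquad
\tilde{p} \pm \frac{z}{1 + \dfrac{z^2}{n}}
\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}
$$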

I don't understand what the volume variable represents.

If the situation is more complex, and you know the survey weights, target population size, clusters and sampling frames, then PROC SURVEYFREQ is what you really ought to consider.
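A minimal PROC SURVEYFREQ sketch, assuming YOURDATA holds FRUIT and SURVEYWEIGHT as in the example. The STRATA and CLUSTER variable names in the comment are hypothetical placeholders for whatever design information is available. Weighting by SURVEYWEIGHT alone estimates the share of people in each fruit category, not the share of volume. By default the CL limits are Wald-type, so the bounded-interval point above still applies near 0% or 100%.

```sas
proc surveyfreq data=yourdata;
   /* If the design has strata and clusters, add statements such as
      (hypothetical variable names):
         strata stratum_id;
         cluster psu_id;                                               */
   tables fruit / cl;     /* CL adds confidence limits for the percentages */
   weight surveyweight;   /* respondent-level sampling weight */
run;
```

If the population size is known, the TOTAL= (or RATE=) option on the PROC SURVEYFREQ statement can supply it for a finite population correction.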

Steve Denham

Re: Confidence interval for column percentages with weighted data

Thanks for that. In the example, 'volume' is the amount of fruit eaten: the first individual eats 10 oranges and represents 6,242 individuals in the overall population, so that row accounts for 62,420 of the 215,922 oranges eaten in the population.
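Written out, each population sum is a weighted total and each column percentage is a ratio of two weighted totals. The symbols are introduced here for illustration: $w_i$ is respondent $i$'s survey weight, $y_i$ the volume and $f_i$ the fruit.

$$
\hat{T}_k = \sum_i w_i\, y_i\, \mathbf{1}[f_i = k],
\qquad
\hat{R}_k = \frac{\hat{T}_k}{\sum_i w_i\, y_i}
$$

With the example data, $\hat{T}_{\text{Orange}} = 215{,}922$ and $\sum_i w_i y_i = 379{,}681$, so $\hat{R}_{\text{Orange}} \approx 0.569$. The Taylor-linearized variance of $\hat{R}_k$ is built from the residuals

$$
u_i = \frac{w_i\, y_i \left(\mathbf{1}[f_i = k] - \hat{R}_k\right)}{\sum_j w_j\, y_j},
$$

which involve $w_i$ and $y_i$ only through the product $w_i y_i$. So, at least under linearization, weighting by the product leads to the same residuals as the ratio form. How a given procedure treats rows with zero volume is a separate detail to check.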

I had thought of creating a new variable, fruitweight = volume*surveyweight, but as I understand it SAS would then estimate the variance from this combined variable rather than from its constituent parts, so I'd only get an approximation. Still, if it's close enough...
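One way to keep VOLUME and SURVEYWEIGHT separate is a ratio analysis in PROC SURVEYMEANS. Each fruit's share is estimated as the weighted fruit-specific volume divided by the weighted total volume. This sketch assumes YOURDATA holds FRUIT, VOLUME and SURVEYWEIGHT. The data set RATIO_IN and the VOL_ variables are names made up for illustration. With no statistic keywords, SURVEYMEANS should report a standard error and 95% confidence limits for each ratio. These are Wald-type limits, so the bounded-interval caveat from the earlier reply still applies.

```sas
data ratio_in;
   set yourdata;
   /* Fruit-specific volume: VOLUME for rows of that fruit, 0 otherwise */
   vol_banana = volume * (fruit = 'Banana');
   vol_apple  = volume * (fruit = 'Apple');
   vol_orange = volume * (fruit = 'Orange');
run;

proc surveymeans data=ratio_in;
   /* Add STRATA / CLUSTER statements here if the design has them */
   weight surveyweight;
   var vol_banana vol_apple vol_orange volume;
   /* Each numerator over total volume gives that fruit's column proportion */
   ratio vol_banana vol_apple vol_orange / volume;
run;
```

The ratios are proportions (for example, 84,299 / 379,681 ≈ 0.222 for Banana with the example data), so multiply them and their limits by 100 to get percentages.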

Any further thoughts?

