Solved
Contributor
Posts: 31

# Most efficient way to get a bunch of means

Suppose I've a got a dataset with a some "V" variables (V1 - V3) that each can take on values A, B, or C (which I call "Options").  I've also got avariable called "Score" that is numeric. Here is an example of the data:

Data new;

Input V1\$ V2\$ V3\$ Score;

Cards;

A B A 24

C C B 66

B A C 25

C A B 40

;

Run;

I want to compute the average "Score" for all those who have a value of A on V1, a value of B on V1, a value of C on V1, a value of A on V2, a value of B on V2, a value of C on V2, a value of A on V3 a value of B on V3, and a value of C on V3, and I want to save all this information in a single dataset called "Final".

The following code works and gets me the output dataset in the format I want:

%Macro OptionStats;
%Do i = 1 %To 3;
Proc Sort Data=new; By V&i; Run;
Proc Means Data=New Mean Noprint; Var Score;  By V&i;
Output Out=Score_V&i (Drop=_TYPE_ _FREQ_)  Mean()=;
Run;
Data Score_V&i; Set Score_V&i; Rename V&i=Option Score=V&i; Run;
%End;
%Mend;
%OptionStats;

Data Final;  Merge Score_V1 Score_V2 Score_V3;  Run;

My question is whether there is a better, more efficient way to do this.  In the end, I will have a large dataset with hundreds of "V" variables (instead of just 3) and thousands of rows (instead of just 4), and it seems like it will be slow to have it go through an iterative process with a Proc Sort and a Proc Means for each V variable.

Any ideas?

Accepted Solutions
Solution
‎05-31-2013 02:25 PM
Posts: 3,852

## Re: Most efficient way to get a bunch of means

Maybe you should test it.  I think the OP wants 1 ways for many CLASS variables that should not "exceed" the limits maybe.  Otherwise just transpose and do the means by _NAME_ and level.

All Replies
Posts: 3,852

## Re: Most efficient way to get a bunch of means

Maybe this will work.

Data new;
Input V1\$ V2\$ V3\$ Score;
Cards;
A B A 24
C C B 66
B A C 25
C A B 40
;

Run;
proc summary data=new missing chartype;

class v:;
ways 1;

output out=ways1 mean(score)=;
run;
Super User
Posts: 13,583

## Re: Most efficient way to get a bunch of means

With "hundreds" of V variables the combinations will likely exceed the combinations that Proc Summary or means will handle. You may want to examine the CLASSDATA= procedure option to identify the the combinations of V variables that you want output. It may still take a couple of passes through the data depending on the numbers of combinations you request but you would only need to change the Classdata data set and output set.

Solution
‎05-31-2013 02:25 PM
Posts: 3,852

## Re: Most efficient way to get a bunch of means

Maybe you should test it.  I think the OP wants 1 ways for many CLASS variables that should not "exceed" the limits maybe.  Otherwise just transpose and do the means by _NAME_ and level.

Contributor
Posts: 31

## Re: Most efficient way to get a bunch of means

Thanks to you both for your helpful suggestions.  As it turns out, the data comes out of the query in a different format (untransposed), then it is transposed into the format described in my question above for some other additional analyses.  So it made by far the most sense to simply do the Proc means by _NAME_ and level as a separate step before the data was transposed the first time.