Suppose I've a got a dataset with a some "V" variables (V1 - V3) that each can take on values A, B, or C (which I call "Options"). I've also got avariable called "Score" that is numeric. Here is an example of the data:
Data new;
Input V1$ V2$ V3$ Score;
Cards;
A B A 24
C C B 66
B A C 25
C A B 40
;
Run;
I want to compute the average "Score" for all those who have a value of A on V1, a value of B on V1, a value of C on V1, a value of A on V2, a value of B on V2, a value of C on V2, a value of A on V3 a value of B on V3, and a value of C on V3, and I want to save all this information in a single dataset called "Final".
The following code works and gets me the output dataset in the format I want:
%Macro OptionStats;
%Do i = 1 %To 3;
Proc Sort Data=new; By V&i; Run;
Proc Means Data=New Mean Noprint; Var Score; By V&i;
Output Out=Score_V&i (Drop=_TYPE_ _FREQ_) Mean()=;
Run;
Data Score_V&i; Set Score_V&i; Rename V&i=Option Score=V&i; Run;
%End;
%Mend;
%OptionStats;
Data Final; Merge Score_V1 Score_V2 Score_V3; Run;
My question is whether there is a better, more efficient way to do this. In the end, I will have a large dataset with hundreds of "V" variables (instead of just 3) and thousands of rows (instead of just 4), and it seems like it will be slow to have it go through an iterative process with a Proc Sort and a Proc Means for each V variable.
Any ideas?
Maybe you should test it. I think the OP wants 1 ways for many CLASS variables that should not "exceed" the limits maybe. Otherwise just transpose and do the means by _NAME_ and level.
Maybe this will work.
With "hundreds" of V variables the combinations will likely exceed the combinations that Proc Summary or means will handle. You may want to examine the CLASSDATA= procedure option to identify the the combinations of V variables that you want output. It may still take a couple of passes through the data depending on the numbers of combinations you request but you would only need to change the Classdata data set and output set.
Maybe you should test it. I think the OP wants 1 ways for many CLASS variables that should not "exceed" the limits maybe. Otherwise just transpose and do the means by _NAME_ and level.
Thanks to you both for your helpful suggestions. As it turns out, the data comes out of the query in a different format (untransposed), then it is transposed into the format described in my question above for some other additional analyses. So it made by far the most sense to simply do the Proc means by _NAME_ and level as a separate step before the data was transposed the first time.
Thanks again for the advice...
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.