I am trying to get some descriptive stats for each group using PROC MEANS. I have 2 options for grouping: BY or CLASS.
I sort of understand that with CLASS, it computes the figures for different combinations of the variables in the CLASS statement, and I need to choose the correct _TYPE_ to get what I want. I think I chose the right value for _TYPE_, but the end results differ between the 2 options.
My code is:
*option 1;
proc means noprint data=input_data;
class A B C D E;
output out=temp1 sum(X1)=X1;
run;
*generally, choose the _TYPE_ that equals (2^x - 1) where x is the number of variables in the CLASS statement;
data temp1;
set temp1;
if _TYPE_ = 31;
run;
*option 2;
proc means noprint data=input_data;
by A B C D E;
output out=temp2 sum(X1)=X1;
run;
Shouldn't temp1 and temp2 be similar? At least the number of observations should be the same, because they should have the same groups. However, temp2 has significantly more observations than temp1 (3 times as many). What is the reason? What am I missing here? Thanks
No. temp1 and temp2 are not similar, because with the CLASS statement SAS first treats the entire data set as one group and then processes each individual group. This doesn't happen with the BY statement, so you cannot rely on the _TYPE_ variable to line up the two output data sets either. With the CLASS statement, _TYPE_=0 marks the row computed over the entire data set. See the code below for a brief example.
proc means data=sashelp.iris;
class species;
output out=test1;
run;
proc means data=sashelp.iris;
by species;
output out=test2;
run;
Also, see the article "The difference between CLASS statements and BY statements in SAS" for a nice comparison of the two statements.
Yes, I understand that with the CLASS statement SAS computes the figures for different groupings, for example _TYPE_=0 is computed across the whole sample. That is why I keep only the rows where _TYPE_ is 31 (=2^5-1, where 5 is the number of class variables): I only want the statistics for the groups defined by all the variables in the CLASS statement. Wouldn't this be the same as using BY?
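One way to check which _TYPE_ values actually occur is sketched below. It assumes the data set and variable names from the question; temp_types is a hypothetical output name. The CHARTYPE option writes _TYPE_ as a string of 0s and 1s, one position per CLASS variable, so the full five-way combination shows up as 11111 rather than 31.

```sas
/* Sketch only: input_data, A-E and X1 come from the question;      */
/* temp_types is a made-up name for the unfiltered output data set. */
proc means noprint data=input_data chartype;
  class A B C D E;
  output out=temp_types sum(X1)=X1;
run;

/* Count the output rows per _TYPE_ value; the row count for 11111  */
/* is the number of complete A*B*C*D*E groups that CLASS found.     */
proc freq data=temp_types;
  tables _TYPE_;
run;
```

Comparing the 11111 count with the number of rows in the BY output shows whether the two approaches are forming the same groups.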
I think you have to use the MISSING option in your CLASS statement to compare the two. Add a forward slash followed by MISSING, as below:
class A B C D E / missing;
Also, you can subset directly in PROC MEANS by using the WHERE= data set option on the output data set.
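A minimal sketch of that subsetting idea, assuming the names from the question: the WHERE= data set option on OUT= keeps only the five-way rows as they are written, so the separate DATA step is not needed.

```sas
/* Sketch only: input_data, A-E and X1 are the names used in the question. */
proc means noprint data=input_data;
  class A B C D E / missing;   /* keep missing values as class levels */
  /* WHERE= filters the output rows, so only _TYPE_ = 31 is stored */
  output out=temp1(where=(_TYPE_ = 31)) sum(X1)=X1;
run;
```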
Try adding / MISSING to your CLASS statement. Missing values form valid BY groups, but they are ignored as class levels unless you specify the MISSING option.
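The effect can be seen on a tiny made-up data set. Everything in this sketch (demo, A, X1, and the output names) is hypothetical and only meant to illustrate the point about missing values.

```sas
/* Hypothetical data: one row has a missing value for the grouping variable A. */
data demo;
  input A X1;
  datalines;
1 10
1 20
. 30
2 40
;
run;

proc sort data=demo;
  by A;
run;

/* CLASS without MISSING: the row with A=. is dropped, leaving groups 1 and 2. */
proc means noprint data=demo nway;
  class A;
  output out=class_default sum(X1)=X1;
run;

/* CLASS with MISSING: A=. becomes a class level of its own. */
proc means noprint data=demo nway;
  class A / missing;
  output out=class_missing sum(X1)=X1;
run;

/* BY: A=. forms a BY group, matching the CLASS / MISSING result. */
proc means noprint data=demo;
  by A;
  output out=by_groups sum(X1)=X1;
run;
```

The NWAY option keeps only the rows that use every CLASS variable, which here is the single variable A.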
Yes, there are some situations (some, not all) where CLASS and BY produce the same results.
And there is much more power when you use the CLASS statement. First, the data does not have to be sorted when you use CLASS. Second, you can also use the very powerful WAYS and TYPES statements in the same PROC call. So, for example, if you wanted the sum for each level of A, the sum for each level of B, and so on, this can't be done with the BY statement, but it is easily done via:
proc means noprint data=have;
class a b c d e;
ways 1;
var x1;
output out=want sum=sum_x1;
run;
So by using the WAYS and TYPES statements with the CLASS statement, you can get lots of different types of analyses that are not possible with the BY statement. https://documentation.sas.com/?docsetId=proc&docsetTarget=p0f0fjpjeuco4gn1ri963f683mi4.htm&docsetVer...
Paige Miller
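As a companion to the WAYS example, here is a sketch of the TYPES statement, which names the exact combinations to compute. The data set and variable names follow the reply above; want_types is a hypothetical output name.

```sas
/* Sketch only: have, a-e and x1 follow the reply above; want_types is made up. */
proc means noprint data=have;
  class a b c d e;
  /* Request three specific groupings: by a, by a and b together, */
  /* and the full five-way combination.                           */
  types a a*b a*b*c*d*e;
  var x1;
  output out=want_types sum=sum_x1;
run;
```

Each requested grouping gets its own _TYPE_ value in want_types, so the rows can be told apart afterwards.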
First, if you want only the summary with all class variables contributing, add the NWAY option to the PROC MEANS/SUMMARY statement. That makes the proc produce less output and do less work.
Second, if the class variables have formats attached, PROC MEANS will use the formatted values when making the groupings. BY group processing uses the raw values.
Third, watch out for missing values of the class variables.
Try this and see if you get the same results.
proc summary data=input_data nway missing;
class A B C D E;
output out=temp1 sum(X1)=X1;
format A B C D E ;
run;
proc summary data=input_data;
by A B C D E;
output out=temp2 sum(X1)=X1;
run;
For the BY statement to work, the input data set must be sorted by the BY variables. For the CLASS statement to work, the number of combinations must be small enough that the PROC can store all of them in memory.
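A possible way to finish the check is sketched below, assuming temp1 was produced by the NWAY/MISSING step above. The name sorted is hypothetical. The data is sorted first so the BY step is valid, and PROC COMPARE then reports any differences between the two outputs.

```sas
/* Sketch only: sorted is a made-up name; the other names are from the thread. */
proc sort data=input_data out=sorted;
  by A B C D E;
run;

proc summary data=sorted;
  by A B C D E;
  output out=temp2 sum(X1)=X1;
run;

/* _TYPE_ is dropped because it is 31 in the CLASS output and 0 in the BY output. */
proc compare base=temp1(drop=_TYPE_) compare=temp2(drop=_TYPE_);
run;
```

If the observation counts still differ, the PROC COMPARE summary is a reasonable place to start looking for the groups that appear in only one of the two data sets.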