I'm using the sashelp.cars dataset here.
1) The following are 3 subgroup analyses using PROC MEANS. This isn't a lot of repetition, but for the purpose of learning, how can I best combine multiple subgroup analyses to minimize the amount of repetition in code?
PROC MEANS DATA = sashelp.cars MAXDEC=1 N mean Median Q1 Q3 Qrange;
Var MSRP;
WHERE Type = 'Sports';
RUN;
PROC MEANS DATA = sashelp.cars MAXDEC=1 N mean Median Q1 Q3 Qrange;
Var Invoice;
WHERE Origin = 'Europe';
RUN;
PROC MEANS DATA = sashelp.cars MAXDEC=1 N mean Median Q1 Q3 Qrange;
Var Horsepower;
WHERE DriveTrain = 'Rear';
RUN;
If I run the following I get different results. I think what's happening is the last WHERE (DriveTrain = 'Rear') is being used for all the Vars, which is not what I want. I want it to look like the previous except all together.
PROC MEANS DATA = sashelp.cars MAXDEC=1 N mean Median Q1 Q3 Qrange;
Var MSRP;
WHERE Type = 'Sports';
Var Invoice;
WHERE Origin = 'Europe';
Var Horsepower;
WHERE DriveTrain = 'Rear';
RUN;
2) Here are 3 subgroup analyses using PROC FREQ. Again, I would like to know how to combine multiple subgroup analyses to minimize the amount of repetition in code.
PROC FREQ DATA = sashelp.cars;
TABLES Type;
WHERE Origin = 'USA';
RUN;
PROC FREQ DATA = sashelp.cars;
TABLES Origin;
WHERE DriveTrain = 'Front';
RUN;
PROC FREQ DATA = sashelp.cars;
TABLES DriveTrain;
WHERE Horsepower > 200;
RUN;
If I run the following I get different results. Again, I think what's happening is the last WHERE (Horsepower > 200) is being used for all of them, which is not what I want. I want it to look like the previous except all together.
PROC FREQ DATA = sashelp.cars;
TABLES Type;
WHERE Origin = 'USA';
TABLES Origin;
WHERE DriveTrain = 'Front';
TABLES DriveTrain;
WHERE Horsepower > 200;
RUN;
Thanks,
PROC MEANS DATA = sashelp.cars noprint; class drivetrain type origin; ways 1; var Horsepower msrp invoice; output out=stats n= mean= median= q1= q3= qrange=/autoname; run;
The above is a one-step approach to the problem. It gives you all the statistics you want, plus others you didn't ask for, and is probably a more real-world situation where you want means for all three variables, medians for all three variables, etc. It uses the power of PROC MEANS to pass through the data once, instead of what you originally programmed, which has to pass through the data three times; and if your example goes from needing three WHERE clauses to (for example) 20 WHERE clauses, you can see that it would be a huge amount of typing, but the method using PROC MEANS simply requires you to add more variables to the VAR statement, that's all.
By the way, there are many different forms of output from PROC MEANS, in case you want a report to include in a document, or in case you want a SAS data set (as I have shown).
I recommend you learn to use the power of PROC MEANS/PROC SUMMARY. It will help you a lot more in the long run, as I feel more real world problems are better handled this way rather than with specific WHERE statements to produce the results.
There can only be one active WHERE for the whole procedure. You need a separate procedure for every different subset.
PROC MEANS DATA = sashelp.cars noprint; class drivetrain type origin; ways 1; var Horsepower msrp invoice; output out=stats n= mean= median= q1= q3= qrange=/autoname; run;
The above is a one-step approach to the problem. It gives you all the statistics you want, plus others you didn't ask for, and is probably a more real-world situation where you want means for all three variables, medians for all three variables, etc. It uses the power of PROC MEANS to pass through the data once, instead of what you originally programmed, which has to pass through the data three times; and if your example goes from needing three WHERE clauses to (for example) 20 WHERE clauses, you can see that it would be a huge amount of typing, but the method using PROC MEANS simply requires you to add more variables to the VAR statement, that's all.
By the way, there are many different forms of output from PROC MEANS, in case you want a report to include in a document, or in case you want a SAS data set (as I have shown).
I recommend you learn to use the power of PROC MEANS/PROC SUMMARY. It will help you a lot more in the long run, as I feel more real world problems are better handled this way rather than with specific WHERE statements to produce the results.
Thank you for this. I will need to be doing a lot of descriptive statistics, so i'm definitely going to have to learn more about PROC MEANS and PROC SUMMARY.
Since the WHERE statement applies to the whole step you might need to learn how to programmatically generate the SAS code you want to run. That is what the macro processor is for.
For example you could create a macro that generates your PROC MEANS step. Define it to take a parameter you can use to build the WHERE statement.
%macro mymeans(where=);
PROC MEANS DATA = sashelp.cars MAXDEC=1 N mean Median Q1 Q3 Qrange;
var MSRP;
WHERE &where;
RUN;
%mend mymeans;
%mymeans(where=(Type = 'Sports'));
%mymeans(where=(Origin = 'Europe'));
%mymeans(where=(DriveTrain = 'Rear'));
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.