I want to calculate the mean of a subgroup (for example, the mean cholesterol of White men and of White women). I have variables for cholesterol, race, and sex. How do I combine these variables to get the averages I am looking for?
You can use PROC SUMMARY:
proc summary data=have;
class race sex;
var cholesterol;
output
out=want
mean=
;
run;
I actually tried a different way, although I am not 100% sure it's correct:
Proc means data=data;
var cholesterol;
class sex;
where race= 2;
run;
Would this approach also work?
Yes. You can apply a WHERE to race instead of listing it in the CLASS statement; the rest of the code should be the same.
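As a minimal sketch of that suggestion, the WHERE can be combined with the PROC SUMMARY step from earlier in the thread. The data set names have and want come from the earlier reply, and race = 2 is assumed to be the code for the group of interest, as in the question; adjust both to match your data.

```sas
/* Sketch only: HAVE, WANT and the code race=2 are assumptions
   carried over from the posts above. */
proc summary data=have nway;
  where race = 2;          /* keep observations for one race only */
  class sex;               /* one output row per level of sex */
  var cholesterol;
  output out=want mean=;   /* the mean keeps the name cholesterol */
run;
```

The NWAY option drops the overall row, so want contains only one row per sex.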
I would typically use PROC SUMMARY if you need the results in a data set.
proc summary data=have;
class sex race;
var cholesterol;
output out=work.summary mean=;
run;
The resulting data set will contain the mean for 1) the overall data, 2) each level of sex, 3) each level of race, and 4) each combination of sex and race that appears in the data. The variable _TYPE_ identifies which of these groupings a row belongs to, so you can use it to select the "group" you want for other processing.
If you have other variables you want to summarize, such as weight or blood pressure, add them to the VAR statement. If you want more statistics, add them to the OUTPUT statement, for example min= max= std= / autoname. The AUTONAME option appends the statistic's abbreviation to the variable name so you can tell the values apart. If only one statistic is requested, as in the code above, the output variable keeps the existing name and it is up to you to remember that it holds a mean.
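A rough sketch of both ideas, requesting several statistics with AUTONAME and then using _TYPE_ to keep only the sex-by-race rows. The output data set name work.sex_by_race is made up for illustration; have and work.summary follow the code above.

```sas
/* Sketch only: WORK.SEX_BY_RACE is a hypothetical name. */
proc summary data=have;
  class sex race;
  var cholesterol;
  /* AUTONAME creates cholesterol_Mean, cholesterol_Min,
     cholesterol_Max and cholesterol_StdDev */
  output out=work.summary mean= min= max= std= / autoname;
run;

/* With CLASS sex race: _TYPE_=0 is overall, 1 is race,
   2 is sex, and 3 is each sex*race combination. */
data work.sex_by_race;
  set work.summary;
  where _type_ = 3;
run;
```

If only the full combinations are ever needed, adding the NWAY option to the PROC SUMMARY statement should give the same rows without the second step.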