I am trying to collapse my data from the individual person level to the state-year level and get the sum for one variable and the mean for another. How do I do this in SAS? See below for a simplified example of what I am trying to do:
What I have:
|
Year |
State |
Dog Owner Dummy |
Dogs Owned |
|
2019 |
MA |
0 |
0 |
|
2019 |
MA |
1 |
1 |
|
2019 |
CA |
1 |
2 |
|
2019 |
CA |
1 |
1 |
|
2020 |
MA |
1 |
2 |
|
2020 |
MA |
0 |
0 |
|
2020 |
CA |
0 |
0 |
|
2020 |
CA |
0 |
0 |
What I want:
|
Year |
State |
Dog Owner Count |
Avg # Dogs Owned |
|
2019 |
MA |
1 |
0.5 |
|
2019 |
CA |
2 |
1.5 |
|
2020 |
MA |
1 |
1 |
|
2020 |
CA |
0 |
0 |
Look at PROC SUMMARY (or its alias PROC MEANS).
A relevant example can be found at Simple Proc Summary Example in SAS, from SASNRD
Look at PROC SUMMARY (or its alias PROC MEANS).
A relevant example can be found at Simple Proc Summary Example in SAS, from SASNRD
Ahh! Thank you. This is helpful. However, what if I have more than one variable. Say I have two variables that I want to get the mean for and one I want to get the total sum for? How do I specify which action I want to take for which variables?
@raivester wrote:
Ahh! Thank you. This is helpful. However, what if I have more than one variable. Say I have two variables that I want to get the mean for and one I want to get the total sum for? How do I specify which action I want to take for which variables?
Proc means / summary will provide many statistics for many variables.
Proc summary data= have;
class somegrouping variables;
var <list your variables to request stats here>;
output out=want sum(<list variables that you want summed>) = (list of variable names for output)
mean(<list of variabls you want the mean for>) = (list of variable names for output)
std (<list of variables you want the std for>) = ...
run;
If you don't want to type a bunch of output variable names you can use the / autoname option after requesting the variables. The statistic will be added a suffix to the variable name.
Example:
Proc summary data=sashelp.class nway;
class sex;
var age height weight;
output out=summaryexample
sum (age height)=
mean(weight)=
median(age height)=
std= /autoname;
run;
Note that as the STD statistic did not include any variables that all the variables on the VAR statement used.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.