Hello and thank you for helping. I am trying to figure out a way to use the AVG aggregate function in PROC SQL to average across a subset of the data, without specifying this subset using WHERE or HAVING statements. This is because I would like to calculate multiple different averages across different subsets within the data, all within the same select statement, creating a different variable name for each different average. I have included an illustrative example of what I am attempting to do below.
Consider the following data set:
DATA data;
INPUT id c b;
DATALINES;
1 1 1
17 0 1
33 0 0
49 0 0
65 1 0
81 1 1
2 1 0
18 0 1
34 0 0
50 1 0
20 1 1
39 0 0
;
PROC PRINT; RUN;
Using only one PROC SQL step (i.e. a single select statement), I would like to create a data set which contains only one observation and two variables:
So my output table should look like this:
| Obs | Mean_B_C0 | Mean_B_C1 | 
| 1 | .33333 | .5 | 
I was trying to use this code but it doesn't work. The reason I was trying to use this code is that it works; if I use the SUM aggregate function rather than the AVG aggregate function.
PROC SQL;
CREATE TABLE summary AS
SELECT
AVG((c=0)*B) AS Mean_B_C0,
AVG((c=1)*B) AS Mean_B_C1
FROM data;
QUIT;
Can anyone help me? This code is part of a program that runs iteratively and I have to calculate values in multiple different subsets. While I know how to do this using multiple PROC SQL steps, I would prefer not to since it slows down the speed of my program. Just to note, I also tried using CASE WHEN statements, but it ends up returning two rows in my output table, with missing values for when the case statement was not satisfied.
The average function won't work properly because the denominator won't be calculated properly, it will be the total number of observations.
Proc means would work nicely - or you can use sum as you've indicated.
PROC SQL;
CREATE TABLE summary AS
SELECT
sum((c=0)*B)/sum(c=0) AS Mean_B_C0,
sum((c=1)*B)/sum(c=1) AS Mean_B_C1
FROM data;
QUIT;
proc print data=summary;
run;
The average function won't work properly because the denominator won't be calculated properly, it will be the total number of observations.
Proc means would work nicely - or you can use sum as you've indicated.
PROC SQL;
CREATE TABLE summary AS
SELECT
sum((c=0)*B)/sum(c=0) AS Mean_B_C0,
sum((c=1)*B)/sum(c=1) AS Mean_B_C1
FROM data;
QUIT;
proc print data=summary;
run;
Reeza,
Thank you. I suppose I should have thought of just using the sum function. This helps though because I think it clarifies what's actually happening with the PROC SQL AVG function.
Thank you again,
Mitch
CASE is meant for this.
PROC SQL;
CREATE TABLE summary AS
SELECT
AVG( case when (c=0) then b else (.) end ) AS Mean_B_C0,
AVG( case when (c=1) then b else (.) end ) AS Mean_B_C1
FROM data;
QUIT;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.
