BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mconover
Quartz | Level 8

Hello and thank you for helping.  I am trying to figure out a way to use the AVG aggregate function in PROC SQL to average across a subset of the data, without specifying this subset using WHERE or HAVING statements. This is because I would like to calculate multiple different averages across different subsets within the data, all within the same select statement, creating a different variable name for each different average.  I have included an illustrative example of what I am attempting to do below.

Consider the following data set:

DATA data;

   INPUT id c b;

   DATALINES;

1  1  1

17  0  1

33  0  0

49  0  0

65  1  0

81  1  1

2   1  0

18  0  1

34  0  0

50  1  0

20  1  1

39  0  0

;

PROC PRINT; RUN;

Using only one PROC SQL step (i.e. a single select statement), I would like to create a data set which contains only one observation and two variables:

  1. AVG_B_C0 = the average value of B when c = 0
  2. AVG_B_C1 = the average value of B when c = 1

So my output table should look like this:

Obs

Mean_B_C0

Mean_B_C1

1

.33333

.5

I was trying to use this code but it doesn't work.  The reason I was trying to use this code is that it works;  if I use the SUM aggregate function rather than the AVG aggregate function.

PROC SQL;

     CREATE TABLE summary AS

     SELECT

          AVG((c=0)*B) AS Mean_B_C0,

          AVG((c=1)*B) AS Mean_B_C1

FROM data;

QUIT;

Can anyone help me?  This code is part of a program that runs iteratively and I have to calculate values in multiple different subsets. While I know how to do this using multiple PROC SQL steps, I would prefer not to since it slows down the speed of my program.  Just to note, I also tried using CASE WHEN statements, but it ends up returning two rows in my output table, with missing values for when the case statement was not satisfied.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

The average function won't work properly because the denominator won't be calculated properly, it will be the total number of observations.

Proc means would work nicely - or you can use sum as you've indicated.

PROC SQL;

     CREATE TABLE summary AS

     SELECT

          sum((c=0)*B)/sum(c=0) AS Mean_B_C0,

          sum((c=1)*B)/sum(c=1) AS Mean_B_C1

FROM data;

QUIT;

proc print data=summary;

run;

View solution in original post

3 REPLIES 3
Reeza
Super User

The average function won't work properly because the denominator won't be calculated properly, it will be the total number of observations.

Proc means would work nicely - or you can use sum as you've indicated.

PROC SQL;

     CREATE TABLE summary AS

     SELECT

          sum((c=0)*B)/sum(c=0) AS Mean_B_C0,

          sum((c=1)*B)/sum(c=1) AS Mean_B_C1

FROM data;

QUIT;

proc print data=summary;

run;

mconover
Quartz | Level 8

Reeza,

Thank you.  I suppose I should have thought of just using the sum function.  This helps though because I think it clarifies what's actually happening with the PROC SQL AVG function.

Thank you again,

Mitch

Howles
Quartz | Level 8

CASE is meant for this.

PROC SQL;

     CREATE TABLE summary AS

     SELECT

          AVG( case when (c=0) then b else (.) end ) AS Mean_B_C0,

          AVG( case when (c=1) then b else (.) end ) AS Mean_B_C1

FROM data;

QUIT;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 13921 views
  • 1 like
  • 3 in conversation