SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Can PROC SQL AVG function be completed within different subsets in the same select statement?

Accepted Solution Solved
Reply
Contributor
Posts: 50
Accepted Solution

Can PROC SQL AVG function be completed within different subsets in the same select statement?

Hello and thank you for helping.  I am trying to figure out a way to use the AVG aggregate function in PROC SQL to average across a subset of the data, without specifying this subset using WHERE or HAVING statements. This is because I would like to calculate multiple different averages across different subsets within the data, all within the same select statement, creating a different variable name for each different average.  I have included an illustrative example of what I am attempting to do below.

Consider the following data set:

DATA data;

   INPUT id c b;

   DATALINES;

1  1  1

17  0  1

33  0  0

49  0  0

65  1  0

81  1  1

2   1  0

18  0  1

34  0  0

50  1  0

20  1  1

39  0  0

;

PROC PRINT; RUN;

Using only one PROC SQL step (i.e. a single select statement), I would like to create a data set which contains only one observation and two variables:

  1. AVG_B_C0 = the average value of B when c = 0
  2. AVG_B_C1 = the average value of B when c = 1

So my output table should look like this:

Obs

Mean_B_C0

Mean_B_C1

1

.33333

.5

I was trying to use this code but it doesn't work.  The reason I was trying to use this code is that it works;  if I use the SUM aggregate function rather than the AVG aggregate function.

PROC SQL;

     CREATE TABLE summary AS

     SELECT

          AVG((c=0)*B) AS Mean_B_C0,

          AVG((c=1)*B) AS Mean_B_C1

FROM data;

QUIT;

Can anyone help me?  This code is part of a program that runs iteratively and I have to calculate values in multiple different subsets. While I know how to do this using multiple PROC SQL steps, I would prefer not to since it slows down the speed of my program.  Just to note, I also tried using CASE WHEN statements, but it ends up returning two rows in my output table, with missing values for when the case statement was not satisfied.


Accepted Solutions
Solution
‎02-10-2015 04:08 PM
Super User
Posts: 19,860

Re: Can PROC SQL AVG function be completed within different subsets in the same select statement?

The average function won't work properly because the denominator won't be calculated properly, it will be the total number of observations.

Proc means would work nicely - or you can use sum as you've indicated.

PROC SQL;

     CREATE TABLE summary AS

     SELECT

          sum((c=0)*B)/sum(c=0) AS Mean_B_C0,

          sum((c=1)*B)/sum(c=1) AS Mean_B_C1

FROM data;

QUIT;

proc print data=summary;

run;

View solution in original post


All Replies
Solution
‎02-10-2015 04:08 PM
Super User
Posts: 19,860

Re: Can PROC SQL AVG function be completed within different subsets in the same select statement?

The average function won't work properly because the denominator won't be calculated properly, it will be the total number of observations.

Proc means would work nicely - or you can use sum as you've indicated.

PROC SQL;

     CREATE TABLE summary AS

     SELECT

          sum((c=0)*B)/sum(c=0) AS Mean_B_C0,

          sum((c=1)*B)/sum(c=1) AS Mean_B_C1

FROM data;

QUIT;

proc print data=summary;

run;

Contributor
Posts: 50

Re: Can PROC SQL AVG function be completed within different subsets in the same select statement?

Reeza,

Thank you.  I suppose I should have thought of just using the sum function.  This helps though because I think it clarifies what's actually happening with the PROC SQL AVG function.

Thank you again,

Mitch

Regular Contributor
Posts: 184

Re: Can PROC SQL AVG function be completed within different subsets in the same select statement?

CASE is meant for this.

PROC SQL;

     CREATE TABLE summary AS

     SELECT

          AVG( case when (c=0) then b else (.) end ) AS Mean_B_C0,

          AVG( case when (c=1) then b else (.) end ) AS Mean_B_C1

FROM data;

QUIT;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 2702 views
  • 1 like
  • 3 in conversation