BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jtrousd
Calcite | Level 5

Hey everyone. I'm having an issue with the sum() function in SAS. I'd like to know why the following two codes result in different output:

PROC SQL NOPRINT;

  CREATE TABLE cap1 AS

    SELECT var1                                                ,

           var2                                            ,

           var3                                             ,

           sum(var4) as sum_var4 ,

           sum(var5)  as sum_var5,

           sum(var6)    as sum_var6

    FROM   work.table1

    WHERE  var7 > 0

  GROUP BY var1, var2, var3;

QUIT;

DATA cap;

  SET work.table1(KEEP=var1 var2 var3 var4 var5 var6 var7);

  BY  var1 var2 var3;

  WHERE var7 > 0;

  sum_var4 = sum(var4);

  sum_var5 = sum(var5);

  sum_var6 = sum(var6);

RUN;

Not only would I like to know why it's different, but also which changes need to be made to the DATA step in order to duplicate the SQL output.

Many thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

There are two SUM() functions. The normal SAS function will sum one or more values within a single obsversion:   s1=sum(a,b,c).  The SQL aggregation function SUM() will sum one variable (or expression) across multiple observations.

To summarize across observations without using PROC SQl you should use PROC SUMMARY.  Use the CLASS statement to replicate your GROUP BY clause.

proc summary data=table1 nway ;

  where var7> 0;

  class var1 var2 var3 ;

  var var4 var5 var6;

  outout out=cap1 sum=sum_var4 sum_var5 sum_var6 ;

run;

To sum across observations in a DATA step you can use the sum statement (varname + expression; ) rather than the SUM() function.  Note that the sum statement will automatically retain the values of the target variable across iterations of the data step.

To recreate your GROUP BY clause you will need to sort by your grouping variables first and use BY and FIRST./LAST. processing.

proc sort data=table1 out=cap1 ;

  by var1 - var3;

run;

data cap1 ;

  set cap1 ;

  by var1 - var3 ;

  if first.var3 then call missing(sum_var4,sum_var5,sum_var6);

  sum_var4 + var4;

  sum_var5 + var5;

  sum_var6 + var6 ;

  if last.var3 ;

run;

View solution in original post

3 REPLIES 3
Reeza
Super User

sum in a datastep goes over a ROW

sum in SQL is for a COLUMN

So very different results.

How to replicate it? Don't. Use proc means or summary or some other logic instead.

If you HAVE to then you need to retain it and add it as you go along.

Untested:

DATA cap;

  SET work.table1(KEEP=var1 var2 var3 var4 var5 var6 var7);

  BY  var1 var2 var3;

  WHERE var7 > 0;

retain sum_var4 sum_var5 sum_var6;

if first.var3 then do;

     sum_var4=0; sum_var5=0; sum_var6=0;

end;

  sum_var4 = sum(var4, sum_var4);

  sum_var5 = sum(var5, sum_var5);

  sum_var6 = sum(var6, sum_var6);

RUN;

proc means data=have1 noprint;

by var1 var2 var3;

output out=cap sum(var4)=sum_var4 sum(var5)=sum_var5 sum(var6)=sum_var6;

run;

Tom
Super User Tom
Super User

There are two SUM() functions. The normal SAS function will sum one or more values within a single obsversion:   s1=sum(a,b,c).  The SQL aggregation function SUM() will sum one variable (or expression) across multiple observations.

To summarize across observations without using PROC SQl you should use PROC SUMMARY.  Use the CLASS statement to replicate your GROUP BY clause.

proc summary data=table1 nway ;

  where var7> 0;

  class var1 var2 var3 ;

  var var4 var5 var6;

  outout out=cap1 sum=sum_var4 sum_var5 sum_var6 ;

run;

To sum across observations in a DATA step you can use the sum statement (varname + expression; ) rather than the SUM() function.  Note that the sum statement will automatically retain the values of the target variable across iterations of the data step.

To recreate your GROUP BY clause you will need to sort by your grouping variables first and use BY and FIRST./LAST. processing.

proc sort data=table1 out=cap1 ;

  by var1 - var3;

run;

data cap1 ;

  set cap1 ;

  by var1 - var3 ;

  if first.var3 then call missing(sum_var4,sum_var5,sum_var6);

  sum_var4 + var4;

  sum_var5 + var5;

  sum_var6 + var6 ;

  if last.var3 ;

run;

jtrousd
Calcite | Level 5

Thanks a lot guys! Both helpful! Wish I could mark both as correct.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 49006 views
  • 6 likes
  • 3 in conversation