BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jayaditya
Calcite | Level 5

I have a dataset as below:

col1 col2   col3  col4  col5  date            col6     col7

  10    11     1     1       2      04jul2021     11         100
  10     11    2     1      2       05jul2021     12            50
  10     11    2     1      2       05jul2021     13         100

  11      1     1      1     1       01Jul2021     14         200

   11     1      1     1     1       01jul12021    14          200

I want to sum the value in the col7 by col1, col2,col3,col4,col5,date,col6  and the resultant should be as below:

col1 col2   col3  col4  col5  date            col6     col7

  10    11     1     1       2      04jul2021     11         100
  10     11    2     1      2       05jul2021     12            50
  10     11    2     1      2       05jul2021     13         100

  11      1     1      1     1       01jul2021      14         400
if in above example as Col6 is different in the line 2 and 3 so col7 should not be summed(so two lines). Where as line 4 and 5 are same so it shoul be summed and in one line 
 How can I achieve in SAS Base?

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Maxim 2: Read the Log.

 73         data new;
 74         set input (rename = (col7=AMT_ORIG));
 75         by col1,col2,col3,col4,col5,col6;
                   _
                   22
                   76
 ERROR 22-322: Syntaxfehler, erwartet wird eines der folgenden: ein Name, -, :, ;, DESCENDING, GROUPFORMAT, NOTSORTED, _ALL_, 
               _CHARACTER_, _CHAR_, _NUMERIC_.  
 
 ERROR 76-322: Syntax error, statement will be ignored.
 
 76         length amt 8;
 77         if(first.col4) then amt = coalesce(amt_orig,0);
 78         else amt = sum(amt_orig,amt);
 79         run;
 
 NOTE: The SAS System stopped processing this step because of errors.
 WARNING: The data set WORK.NEW may be incomplete.  When this step was stopped there were 0 observations and 9 variables.
 WARNING: Datei WORK.NEW wurde nicht ersetzt, da da dieser Schritt angehalten wurde.

The position of the ERROR marker alerts you to the syntactically wrong element, which is the comma. Commas are not allowed in BY statements. You can also find this out by reading the documentation (Maxim 1) of the BY Statement.

@ChrisNZ 's code works and produces the intended result:

col1	col2	col3	col4	col5	date	col6	COL7
10	11	1	1	2	2021-07-04	11	100
10	11	2	1	2	2021-07-05	12	50
10	11	2	1	2	2021-07-05	13	100
11	1	1	1	1	2021-07-01	14	400

Your code needs to be fixed and expanded with a RETAIN statement, or the value of amt will not be kept across observations; you also need a subsetting IF to only keep the last observation of a group, and use the correct FIRST. and LAST. variables:

data new;
set input (rename = (col7=AMT_ORIG));
by col1 col2 col3 col4 col5 date col6;
retain amt;
length amt 8;
if first.col6 then amt = coalesce(amt_orig,0);
else amt = sum(amt_orig,amt);
if last.col6;
run;

proc print data=new noobs;
run;

Result:

col1	col2	col3	col4	col5	date	col6	AMT_ORIG	amt
10	11	1	1	2	2021-07-04	11	100	100
10	11	2	1	2	2021-07-05	12	50	50
10	11	2	1	2	2021-07-05	13	100	100
11	1	1	1	1	2021-07-01	14	200	400

But you can refine your code by using a SUM statement, which implies the retain. And you do not need to rename col7, as you won't need it after summing:

data new;
set input;
by col1 col2 col3 col4 col5 date col6;
length amt 8;
if first.col6
then amt = col7;
else amt + col7;
if last.col6;
drop col7;
run;

proc print data=new noobs;
run;

Result:

col1	col2	col3	col4	col5	date	col6	amt
10	11	1	1	2	2021-07-04	11	100
10	11	2	1	2	2021-07-05	12	50
10	11	2	1	2	2021-07-05	13	100
11	1	1	1	1	2021-07-01	14	400

But why don't you use PROC SUMMARY, which is built for this:

proc summary data=input nway;
class col1 col2 col3 col4 col5 date col6;
output
  out=want (drop=_:)
  sum(col7)=amt
;
run;

proc print data=want noobs;
run;

Result:

col1	col2	col3	col4	col5	date	col6	amt
10	11	1	1	2	2021-07-04	11	100
10	11	2	1	2	2021-07-05	12	50
10	11	2	1	2	2021-07-05	13	100
11	1	1	1	1	2021-07-01	14	400

 

 

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

Like this (untested as no usable data was provided)?

proc sql;
  select COL1, COL2, COL3, COL4, COL5, DATE, COL6, sum(COL7) as COL7
  from TABLE 
  group by COL1,COL2,COL3, COL4, COL5, DATE, COL6;
Jayaditya
Calcite | Level 5

Hi Kris,

 

It does not give me the result I am looking for. I attached the input dataset as CSV. I am new to SAS. I tried as below.

 

data new;

set input (rename = (col7=AMT_ORIG));

by col1,col2,col3,col4,col5,col6;

length amt 8;

if(first.col4) then amt = coalesce(amt_orig,0);

else amt = sum(amt_orig,amt);

run;

 

Please help me with this.

ChrisNZ
Tourmaline | Level 20

> It does not give me the result I am looking for.

How? What's not as you want in the result? It matches your description as I understand it.

Please provide some data as code, either a data step or SQL.

Kurt_Bremser
Super User

Maxim 2: Read the Log.

 73         data new;
 74         set input (rename = (col7=AMT_ORIG));
 75         by col1,col2,col3,col4,col5,col6;
                   _
                   22
                   76
 ERROR 22-322: Syntaxfehler, erwartet wird eines der folgenden: ein Name, -, :, ;, DESCENDING, GROUPFORMAT, NOTSORTED, _ALL_, 
               _CHARACTER_, _CHAR_, _NUMERIC_.  
 
 ERROR 76-322: Syntax error, statement will be ignored.
 
 76         length amt 8;
 77         if(first.col4) then amt = coalesce(amt_orig,0);
 78         else amt = sum(amt_orig,amt);
 79         run;
 
 NOTE: The SAS System stopped processing this step because of errors.
 WARNING: The data set WORK.NEW may be incomplete.  When this step was stopped there were 0 observations and 9 variables.
 WARNING: Datei WORK.NEW wurde nicht ersetzt, da da dieser Schritt angehalten wurde.

The position of the ERROR marker alerts you to the syntactically wrong element, which is the comma. Commas are not allowed in BY statements. You can also find this out by reading the documentation (Maxim 1) of the BY Statement.

@ChrisNZ 's code works and produces the intended result:

col1	col2	col3	col4	col5	date	col6	COL7
10	11	1	1	2	2021-07-04	11	100
10	11	2	1	2	2021-07-05	12	50
10	11	2	1	2	2021-07-05	13	100
11	1	1	1	1	2021-07-01	14	400

Your code needs to be fixed and expanded with a RETAIN statement, or the value of amt will not be kept across observations; you also need a subsetting IF to only keep the last observation of a group, and use the correct FIRST. and LAST. variables:

data new;
set input (rename = (col7=AMT_ORIG));
by col1 col2 col3 col4 col5 date col6;
retain amt;
length amt 8;
if first.col6 then amt = coalesce(amt_orig,0);
else amt = sum(amt_orig,amt);
if last.col6;
run;

proc print data=new noobs;
run;

Result:

col1	col2	col3	col4	col5	date	col6	AMT_ORIG	amt
10	11	1	1	2	2021-07-04	11	100	100
10	11	2	1	2	2021-07-05	12	50	50
10	11	2	1	2	2021-07-05	13	100	100
11	1	1	1	1	2021-07-01	14	200	400

But you can refine your code by using a SUM statement, which implies the retain. And you do not need to rename col7, as you won't need it after summing:

data new;
set input;
by col1 col2 col3 col4 col5 date col6;
length amt 8;
if first.col6
then amt = col7;
else amt + col7;
if last.col6;
drop col7;
run;

proc print data=new noobs;
run;

Result:

col1	col2	col3	col4	col5	date	col6	amt
10	11	1	1	2	2021-07-04	11	100
10	11	2	1	2	2021-07-05	12	50
10	11	2	1	2	2021-07-05	13	100
11	1	1	1	1	2021-07-01	14	400

But why don't you use PROC SUMMARY, which is built for this:

proc summary data=input nway;
class col1 col2 col3 col4 col5 date col6;
output
  out=want (drop=_:)
  sum(col7)=amt
;
run;

proc print data=want noobs;
run;

Result:

col1	col2	col3	col4	col5	date	col6	amt
10	11	1	1	2	2021-07-04	11	100
10	11	2	1	2	2021-07-05	12	50
10	11	2	1	2	2021-07-05	13	100
11	1	1	1	1	2021-07-01	14	400

 

 

Jayaditya
Calcite | Level 5

@Kurt_Bremser: Thanks for the reply. 

 

If I also want to check on col3 as well. So can I use the condition as below:

 

Because first, I have to check on col3 and also on col6 and do the cumulative sum. 

 

Would you please help me with this? or is it impossible, and I have to perform this action in a separate step?

 

if first.col6 or first.col3 then amt = coalesce(amt_orig,0);
else amt = sum(amt_orig,amt);
if last.col6;
run;

 

 

Kurt_Bremser
Super User

If you want to sum on groups indexed by col1-col3, and also on groups indexed by col1-col6, you need two separate sum variables.

first.col3 implies first.col6, BTW.

Jayaditya
Calcite | Level 5

Thanks for clarifying. I was able to solve and proceed further.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1121 views
  • 0 likes
  • 3 in conversation