I have a dataset as below:
col1 col2 col3 col4 col5 date col6 col7
10 11 1 1 2 04jul2021 11 100
10 11 2 1 2 05jul2021 12 50
10 11 2 1 2 05jul2021 13 100
11 1 1 1 1 01Jul2021 14 200
11 1 1 1 1 01jul12021 14 200
I want to sum the value in the col7 by col1, col2,col3,col4,col5,date,col6 and the resultant should be as below:
col1 col2 col3 col4 col5 date col6 col7
10 11 1 1 2 04jul2021 11 100
10 11 2 1 2 05jul2021 12 50
10 11 2 1 2 05jul2021 13 100
11 1 1 1 1 01jul2021 14 400
if in above example as Col6 is different in the line 2 and 3 so col7 should not be summed(so two lines). Where as line 4 and 5 are same so it shoul be summed and in one line
How can I achieve in SAS Base?
Maxim 2: Read the Log.
73 data new; 74 set input (rename = (col7=AMT_ORIG)); 75 by col1,col2,col3,col4,col5,col6; _ 22 76 ERROR 22-322: Syntaxfehler, erwartet wird eines der folgenden: ein Name, -, :, ;, DESCENDING, GROUPFORMAT, NOTSORTED, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_. ERROR 76-322: Syntax error, statement will be ignored. 76 length amt 8; 77 if(first.col4) then amt = coalesce(amt_orig,0); 78 else amt = sum(amt_orig,amt); 79 run; NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.NEW may be incomplete. When this step was stopped there were 0 observations and 9 variables. WARNING: Datei WORK.NEW wurde nicht ersetzt, da da dieser Schritt angehalten wurde.
The position of the ERROR marker alerts you to the syntactically wrong element, which is the comma. Commas are not allowed in BY statements. You can also find this out by reading the documentation (Maxim 1) of the BY Statement.
@ChrisNZ 's code works and produces the intended result:
col1 col2 col3 col4 col5 date col6 COL7 10 11 1 1 2 2021-07-04 11 100 10 11 2 1 2 2021-07-05 12 50 10 11 2 1 2 2021-07-05 13 100 11 1 1 1 1 2021-07-01 14 400
Your code needs to be fixed and expanded with a RETAIN statement, or the value of amt will not be kept across observations; you also need a subsetting IF to only keep the last observation of a group, and use the correct FIRST. and LAST. variables:
data new;
set input (rename = (col7=AMT_ORIG));
by col1 col2 col3 col4 col5 date col6;
retain amt;
length amt 8;
if first.col6 then amt = coalesce(amt_orig,0);
else amt = sum(amt_orig,amt);
if last.col6;
run;
proc print data=new noobs;
run;
Result:
col1 col2 col3 col4 col5 date col6 AMT_ORIG amt 10 11 1 1 2 2021-07-04 11 100 100 10 11 2 1 2 2021-07-05 12 50 50 10 11 2 1 2 2021-07-05 13 100 100 11 1 1 1 1 2021-07-01 14 200 400
But you can refine your code by using a SUM statement, which implies the retain. And you do not need to rename col7, as you won't need it after summing:
data new;
set input;
by col1 col2 col3 col4 col5 date col6;
length amt 8;
if first.col6
then amt = col7;
else amt + col7;
if last.col6;
drop col7;
run;
proc print data=new noobs;
run;
Result:
col1 col2 col3 col4 col5 date col6 amt 10 11 1 1 2 2021-07-04 11 100 10 11 2 1 2 2021-07-05 12 50 10 11 2 1 2 2021-07-05 13 100 11 1 1 1 1 2021-07-01 14 400
But why don't you use PROC SUMMARY, which is built for this:
proc summary data=input nway;
class col1 col2 col3 col4 col5 date col6;
output
out=want (drop=_:)
sum(col7)=amt
;
run;
proc print data=want noobs;
run;
Result:
col1 col2 col3 col4 col5 date col6 amt 10 11 1 1 2 2021-07-04 11 100 10 11 2 1 2 2021-07-05 12 50 10 11 2 1 2 2021-07-05 13 100 11 1 1 1 1 2021-07-01 14 400
Like this (untested as no usable data was provided)?
proc sql;
select COL1, COL2, COL3, COL4, COL5, DATE, COL6, sum(COL7) as COL7
from TABLE
group by COL1,COL2,COL3, COL4, COL5, DATE, COL6;
Hi Kris,
It does not give me the result I am looking for. I attached the input dataset as CSV. I am new to SAS. I tried as below.
data new;
set input (rename = (col7=AMT_ORIG));
by col1,col2,col3,col4,col5,col6;
length amt 8;
if(first.col4) then amt = coalesce(amt_orig,0);
else amt = sum(amt_orig,amt);
run;
Please help me with this.
> It does not give me the result I am looking for.
How? What's not as you want in the result? It matches your description as I understand it.
Please provide some data as code, either a data step or SQL.
Maxim 2: Read the Log.
73 data new; 74 set input (rename = (col7=AMT_ORIG)); 75 by col1,col2,col3,col4,col5,col6; _ 22 76 ERROR 22-322: Syntaxfehler, erwartet wird eines der folgenden: ein Name, -, :, ;, DESCENDING, GROUPFORMAT, NOTSORTED, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_. ERROR 76-322: Syntax error, statement will be ignored. 76 length amt 8; 77 if(first.col4) then amt = coalesce(amt_orig,0); 78 else amt = sum(amt_orig,amt); 79 run; NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.NEW may be incomplete. When this step was stopped there were 0 observations and 9 variables. WARNING: Datei WORK.NEW wurde nicht ersetzt, da da dieser Schritt angehalten wurde.
The position of the ERROR marker alerts you to the syntactically wrong element, which is the comma. Commas are not allowed in BY statements. You can also find this out by reading the documentation (Maxim 1) of the BY Statement.
@ChrisNZ 's code works and produces the intended result:
col1 col2 col3 col4 col5 date col6 COL7 10 11 1 1 2 2021-07-04 11 100 10 11 2 1 2 2021-07-05 12 50 10 11 2 1 2 2021-07-05 13 100 11 1 1 1 1 2021-07-01 14 400
Your code needs to be fixed and expanded with a RETAIN statement, or the value of amt will not be kept across observations; you also need a subsetting IF to only keep the last observation of a group, and use the correct FIRST. and LAST. variables:
data new;
set input (rename = (col7=AMT_ORIG));
by col1 col2 col3 col4 col5 date col6;
retain amt;
length amt 8;
if first.col6 then amt = coalesce(amt_orig,0);
else amt = sum(amt_orig,amt);
if last.col6;
run;
proc print data=new noobs;
run;
Result:
col1 col2 col3 col4 col5 date col6 AMT_ORIG amt 10 11 1 1 2 2021-07-04 11 100 100 10 11 2 1 2 2021-07-05 12 50 50 10 11 2 1 2 2021-07-05 13 100 100 11 1 1 1 1 2021-07-01 14 200 400
But you can refine your code by using a SUM statement, which implies the retain. And you do not need to rename col7, as you won't need it after summing:
data new;
set input;
by col1 col2 col3 col4 col5 date col6;
length amt 8;
if first.col6
then amt = col7;
else amt + col7;
if last.col6;
drop col7;
run;
proc print data=new noobs;
run;
Result:
col1 col2 col3 col4 col5 date col6 amt 10 11 1 1 2 2021-07-04 11 100 10 11 2 1 2 2021-07-05 12 50 10 11 2 1 2 2021-07-05 13 100 11 1 1 1 1 2021-07-01 14 400
But why don't you use PROC SUMMARY, which is built for this:
proc summary data=input nway;
class col1 col2 col3 col4 col5 date col6;
output
out=want (drop=_:)
sum(col7)=amt
;
run;
proc print data=want noobs;
run;
Result:
col1 col2 col3 col4 col5 date col6 amt 10 11 1 1 2 2021-07-04 11 100 10 11 2 1 2 2021-07-05 12 50 10 11 2 1 2 2021-07-05 13 100 11 1 1 1 1 2021-07-01 14 400
@Kurt_Bremser: Thanks for the reply.
If I also want to check on col3 as well. So can I use the condition as below:
Because first, I have to check on col3 and also on col6 and do the cumulative sum.
Would you please help me with this? or is it impossible, and I have to perform this action in a separate step?
if first.col6 or first.col3 then amt = coalesce(amt_orig,0);
else amt = sum(amt_orig,amt);
if last.col6;
run;
If you want to sum on groups indexed by col1-col3, and also on groups indexed by col1-col6, you need two separate sum variables.
first.col3 implies first.col6, BTW.
Thanks for clarifying. I was able to solve and proceed further.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.