Hi
for one of my problem, I would need to have total sum of the account
for example, I have a dataset like this:
ID Bill
1 25
1 35
1 25
2 11
2 23
4 2
4 1
4 3
Can we have BIll_sum, a third variable totalling by ID like this:
ID Bill Bill_sum
1 25 85
1 35 85
1 25 85
2 11 34
2 23 34
4 2 6
4 1 6
4 3 6
Thanks a lot
Hi,
try this
data have;
input iD Bill;
cards;
1 25
1 35
1 25
2 11
2 23
4 2
4 1
4 3
;
run;
proc sort data=have;
by id;
run;
data want;
set have;
by id;
retain sum 0 sum_;
if first.id then sum=bill;
else sum+bill;
if last.id then output;
run;
proc sort data=want(keep=id sum);
by id;
run;
data want_;
merge have want;
by id;
run;
Thanks,
Jagadish
Hi,
try this
data have;
input iD Bill;
cards;
1 25
1 35
1 25
2 11
2 23
4 2
4 1
4 3
;
run;
proc sort data=have;
by id;
run;
data want;
set have;
by id;
retain sum 0 sum_;
if first.id then sum=bill;
else sum+bill;
if last.id then output;
run;
proc sort data=want(keep=id sum);
by id;
run;
data want_;
merge have want;
by id;
run;
Thanks,
Jagadish
You don't need to sort data WANT, you did not change the sort order when you created it from sorted SAVE.
Thank you for the feedback.
You are perfectly right, we can avoid that step. I just mentioned it for a better understanding.
Thanks,
Jagadish
Using SQL, it can be solved in one step:
Proc sql;
create table want as
select id, bill, sum(bill) as bill_sum
from have
group by id
;
quit;
or proc sql can do this simply...
proc sql;
create table want as select
*, sum(Bill) as Bill_sum
from have
group by id;
quit;
Hi All
Yes, there are options like Proc sql and group by and also there are possibilities like calculating and merging back.
What I was looking for a possibility to have summation in a datastep something like advance lookup or arrays possibilites
Thanks
???
What is the requirement for that?
Anyway, if data is already sorted in id order, use retain together with first. and last. processing.
Hi , the problem is I have a huge dataset with more than a 100 million rows. calculating and merging back would not be very
efficient I think. Second option is proc sql. I have realized, it can be done with proc sql but what if I would like to sum with multiple criteria like I want to sum only positive observations of any column. that would filter the output and again, I need to merge back.
Hi ,
you can try this one also as it will also help you to get 2nd highest sum by category also:
DATA TEST;
INPUT ID SAL;
CARDS;
1 200
2 300
3 400
1 600
2 800
3 600
1 800
2 600
3 500
;
RUN;
PROC SORT DATA =TEST ;
BY ID;
RUN;
DATA TEST1;
SET TEST;
BY ID;
RETAIN TOT 0 CN 0 ;
IF FIRST.ID THEN TOT=SAL;
ELSE TOT=SUM(TOT,SAL);
IF FIRST.ID THEN CN=1;
ELSE CN=CN+1;
RUN;
PROC SORT DATA =TEST1 ;
BY ID;
RUN;
DATA TEST2;
SET TEST1 ;
BY ID ;
if last.id then output test2;
RENAME CN=CNT TOT=SUM_TOT;
RUN;
data test3(DROP=TOT CN);
MERGE test2 test1;
by id;
tot_avg=SUM_TOT/cnt;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.