Hi Again,
I already posted a question here similar to this. But this is something very trick for me which I didn't get the solution.
data have;
input a2 $ a1 $12. a b;
cards;
VA IME_PREM_ADJ 11 100
VA IME_PREM_ADJ 12 102
VA IME_PREM_NON 11 101
VA IME_PREM_ADJ 13 101
WVA IME_PREM_RES 11 105
VA IME_PREM_RES 3 102
VA IME_PREM_NON 2 200
VA IME_ANNUITIE 3 205
WVA IME_ANNUITIE 4 501
;
run;
proc sort data=have
out=haveSrt;
by a2 a1 descending a;
run;
This is a sample data with 4 columns and few observations, in my original data I have 20+ columns and crores of data.
what I need is, sum b column based on (b>=b), but the group by should be maintained, the issue is as there are many columns we cn't predict which column we have to use for first. for doing the sum. is there any ways to get this done?
The output should lokks like :
a2 | a1 | a | b | b_sum |
VA | IME_ANNUITIE | 3 | 205 | 205 |
VA | IME_PREM_ADJ | 13 | 101 | 303 |
VA | IME_PREM_ADJ | 12 | 102 | 202 |
VA | IME_PREM_ADJ | 11 | 100 | 100 |
VA | IME_PREM_NON | 11 | 101 | 301 |
VA | IME_PREM_NON | 2 | 200 | 200 |
VA | IME_PREM_RES | 3 | 102 | 102 |
WVA | IME_ANNUITIE | 4 | 501 | 501 |
WVA | IME_PREM_RES | 11 | 105 | 105 |
here taking an example of a2=VA and a1=IME_PREM_ADJ the column a has values 13,12,11 so, the first row of 13 will be b_sum=b(13)+b(12)+ b(11) because 13>=12 and 11, likewise for the row of 12, b_sum=b(12)+b(11).
But, Please make sure about the multiple columns in the by group.
Thanks,
Chithra
Keep the original order before summing:
data have;
input a2 $ a1 $12. a b;
cards;
VA IME_PREM_ADJ 11 100
VA IME_PREM_ADJ 12 102
VA IME_PREM_NON 11 101
VA IME_PREM_ADJ 13 101
WVA IME_PREM_RES 11 105
VA IME_PREM_RES 3 102
VA IME_PREM_NON 2 200
VA IME_ANNUITIE 3 205
WVA IME_ANNUITIE 4 501
;
run;
proc sort data=have;
by a2 a1 a;
run;
data want;
set have;
by a2 a1;
retain b_sum;
if first.a1
then b_sum = b;
else b_sum + b;
run;
proc sort data=want;
by a2 a1 descending a;
run;
proc print data=want noobs;
run;
Result:
a2 a1 a b b_sum VA IME_ANNUITIE 3 205 205 VA IME_PREM_ADJ 13 101 303 VA IME_PREM_ADJ 12 102 202 VA IME_PREM_ADJ 11 100 100 VA IME_PREM_NON 11 101 301 VA IME_PREM_NON 2 200 200 VA IME_PREM_RES 3 102 102 WVA IME_ANNUITIE 4 501 501 WVA IME_PREM_RES 11 105 105
matches your expected output.
Keep the original order before summing:
data have;
input a2 $ a1 $12. a b;
cards;
VA IME_PREM_ADJ 11 100
VA IME_PREM_ADJ 12 102
VA IME_PREM_NON 11 101
VA IME_PREM_ADJ 13 101
WVA IME_PREM_RES 11 105
VA IME_PREM_RES 3 102
VA IME_PREM_NON 2 200
VA IME_ANNUITIE 3 205
WVA IME_ANNUITIE 4 501
;
run;
proc sort data=have;
by a2 a1 a;
run;
data want;
set have;
by a2 a1;
retain b_sum;
if first.a1
then b_sum = b;
else b_sum + b;
run;
proc sort data=want;
by a2 a1 descending a;
run;
proc print data=want noobs;
run;
Result:
a2 a1 a b b_sum VA IME_ANNUITIE 3 205 205 VA IME_PREM_ADJ 13 101 303 VA IME_PREM_ADJ 12 102 202 VA IME_PREM_ADJ 11 100 100 VA IME_PREM_NON 11 101 301 VA IME_PREM_NON 2 200 200 VA IME_PREM_RES 3 102 102 WVA IME_ANNUITIE 4 501 501 WVA IME_PREM_RES 11 105 105
matches your expected output.
Thank you.
I would try with a retain & lag...
data have;
input a2 $ a1 $12. a b;
cards;
VA IME_PREM_ADJ 11 100
VA IME_PREM_ADJ 12 102
VA IME_PREM_NON 11 101
VA IME_PREM_ADJ 13 101
WVA IME_PREM_RES 11 105
VA IME_PREM_RES 3 102
VA IME_PREM_NON 2 200
VA IME_ANNUITIE 3 205
WVA IME_ANNUITIE 4 501
;
run;
proc sql;
create table want as
select distinct a.*,sum(b.b) as b_sum
from have a, have b
where a.a2=b.a2 and a.a1=b.a1 and a.a>=b.a
group by a.a2,a.a1,a.a
order by a.a2,a.a1, a.a desc;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.