Hi ,
I have a data set that has the ID, Business,LOB,ACTSIZ. like below example
data DATA;
length id $ 5 Business $ 2 LOB $ 3;
input id $ Business LOB $ ACTSIZ;
datalines;
12340 01 RTY 23
12341 20 TYR 0
12341 30 1UY 8
12343 30 CBP 8
12343 30 MTS 8
12343 30 HPL 8
12343 45 HFI 6
12343 45 HFS 6
45673 01 RTY 40
45673 30 IUY 24
45673 30 CBP 24
45673 30 IUY 24
45673 30 CBP 24
;
run;
I was summing ACTSIZ column group by id to get the totalsize and output looks like below ,which is wrong (seprate each id by line for easy readability)::
id Business LOB ACTSIZ Totalsize
12340 01 RTY 23 23
12341 20 TYR 0 8
12341 30 1UY 8 8
12343 30 CBP 8 36
12343 30 MTS 8 36
12343 30 HPL 8 36
12343 45 HFI 6 36
12343 45 HFS 6 36
45673 01 RTY 40 136
45673 30 IUY 24 136
45673 30 CBP 24 136
45673 30 IUY 24 136
45673 30 CBP 24 136
If the business id and actual size repeat more than once (LOB will be different) then count the actual size one time toward total size. Business want the output like below (seprate each id by line for easy readability):
id Business LOB ACTSIZ Totalsize
12340 01 RTY 23 23
12341 20 TYR 0 8
12341 30 1UY 8 8
12343 30 CBP 8 14
12343 30 MTS 8 14
12343 30 HPL 8 14
12343 45 HFI 6 14
12343 45 HFS 6 14
45673 01 RTY 40 64
45673 30 IUY 24 64
45673 30 CBP 24 64
45673 30 IUY 24 64
45673 30 CBP 24 64
Note: actual dataset has 150 columns and we need all those column in the output, so we can't use nodupkey option.
Thank you so much for your help.
-Sudha.
Like this?
data HAVE;
length ID $ 5 BUSINESS $ 2 LOB $ 3;
input ID $ BUSINESS LOB $ ACTSIZ;
datalines;
12340 01 RTY 23
12341 20 TYR 0
12341 30 1UY 8
12343 30 CBP 8
12343 30 MTS 8
12343 30 HPL 8
12343 45 HFI 6
12343 45 HFS 6
45673 01 RTY 40
45673 30 IUY 24
45673 30 CBP 24
45673 30 IUY 24
45673 30 CBP 24
run;
data _V/view=_V;
set HAVE;
by ID BUSINESS;
VALUE=( first.BUSINESS * ACTSIZ);
run;
proc sql;
select a.*, SUM
from _V a
,(select ID, sum(VALUE) as SUM from _V group by 1) b
where a.ID=b.ID ;
run;
ID | BUSINESS | LOB | ACTSIZ | ACTSIZ |
---|---|---|---|---|
12340 | 01 | RTY | 23 | 23 |
12341 | 20 | TYR | 0 | 8 |
12341 | 30 | 1UY | 8 | 8 |
12343 | 30 | CBP | 8 | 14 |
12343 | 30 | MTS | 8 | 14 |
12343 | 30 | HPL | 8 | 14 |
12343 | 45 | HFI | 6 | 14 |
12343 | 45 | HFS | 6 | 14 |
45673 | 01 | RTY | 40 | 64 |
45673 | 30 | IUY | 24 | 64 |
45673 | 30 | CBP | 24 | 64 |
45673 | 30 | IUY | 24 | 64 |
45673 | 30 | CBP | 24 | 64 |
Try this....
data WANT;
set HAVE;
group=catx('-',ID, BUSINESS, ACTSIZ);
grouplag=lag(group);
if (group ^= grouplag) or _n_=1 then do;
totalsize=actsiz;
is_this_what_you_want='Y';
end;
run;
Hope this helps.
Like this?
data HAVE;
length ID $ 5 BUSINESS $ 2 LOB $ 3;
input ID $ BUSINESS LOB $ ACTSIZ;
datalines;
12340 01 RTY 23
12341 20 TYR 0
12341 30 1UY 8
12343 30 CBP 8
12343 30 MTS 8
12343 30 HPL 8
12343 45 HFI 6
12343 45 HFS 6
45673 01 RTY 40
45673 30 IUY 24
45673 30 CBP 24
45673 30 IUY 24
45673 30 CBP 24
run;
data _V/view=_V;
set HAVE;
by ID BUSINESS;
VALUE=( first.BUSINESS * ACTSIZ);
run;
proc sql;
select a.*, SUM
from _V a
,(select ID, sum(VALUE) as SUM from _V group by 1) b
where a.ID=b.ID ;
run;
ID | BUSINESS | LOB | ACTSIZ | ACTSIZ |
---|---|---|---|---|
12340 | 01 | RTY | 23 | 23 |
12341 | 20 | TYR | 0 | 8 |
12341 | 30 | 1UY | 8 | 8 |
12343 | 30 | CBP | 8 | 14 |
12343 | 30 | MTS | 8 | 14 |
12343 | 30 | HPL | 8 | 14 |
12343 | 45 | HFI | 6 | 14 |
12343 | 45 | HFS | 6 | 14 |
45673 | 01 | RTY | 40 | 64 |
45673 | 30 | IUY | 24 | 64 |
45673 | 30 | CBP | 24 | 64 |
45673 | 30 | IUY | 24 | 64 |
45673 | 30 | CBP | 24 | 64 |
data HAVE; length ID $ 5 BUSINESS $ 2 LOB $ 3; input ID $ BUSINESS LOB $ ACTSIZ; datalines; 12340 01 RTY 23 12341 20 TYR 0 12341 30 1UY 8 12343 30 CBP 8 12343 30 MTS 8 12343 30 HPL 8 12343 45 HFI 6 12343 45 HFS 6 45673 01 RTY 40 45673 30 IUY 24 45673 30 CBP 24 45673 30 IUY 24 45673 30 CBP 24 ; run; proc sql; select * from (select * from have ) as a left join (select id,sum(ACTSIZ) as sum_ACTSIZ from (select distinct id,ACTSIZ from have) group by id) as b on a.id=b.id; quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.