Solved: Re: distinct count values

cho16 · Posted 09-19-2017 08:29 PM

Hi ,

I have a data set that has the ID, Business,LOB,ACTSIZ. like below example

data DATA;

length id $ 5 Business $ 2 LOB $ 3;

input id $ Business LOB $ ACTSIZ;

datalines;

12340 01 RTY 23

12341 20 TYR 0

12341 30 1UY 8

12343 30 CBP 8

12343 30 MTS 8

12343 30 HPL 8

12343 45 HFI 6

12343 45 HFS 6

45673 01 RTY 40

45673 30 IUY 24

45673 30 CBP 24

45673 30 IUY 24

45673 30 CBP 24

;

run;

I was summing ACTSIZ column group by id to get the totalsize and output looks like below ,which is wrong (seprate each id by line for easy readability)::

id Business LOB ACTSIZ Totalsize

12340 01 RTY 23 23

12341 20 TYR 0 8

12341 30 1UY 8 8

12343 30 CBP 8 36

12343 30 MTS 8 36

12343 30 HPL 8 36

12343 45 HFI 6 36

12343 45 HFS 6 36

45673 01 RTY 40 136

45673 30 IUY 24 136

45673 30 CBP 24 136

45673 30 IUY 24 136

45673 30 CBP 24 136

If the business id and actual size repeat more than once (LOB will be different) then count the actual size one time toward total size. Business want the output like below (seprate each id by line for easy readability):

id Business LOB ACTSIZ Totalsize

12340 01 RTY 23 23

12341 20 TYR 0 8

12341 30 1UY 8 8

12343 30 CBP 8 14

12343 30 MTS 8 14

12343 30 HPL 8 14

12343 45 HFI 6 14

12343 45 HFS 6 14

45673 01 RTY 40 64

45673 30 IUY 24 64

45673 30 CBP 24 64

45673 30 IUY 24 64

45673 30 CBP 24 64

Note: actual dataset has 150 columns and we need all those column in the output, so we can't use nodupkey option.

Thank you so much for your help.

-Sudha.

ChrisNZ · Posted 09-19-2017 11:34 PM

Like this?

data HAVE;  
length ID $ 5 BUSINESS $ 2 LOB $ 3;
input ID $ BUSINESS LOB $ ACTSIZ;
datalines;
12340 01 RTY 23
12341 20 TYR 0
12341 30 1UY 8
12343 30 CBP 8
12343 30 MTS 8
12343 30 HPL 8
12343 45 HFI 6
12343 45 HFS 6
45673 01 RTY 40
45673 30 IUY 24
45673 30 CBP 24
45673 30 IUY 24
45673 30 CBP 24
run;
data _V/view=_V;
  set HAVE;
  by ID BUSINESS;
  VALUE=( first.BUSINESS * ACTSIZ);
run;
proc sql;
   select a.*, SUM
   from _V                                                   a
       ,(select ID, sum(VALUE) as SUM from _V group by 1)    b
   where a.ID=b.ID ;
run;

ID	BUSINESS	LOB	ACTSIZ	ACTSIZ
12340	01	RTY	23	23
12341	20	TYR	0	8
12341	30	1UY	8	8
12343	30	CBP	8	14
12343	30	MTS	8	14
12343	30	HPL	8	14
12343	45	HFI	6	14
12343	45	HFS	6	14
45673	01	RTY	40	64
45673	30	IUY	24	64
45673	30	CBP	24	64
45673	30	IUY	24	64
45673	30	CBP	24	64

High-Performance SAS Coding - Third Edition

View solution in original post

ShiroAmada · Posted 09-19-2017 09:21 PM

Try this....

data WANT;
  set HAVE;
group=catx('-',ID, BUSINESS, ACTSIZ);

grouplag=lag(group);

if (group ^= grouplag) or _n_=1 then do;
  totalsize=actsiz;
 is_this_what_you_want='Y';
end;

run;

Hope this helps.

ChrisNZ · Posted 09-19-2017 11:34 PM

Like this?

data HAVE;  
length ID $ 5 BUSINESS $ 2 LOB $ 3;
input ID $ BUSINESS LOB $ ACTSIZ;
datalines;
12340 01 RTY 23
12341 20 TYR 0
12341 30 1UY 8
12343 30 CBP 8
12343 30 MTS 8
12343 30 HPL 8
12343 45 HFI 6
12343 45 HFS 6
45673 01 RTY 40
45673 30 IUY 24
45673 30 CBP 24
45673 30 IUY 24
45673 30 CBP 24
run;
data _V/view=_V;
  set HAVE;
  by ID BUSINESS;
  VALUE=( first.BUSINESS * ACTSIZ);
run;
proc sql;
   select a.*, SUM
   from _V                                                   a
       ,(select ID, sum(VALUE) as SUM from _V group by 1)    b
   where a.ID=b.ID ;
run;

ID	BUSINESS	LOB	ACTSIZ	ACTSIZ
12340	01	RTY	23	23
12341	20	TYR	0	8
12341	30	1UY	8	8
12343	30	CBP	8	14
12343	30	MTS	8	14
12343	30	HPL	8	14
12343	45	HFI	6	14
12343	45	HFS	6	14
45673	01	RTY	40	64
45673	30	IUY	24	64
45673	30	CBP	24	64
45673	30	IUY	24	64
45673	30	CBP	24	64

High-Performance SAS Coding - Third Edition

Ksharp · Posted 09-20-2017 09:45 AM


data HAVE;  
length ID $ 5 BUSINESS $ 2 LOB $ 3;
input ID $ BUSINESS LOB $ ACTSIZ;
datalines;
12340 01 RTY 23
12341 20 TYR 0
12341 30 1UY 8
12343 30 CBP 8
12343 30 MTS 8
12343 30 HPL 8
12343 45 HFI 6
12343 45 HFS 6
45673 01 RTY 40
45673 30 IUY 24
45673 30 CBP 24
45673 30 IUY 24
45673 30 CBP 24
;
run;

proc sql;
select *
 from (select * from have ) as a left join
      (select id,sum(ACTSIZ) as sum_ACTSIZ from (select distinct id,ACTSIZ from have) group by id) as b
  on a.id=b.id;
quit;

distinct count values

Re: distinct count values

Re: distinct count values

Re: distinct count values

Re: distinct count values

Click image to register for webinar

Classroom Training Available!