DATA Step, Macro, Functions and more

distinct count values

Accepted Solution Solved
Reply
Contributor
Posts: 33
Accepted Solution

distinct count values

Hi ,

 

I have a data set that has the ID, Business,LOB,ACTSIZ. like below example

 

data DATA;  

length id $ 5 Business $ 2 LOB $ 3;

input id $ Business LOB $ ACTSIZ;

datalines;

12340 01 RTY 23

12341 20 TYR 0

12341 30 1UY 8

12343 30 CBP 8

12343 30 MTS 8

12343 30 HPL 8

12343 45 HFI 6

12343 45 HFS 6

45673 01 RTY 40

45673 30 IUY 24

45673 30 CBP 24

45673 30 IUY 24

45673 30 CBP 24

;

run;

 

I was summing ACTSIZ column group by id to get the totalsize and output looks like below ,which is wrong (seprate each id by line for easy readability)::

id Business LOB ACTSIZ Totalsize

12340 01 RTY 23 23

12341 20 TYR 0 8

12341 30 1UY 8 8

12343 30 CBP 8 36

12343 30 MTS 8 36

12343 30 HPL 8 36

12343 45 HFI 6 36

12343 45 HFS 6 36

45673 01 RTY 40 136

45673 30 IUY 24 136

45673 30 CBP 24 136

45673 30 IUY 24 136

45673 30 CBP 24 136

 

If the business id and actual size repeat more than once (LOB will be different) then count the actual size one time toward total size. Business want the output like below (seprate each id by line for easy readability):

id Business LOB ACTSIZ Totalsize 

12340 01 RTY 23 23

12341 20 TYR 0 8

12341 30 1UY 8 8

12343 30 CBP 8 14

12343 30 MTS 8 14

12343 30 HPL 8 14

12343 45 HFI 6 14

12343 45 HFS 6 14

45673 01 RTY 40 64

45673 30 IUY 24 64

45673 30 CBP 24 64

45673 30 IUY 24 64

45673 30 CBP 24 64

 

Note: actual dataset has 150 columns and we need all those column in the output, so we can't use nodupkey option.

 

Thank you so much for your help.

 

-Sudha.


Accepted Solutions
Solution
‎09-20-2017 09:26 AM
PROC Star
Posts: 1,760

Re: distinct count values

[ Edited ]

Like this?

 

data HAVE;  
length ID $ 5 BUSINESS $ 2 LOB $ 3;
input ID $ BUSINESS LOB $ ACTSIZ;
datalines;
12340 01 RTY 23
12341 20 TYR 0
12341 30 1UY 8
12343 30 CBP 8
12343 30 MTS 8
12343 30 HPL 8
12343 45 HFI 6
12343 45 HFS 6
45673 01 RTY 40
45673 30 IUY 24
45673 30 CBP 24
45673 30 IUY 24
45673 30 CBP 24
run;
data _V/view=_V;
  set HAVE;
  by ID BUSINESS;
  VALUE=( first.BUSINESS * ACTSIZ);
run;
proc sql;
   select a.*, SUM
   from _V                                                   a
       ,(select ID, sum(VALUE) as SUM from _V group by 1)    b
   where a.ID=b.ID ;
run;

 

ID BUSINESS LOB ACTSIZ ACTSIZ
12340 01 RTY 23 23
12341 20 TYR 0 8
12341 30 1UY 8 8
12343 30 CBP 8 14
12343 30 MTS 8 14
12343 30 HPL 8 14
12343 45 HFI 6 14
12343 45 HFS 6 14
45673 01 RTY 40 64
45673 30 IUY 24 64
45673 30 CBP 24 64
45673 30 IUY 24 64
45673 30 CBP 24 64

View solution in original post


All Replies
Frequent Contributor
Posts: 100

Re: distinct count values

Try this....

 

data WANT;
  set HAVE;
group=catx('-',ID, BUSINESS, ACTSIZ);

grouplag=lag(group);

if (group ^= grouplag) or _n_=1 then do;
  totalsize=actsiz;
 is_this_what_you_want='Y';
end;

run;

Hope this helps.

Solution
‎09-20-2017 09:26 AM
PROC Star
Posts: 1,760

Re: distinct count values

[ Edited ]

Like this?

 

data HAVE;  
length ID $ 5 BUSINESS $ 2 LOB $ 3;
input ID $ BUSINESS LOB $ ACTSIZ;
datalines;
12340 01 RTY 23
12341 20 TYR 0
12341 30 1UY 8
12343 30 CBP 8
12343 30 MTS 8
12343 30 HPL 8
12343 45 HFI 6
12343 45 HFS 6
45673 01 RTY 40
45673 30 IUY 24
45673 30 CBP 24
45673 30 IUY 24
45673 30 CBP 24
run;
data _V/view=_V;
  set HAVE;
  by ID BUSINESS;
  VALUE=( first.BUSINESS * ACTSIZ);
run;
proc sql;
   select a.*, SUM
   from _V                                                   a
       ,(select ID, sum(VALUE) as SUM from _V group by 1)    b
   where a.ID=b.ID ;
run;

 

ID BUSINESS LOB ACTSIZ ACTSIZ
12340 01 RTY 23 23
12341 20 TYR 0 8
12341 30 1UY 8 8
12343 30 CBP 8 14
12343 30 MTS 8 14
12343 30 HPL 8 14
12343 45 HFI 6 14
12343 45 HFS 6 14
45673 01 RTY 40 64
45673 30 IUY 24 64
45673 30 CBP 24 64
45673 30 IUY 24 64
45673 30 CBP 24 64
Super User
Posts: 10,046

Re: distinct count values


data HAVE;  
length ID $ 5 BUSINESS $ 2 LOB $ 3;
input ID $ BUSINESS LOB $ ACTSIZ;
datalines;
12340 01 RTY 23
12341 20 TYR 0
12341 30 1UY 8
12343 30 CBP 8
12343 30 MTS 8
12343 30 HPL 8
12343 45 HFI 6
12343 45 HFS 6
45673 01 RTY 40
45673 30 IUY 24
45673 30 CBP 24
45673 30 IUY 24
45673 30 CBP 24
;
run;

proc sql;
select *
 from (select * from have ) as a left join
      (select id,sum(ACTSIZ) as sum_ACTSIZ from (select distinct id,ACTSIZ from have) group by id) as b
  on a.id=b.id;
quit;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 134 views
  • 0 likes
  • 4 in conversation