calculate number of distinct values between rows for each customer

Ronein · Posted 11-06-2019 02:24 AM

Hello

I have a data set where for each customer there are multiple rows.

In data set there are 2 columns: ID,X .

Target: Create a new data set that will calculate for each customer number of distinct values of X

For example:

For customer 1 there are 2 distinct values (10,20)

For customer 2 there are 3 distinct values (10,15,30)

For customer 2 there is 1 distinct values (40)

I want that the new variable values will appear in all rows for each customer(I think it is called remerge in sas language).

Please find a solution .

My question is if there is another way with one step to do it

Data tbl1;
input ID x;
cards;
1 10
1 10
1 10
1 20
1 20
1 20
2 10
2 15
2 15
2 15
2 30
2 30
3 40
3 40
3 40
3 40
3 40
3 40
;
run;

PROC SQL;
	create table tbl2  as
	select ID, count(distinct X) as dis_x 	   
	from  tbl1
	group by ID
;
QUIT;

PROC SQL;
	create table tbl3  as
	select a.*,b.dis_x 	   
	from  tbl1 as a
	left join tbl2 as b
	on a.ID=b.ID
;
QUIT;

PeterClemmensen · Posted 11-06-2019 02:27 AM

You don't need to change your code much

proc sql;
    create table tbl2 as
    select *, count(distinct x) as dis_x
    from tbl1
    group by ID;
quit;

Result:

ID X   dis_X
1  10  2
1  20  2
1  10  2
1  20  2
1  20  2
1  10  2
2  10  3
2  30  3
2  30  3
2  15  3
2  15  3
2  15  3
3  40  1
3  40  1
3  40  1
3  40  1
3  40  1
3  40  1

The DATA to DATA Step Macro
Blog: SASnrd

PeterClemmensen · Posted 11-06-2019 02:34 AM

And if the objective of the question is to find alternatives to PROC SQL, here is a data step approach

data tbl2;
    array _ {999999} _temporary_;
    do until (last.id);
        set tbl1;
        by id;
        _[x]=1;
    end;
    do until (last.id);
        set tbl1;
        by id;
        dis_x=n(of _[*]);
        output;
    end;
    call missing(of _[*]);
run;

The DATA to DATA Step Macro
Blog: SASnrd