Hello
I have a data set where for each customer there are multiple rows.
In data set there are 2 columns: ID,X .
Target: Create a new data set that will calculate for each customer number of distinct values of X
For example:
For customer 1 there are 2 distinct values (10,20)
For customer 2 there are 3 distinct values (10,15,30)
For customer 2 there is 1 distinct values (40)
I want that the new variable values will appear in all rows for each customer(I think it is called remerge in sas language).
Please find a solution .
My question is if there is another way with one step to do it
Data tbl1;
input ID x;
cards;
1 10
1 10
1 10
1 20
1 20
1 20
2 10
2 15
2 15
2 15
2 30
2 30
3 40
3 40
3 40
3 40
3 40
3 40
;
run;
PROC SQL;
create table tbl2 as
select ID, count(distinct X) as dis_x
from tbl1
group by ID
;
QUIT;
PROC SQL;
create table tbl3 as
select a.*,b.dis_x
from tbl1 as a
left join tbl2 as b
on a.ID=b.ID
;
QUIT;
You don't need to change your code much
proc sql;
create table tbl2 as
select *, count(distinct x) as dis_x
from tbl1
group by ID;
quit;
Result:
ID X dis_X 1 10 2 1 20 2 1 10 2 1 20 2 1 20 2 1 10 2 2 10 3 2 30 3 2 30 3 2 15 3 2 15 3 2 15 3 3 40 1 3 40 1 3 40 1 3 40 1 3 40 1 3 40 1
And if the objective of the question is to find alternatives to PROC SQL, here is a data step approach
data tbl2;
array _ {999999} _temporary_;
do until (last.id);
set tbl1;
by id;
_[x]=1;
end;
do until (last.id);
set tbl1;
by id;
dis_x=n(of _[*]);
output;
end;
call missing(of _[*]);
run;
Key-indexing rules ... when applicable, of course.
Failing the latter, can always do a hash table instead.
Kind regards
Paul D.
@hashman I agree. Just wanted to point out that there are many alternatived to the SQL.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.