Hello
I have a data set where for each customer there are multiple rows.
In data set there are 2 columns: ID,X .
Target: Create a new data set that will calculate for each customer number of distinct values of X
For example:
For customer 1 there are 2 distinct values (10,20)
For customer 2 there are 3 distinct values (10,15,30)
For customer 2 there is 1 distinct values (40)
I want that the new variable values will appear in all rows for each customer(I think it is called remerge in sas language).
Please find a solution .
My question is if there is another way with one step to do it
Data tbl1;
input ID x;
cards;
1 10
1 10
1 10
1 20
1 20
1 20
2 10
2 15
2 15
2 15
2 30
2 30
3 40
3 40
3 40
3 40
3 40
3 40
;
run;
PROC SQL;
create table tbl2 as
select ID, count(distinct X) as dis_x
from tbl1
group by ID
;
QUIT;
PROC SQL;
create table tbl3 as
select a.*,b.dis_x
from tbl1 as a
left join tbl2 as b
on a.ID=b.ID
;
QUIT;
You don't need to change your code much
proc sql;
create table tbl2 as
select *, count(distinct x) as dis_x
from tbl1
group by ID;
quit;
Result:
ID X dis_X 1 10 2 1 20 2 1 10 2 1 20 2 1 20 2 1 10 2 2 10 3 2 30 3 2 30 3 2 15 3 2 15 3 2 15 3 3 40 1 3 40 1 3 40 1 3 40 1 3 40 1 3 40 1
And if the objective of the question is to find alternatives to PROC SQL, here is a data step approach
data tbl2;
array _ {999999} _temporary_;
do until (last.id);
set tbl1;
by id;
_[x]=1;
end;
do until (last.id);
set tbl1;
by id;
dis_x=n(of _[*]);
output;
end;
call missing(of _[*]);
run;
Key-indexing rules ... when applicable, of course.
Failing the latter, can always do a hash table instead.
Kind regards
Paul D.
@hashman I agree. Just wanted to point out that there are many alternatived to the SQL.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.