If the ID from data set 1 exists in dataset 2, I would like to create a new variable called flag in dataset 2 and it =1 and =0 otherwise. This is shown below. I tried using merge, but couldn't find a good way to do this. any help would be great.
Data 1:
ID | X1 | X2 | |
54 | 9 | 3 | |
55 | 3 | 2 | |
56 | 2 | 0 | |
57 | 4 | 4 | |
58 | 4 | 1 | |
59 | 6 | 8 | |
60 | 1 | 7 |
Data 2:
ID | Y1 | Y2 | Y3 |
54 | 55 | 79 | 55 |
55 | 83 | 61 | 55 |
56 | 74 | 77 | 60 |
57 | 64 | 88 | 56 |
60 | 87 | 78 | 83 |
61 | 84 | 87 | 77 |
62 | 78 | 68 | 50 |
63 | 89 | 59 | 52 |
64 | 74 | 77 | 60 |
65 | 55 | 79 | 55 |
Desired data:
ID | Y1 | Y2 | Y3 | FLAG |
54 | 55 | 79 | 55 | 1 |
55 | 83 | 61 | 55 | 1 |
56 | 74 | 77 | 60 | 1 |
57 | 64 | 88 | 56 | 1 |
60 | 87 | 78 | 83 | 1 |
61 | 84 | 87 | 77 | 0 |
62 | 78 | 68 | 50 | 0 |
63 | 89 | 59 | 52 | 0 |
64 | 74 | 77 | 60 | 0 |
65 | 55 | 79 | 55 | 0 |
The easy way is to use SQL:
proc sql;
create table want as
select *,id in(select id from data1) as flag
from data2;
quit;
The easy way is to use SQL:
proc sql;
create table want as
select *,id in(select id from data1) as flag
from data2;
quit;
And if performance is of importance, use a data step hash:
data want;
set data2;
if _n_ = 1
then do;
declare hash d1 (dataset:"data1");
d1.definekey("id");
d1.definedone();
end;
flag = (d1.check() = 0);
run;
One more way, but definitely slower:
data want;
merge one(in=a) two(in=b);
by id;
if a and b then flag=1;
else flag=0;
if y1=. or y2=. or y3=. then delete;
keep id y1 y2 y3 flag;
run;
How you handle the missings after the merge is dependent on your ultimate objective. Here I deleted any record missing at least one of y1-y3. I could understand making this different if the criteria for keeping a record was something other than at least one of the variables in dataset 2 is missing.
SteveDenham
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.