Hi,
I have two datasets that each have ID and a date variable. Dataset 1 also has Var1 (which is continous) and Dataset 2 also has Var2 (a dummy variable).
Dataset 1
ID Date Var1
1 1/5/12 30
1 3/9/13 54
1 10/4/14 3
2 2/1/12 9
2 1/1/13 23
Dataset 2
ID Date Var2
1 1/5/12 0
1 2/7/12 0
1 3/9/13 1
2 2/1/12 1
2 4/3/12 1
3 4/5/13 1
3 10/4/13 1
The goal is to have for each ID/date combo as much information as possible--so ideally I'd want the values of both Var1 and Var2 on a given date, but if a person only has Var1 or Var2 on a given date, that should be displayed. Therefore, I joined them using proc sql full join on ID and date:
proc sql;
create table want as
select * from Dataset1 a
full join Dataset2 b
on (a.ID=b.ID) and (a.date=b.date);
quit;
This gave me the output dataset I wanted:
ID Date Var1 Var2
1 1/5/12 30 0
1 2/7/12 0
1 3/9/13 54 1
1 10/4/14 3
2 2/1/12 9 1
2 4/3/12 1
2 1/1/13 23
3 4/5/13 1
3 10/4/13 1
The issue is that there are some people like ID 3, who are only in Dataset 2 but not Dataset 1. Ideally, I'd like to exclude these people from my final output dataset altogether, but I'm not sure how to do that with a full outer join. Alternatively, if I just had a way of identifying which people came from Dataset 2 only, that would be helpful as well.
Any help is much appreciated.
I didn't find the proc sql code you included produced the results you stated.
However, this version of the code produced what you stated:
proc sql;
create table want as
select coalesce(a.id, b.id) 'id',
coalesce(a.date, b.date) 'date',
a.var1, b.var2
from temp1 a
full join temp2 b
on (a.ID=b.ID) and (a.date=b.date);
quit;
AND if you add this where clause it eliminates the records from the 2nd data set that you stated you didn't want included.
proc sql;
create table want as
select coalesce(a.id, b.id) 'id',
coalesce(a.date, b.date) 'date',
a.var1, b.var2
from temp1 a
full join temp2 b
on (a.ID=b.ID) and (a.date=b.date)
where coalesce(a.id,b.id) in (select id from temp1);
quit;
Hope this helps,
David
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.