BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have two datasets that each have ID and a date variable. Dataset 1 also has Var1 (which is continous) and Dataset 2 also has Var2 (a dummy variable). 

 

Dataset 1

ID     Date     Var1

1      1/5/12     30

1       3/9/13    54

1      10/4/14    3

2      2/1/12      9

2      1/1/13     23

 

Dataset 2

ID     Date     Var2

1      1/5/12      0

1      2/7/12      0

1      3/9/13      1

2      2/1/12      1

2      4/3/12      1

3      4/5/13      1

3      10/4/13    1

 

The goal is to have for each ID/date combo as much information as possible--so ideally I'd want the values of both Var1 and Var2 on a given date, but if a person only has Var1 or Var2 on a given date, that should be displayed. Therefore, I joined them using proc sql full join on ID and date:

 

proc sql;

create table want as

select * from Dataset1 a

full join Dataset2 b

on (a.ID=b.ID) and (a.date=b.date);

quit;

 

This gave me the output dataset I wanted:

 

ID     Date     Var1      Var2

1      1/5/12     30          0

1      2/7/12                   0

1       3/9/13    54          1

1      10/4/14    3

2      2/1/12      9           1

2      4/3/12                   1

2      1/1/13     23

3      4/5/13                   1

3      10/4/13                 1

 

The issue is that there are some people like ID 3, who are only in Dataset 2 but not Dataset 1. Ideally, I'd like to exclude these people from my final output dataset altogether, but I'm not sure how to do that with a full outer join. Alternatively, if I just had a way of identifying which people came from Dataset 2 only, that would be helpful as well. 

 

Any help is much appreciated. 

1 REPLY 1
DavidPope
SAS Employee

I didn't find the proc sql code you included produced the results you stated. 

However, this version of the code produced what you stated:

 

proc sql;
create table want as
select coalesce(a.id, b.id) 'id',
coalesce(a.date, b.date) 'date',
a.var1, b.var2
from temp1 a
full join temp2 b
on (a.ID=b.ID) and (a.date=b.date);

quit;

 

AND if you add this where clause it eliminates the records from the 2nd data set that you stated you didn't want included.

 

proc sql;
create table want as
select coalesce(a.id, b.id) 'id',
coalesce(a.date, b.date) 'date',
a.var1, b.var2
from temp1 a
full join temp2 b
on (a.ID=b.ID) and (a.date=b.date)
where coalesce(a.id,b.id) in (select id from temp1);
quit;

 

Hope this helps,

David

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 787 views
  • 0 likes
  • 2 in conversation