BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have two datasets that each have ID and a date variable. Dataset 1 also has Var1 (which is continous) and Dataset 2 also has Var2 (a dummy variable). 

 

Dataset 1

ID     Date     Var1

1      1/5/12     30

1       3/9/13    54

1      10/4/14    3

2      2/1/12      9

2      1/1/13     23

 

Dataset 2

ID     Date     Var2

1      1/5/12      0

1      2/7/12      0

1      3/9/13      1

2      2/1/12      1

2      4/3/12      1

3      4/5/13      1

3      10/4/13    1

 

The goal is to have for each ID/date combo as much information as possible--so ideally I'd want the values of both Var1 and Var2 on a given date, but if a person only has Var1 or Var2 on a given date, that should be displayed. Therefore, I joined them using proc sql full join on ID and date:

 

proc sql;

create table want as

select * from Dataset1 a

full join Dataset2 b

on (a.ID=b.ID) and (a.date=b.date);

quit;

 

This gave me the output dataset I wanted:

 

ID     Date     Var1      Var2

1      1/5/12     30          0

1      2/7/12                   0

1       3/9/13    54          1

1      10/4/14    3

2      2/1/12      9           1

2      4/3/12                   1

2      1/1/13     23

3      4/5/13                   1

3      10/4/13                 1

 

The issue is that there are some people like ID 3, who are only in Dataset 2 but not Dataset 1. Ideally, I'd like to exclude these people from my final output dataset altogether, but I'm not sure how to do that with a full outer join. Alternatively, if I just had a way of identifying which people came from Dataset 2 only, that would be helpful as well. 

 

Any help is much appreciated. 

1 REPLY 1
DavidPope
SAS Employee

I didn't find the proc sql code you included produced the results you stated. 

However, this version of the code produced what you stated:

 

proc sql;
create table want as
select coalesce(a.id, b.id) 'id',
coalesce(a.date, b.date) 'date',
a.var1, b.var2
from temp1 a
full join temp2 b
on (a.ID=b.ID) and (a.date=b.date);

quit;

 

AND if you add this where clause it eliminates the records from the 2nd data set that you stated you didn't want included.

 

proc sql;
create table want as
select coalesce(a.id, b.id) 'id',
coalesce(a.date, b.date) 'date',
a.var1, b.var2
from temp1 a
full join temp2 b
on (a.ID=b.ID) and (a.date=b.date)
where coalesce(a.id,b.id) in (select id from temp1);
quit;

 

Hope this helps,

David

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 935 views
  • 0 likes
  • 2 in conversation