BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have two datasets that each have ID and a date variable. Dataset 1 also has Var1 (which is continous) and Dataset 2 also has Var2 (a dummy variable). 

 

Dataset 1

ID     Date     Var1

1      1/5/12     30

1       3/9/13    54

1      10/4/14    3

2      2/1/12      9

2      1/1/13     23

 

Dataset 2

ID     Date     Var2

1      1/5/12      0

1      2/7/12      0

1      3/9/13      1

2      2/1/12      1

2      4/3/12      1

3      4/5/13      1

3      10/4/13    1

 

The goal is to have for each ID/date combo as much information as possible--so ideally I'd want the values of both Var1 and Var2 on a given date, but if a person only has Var1 or Var2 on a given date, that should be displayed. Therefore, I joined them using proc sql full join on ID and date:

 

proc sql;

create table want as

select * from Dataset1 a

full join Dataset2 b

on (a.ID=b.ID) and (a.date=b.date);

quit;

 

This gave me the output dataset I wanted:

 

ID     Date     Var1      Var2

1      1/5/12     30          0

1      2/7/12                   0

1       3/9/13    54          1

1      10/4/14    3

2      2/1/12      9           1

2      4/3/12                   1

2      1/1/13     23

3      4/5/13                   1

3      10/4/13                 1

 

The issue is that there are some people like ID 3, who are only in Dataset 2 but not Dataset 1. Ideally, I'd like to exclude these people from my final output dataset altogether, but I'm not sure how to do that with a full outer join. Alternatively, if I just had a way of identifying which people came from Dataset 2 only, that would be helpful as well. 

 

Any help is much appreciated. 

1 REPLY 1
DavidPope
SAS Employee

I didn't find the proc sql code you included produced the results you stated. 

However, this version of the code produced what you stated:

 

proc sql;
create table want as
select coalesce(a.id, b.id) 'id',
coalesce(a.date, b.date) 'date',
a.var1, b.var2
from temp1 a
full join temp2 b
on (a.ID=b.ID) and (a.date=b.date);

quit;

 

AND if you add this where clause it eliminates the records from the 2nd data set that you stated you didn't want included.

 

proc sql;
create table want as
select coalesce(a.id, b.id) 'id',
coalesce(a.date, b.date) 'date',
a.var1, b.var2
from temp1 a
full join temp2 b
on (a.ID=b.ID) and (a.date=b.date)
where coalesce(a.id,b.id) in (select id from temp1);
quit;

 

Hope this helps,

David

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 680 views
  • 0 likes
  • 2 in conversation