BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,

 

I have two datasets that each have ID and a date variable. Dataset 1 also has Var1 (which is continous) and Dataset 2 also has Var2 (a dummy variable). 

 

Dataset 1

ID     Date     Var1

1      1/5/12     30

1       3/9/13    54

1      10/4/14    3

2      2/1/12      9

2      1/1/13     23

 

Dataset 2

ID     Date     Var2

1      1/5/12      0

1      2/7/12      0

1      3/9/13      1

2      2/1/12      1

2      4/3/12      1

3      4/5/13      1

3      10/4/13    1

 

The goal is to have for each ID/date combo as much information as possible--so ideally I'd want the values of both Var1 and Var2 on a given date, but if a person only has Var1 or Var2 on a given date, that should be displayed. Therefore, I joined them using proc sql full join on ID and date:

 

proc sql;

create table want as

select * from Dataset1 a

full join Dataset2 b

on (a.ID=b.ID) and (a.date=b.date);

quit;

 

This gave me the output dataset I wanted:

 

ID     Date     Var1      Var2

1      1/5/12     30          0

1      2/7/12                   0

1       3/9/13    54          1

1      10/4/14    3

2      2/1/12      9           1

2      4/3/12                   1

2      1/1/13     23

3      4/5/13                   1

3      10/4/13                 1

 

The issue is that there are some people like ID 3, who are only in Dataset 2 but not Dataset 1. Ideally, I'd like to exclude these people from my final output dataset altogether, but I'm not sure how to do that with a full outer join. Alternatively, if I just had a way of identifying which people came from Dataset 2 only, that would be helpful as well. 

 

Any help is much appreciated. 

1 REPLY 1
DavidPope
SAS Employee

I didn't find the proc sql code you included produced the results you stated. 

However, this version of the code produced what you stated:

 

proc sql;
create table want as
select coalesce(a.id, b.id) 'id',
coalesce(a.date, b.date) 'date',
a.var1, b.var2
from temp1 a
full join temp2 b
on (a.ID=b.ID) and (a.date=b.date);

quit;

 

AND if you add this where clause it eliminates the records from the 2nd data set that you stated you didn't want included.

 

proc sql;
create table want as
select coalesce(a.id, b.id) 'id',
coalesce(a.date, b.date) 'date',
a.var1, b.var2
from temp1 a
full join temp2 b
on (a.ID=b.ID) and (a.date=b.date)
where coalesce(a.id,b.id) in (select id from temp1);
quit;

 

Hope this helps,

David

 

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 613 views
  • 0 likes
  • 2 in conversation