SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Merging two datasets by ID and date

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Merging two datasets by ID and date

Hello,

 

I am having a little trouble merging two datasets together. 

Dataset A:

ID Date x1 x2 x3

 

Dataset B:

ID Date y1

 

Both datasets are one record per date, but A will have 5 dates per ID, whereas B will have anywhere from 0 to 6 dates per ID. Not every ID in A will have records in B.

 

When I first tried this, it was merging correctly for the first occurance per id, but afterwards would just add the records from B as if I were interleaving the datasets. I used the following code:

 

data AB; merge A B;

by ID date;

run;

 

The resulting dataset looks like this:

 

ID Date x1 x2 x3 y1

1   1      1   0   1

1   2      0   0   1   1

1   3      0   1   1   

1   4      1   1   0

1   5      0   1   1

1   5                     2

 

So set A has all five dates and set B has records at dates 2 and 5. I would like it so that the two records with ID 1 and date 5 were the same record.

Any help would be appreciated!

 

Thanks!


Accepted Solutions
Solution
‎12-13-2016 11:37 AM
Respected Advisor
Posts: 4,644

Re: Merging two datasets by ID and date

I suspect that your dates don't really match all the time. Try rounding the dates before joining :

 

proc sql;
create table AB as
select a.*, b.y
from A as a left join B as b
on a.ID=b.ID and round(a.date)=round(b.date);
quit;
PG

View solution in original post


All Replies
Super User
Posts: 17,784

Re: Merging two datasets by ID and date

Many to many merges aren't what you expect in a data step. Using SQL in this case is recommended.

You show what you're getting, but not what you expect ...what do you expect as results?

http://support.sas.com/kb/24/752.html

Occasional Contributor
Posts: 9

Re: Merging two datasets by ID and date

I am expecting to get this:

 

ID Date x1 x2 x3 y1

1   1      1   0   1

1   2      0   0   1   1

1   3      0   1   1   

1   4      1   1   0

1   5      0   1   1   2

 

There should only be one record per date, but the data step results in there being an extra record for the dates that are also in B.

 

I have also tried in SQL, but have gotten similar results.

 

proc sql;
create table AB as
select *
from A as a left join B as b
on a.ID=b.cID and a.date=b.date;
quit;

Solution
‎12-13-2016 11:37 AM
Respected Advisor
Posts: 4,644

Re: Merging two datasets by ID and date

I suspect that your dates don't really match all the time. Try rounding the dates before joining :

 

proc sql;
create table AB as
select a.*, b.y
from A as a left join B as b
on a.ID=b.ID and round(a.date)=round(b.date);
quit;
PG
Occasional Contributor
Posts: 9

Re: Merging two datasets by ID and date

Thank you for calling my attention to this. Turns out this was the problem.

 

Thank you!

Community Manager
Posts: 486

Re: Merging two datasets by ID and date

I'm glad you were able to identify the problem, jwestra84! If one of the replies was the exact solution to your problem, can you "Accept it as a solution"? Or if one was particularly helpful, feel free to "Like" it. This will help other community members who may run into the same issue know what worked.

Thanks!
Anna

Super User
Posts: 5,080

Re: Merging two datasets by ID and date

You will need to check your data.  The results you describe do not match the program you showed.  The program would perform exactly as you hoped and expected.  These results correspond to data set B having 2 observations for DATE=5.  On the first of those, Y1 is missing and on the second, Y1 is 2.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 334 views
  • 0 likes
  • 5 in conversation