DATA Step, Macro, Functions and more

Merging data sets and keeping one set

Accepted Solution Solved
Reply
Regular Contributor
Posts: 216
Accepted Solution

Merging data sets and keeping one set

Hi

I am trying to merge two SAS datasets with the below data step. However, I would like to keep all the records of the issueJoined data set and not just only those that have a matching record in the issuejoinedevents data set. Is this possible?

Right, I am not getting any records in the resulting data set. However, when I run a proc SQL server and join the same two data sets through the same column, docket_id, I am getting records.

Paul

data s10;

merge work.issueJoined work.issuejoinedevents (keep=edate rename=(edate=issueJoinDate));

by docket_id;

run;


Accepted Solutions
Solution
‎10-23-2012 05:56 PM
PROC Star
Posts: 7,363

Re: Merging data sets and keeping one set

One way using proc sort would be to use two sorts.  i.e.,

proc sort data=work.issuejoinedevents;

  by docket_id edate;

run;

which would order the file by edates within docket_id, following which you could keep only the earliest date with another sort:

proc sort data=work.issuejoinedevents nodupkey;

  by docket_id;

run;

View solution in original post


All Replies
Super Contributor
Posts: 1,636

Re: Merging data sets and keeping one set

Hi,

I added the RED parts. Also both datasets have to be sorted by docket_id.

data s10;

merge work.issueJoined(in=a)  work.issuejoinedevents (keep=docked_id edate rename=(edate=issueJoinDate));

by docket_id;

if a;

run;

Regular Contributor
Posts: 216

Re: Merging data sets and keeping one set

H Linlin

I did as you indicated above and revised the code to reflect the below. It seemed to generally work, however, in the original issueJoined, there are 114,744 records and in the resulting s10 data set, there are 115,608 records. I have not completely looked yet, but I am assuming some records from the issuejoinedevents data set were included that do not have a matching record in the issueJoined data set. Is this possible?

Paul

data s10;

merge work.issueJoined(in=a) work.issuejoinedevents (keep=docket_id edate rename=(edate=issueJoinDate));

by docket_id;

if a;

run;

Super User
Posts: 5,082

Re: Merging data sets and keeping one set

Paul,

No, that's not possible using this code.  What is happening is that some of the docket_id values in IssueJoinedEvents appear multiple times within that data set.  As a result, they appear multiple times in the merged data set as well.


Regular Contributor
Posts: 216

Re: Merging data sets and keeping one set

That was the second option, thanks. I am sorting both data sets prior to merging. Both sorts are by docket_id. Can I add a second sort by a date column (edate) and then just keep the first instance of edate (where it is lowest)? In the below code, do you know how would I only keep the first (lowest) value of edate?

proc sort data=work.issuejoinedevents;

by docket_id edate;

run;

Solution
‎10-23-2012 05:56 PM
PROC Star
Posts: 7,363

Re: Merging data sets and keeping one set

One way using proc sort would be to use two sorts.  i.e.,

proc sort data=work.issuejoinedevents;

  by docket_id edate;

run;

which would order the file by edates within docket_id, following which you could keep only the earliest date with another sort:

proc sort data=work.issuejoinedevents nodupkey;

  by docket_id;

run;

Regular Contributor
Posts: 216

Re: Merging data sets and keeping one set

That seemed to have worked. I found the nodupkey, but adding the second sort just using the docket_id made it work completely. Thanks to everyone.

Paul

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 327 views
  • 3 likes
  • 4 in conversation