Hi
I am trying to merge two SAS datasets with the below data step. However, I would like to keep all the records of the issueJoined data set and not just only those that have a matching record in the issuejoinedevents data set. Is this possible?
Right, I am not getting any records in the resulting data set. However, when I run a proc SQL server and join the same two data sets through the same column, docket_id, I am getting records.
Paul
data s10;
merge work.issueJoined work.issuejoinedevents (keep=edate rename=(edate=issueJoinDate));
by docket_id;
run;
One way using proc sort would be to use two sorts. i.e.,
proc sort data=work.issuejoinedevents;
by docket_id edate;
run;
which would order the file by edates within docket_id, following which you could keep only the earliest date with another sort:
proc sort data=work.issuejoinedevents nodupkey;
by docket_id;
run;
Hi,
I added the RED parts. Also both datasets have to be sorted by docket_id.
data s10;
merge work.issueJoined(in=a) work.issuejoinedevents (keep=docked_id edate rename=(edate=issueJoinDate));
by docket_id;
if a;
run;
H Linlin
I did as you indicated above and revised the code to reflect the below. It seemed to generally work, however, in the original issueJoined, there are 114,744 records and in the resulting s10 data set, there are 115,608 records. I have not completely looked yet, but I am assuming some records from the issuejoinedevents data set were included that do not have a matching record in the issueJoined data set. Is this possible?
Paul
data s10;
merge work.issueJoined(in=a) work.issuejoinedevents (keep=docket_id edate rename=(edate=issueJoinDate));
by docket_id;
if a;
run;
Paul,
No, that's not possible using this code. What is happening is that some of the docket_id values in IssueJoinedEvents appear multiple times within that data set. As a result, they appear multiple times in the merged data set as well.
That was the second option, thanks. I am sorting both data sets prior to merging. Both sorts are by docket_id. Can I add a second sort by a date column (edate) and then just keep the first instance of edate (where it is lowest)? In the below code, do you know how would I only keep the first (lowest) value of edate?
proc sort data=work.issuejoinedevents;
by docket_id edate;
run;
One way using proc sort would be to use two sorts. i.e.,
proc sort data=work.issuejoinedevents;
by docket_id edate;
run;
which would order the file by edates within docket_id, following which you could keep only the earliest date with another sort:
proc sort data=work.issuejoinedevents nodupkey;
by docket_id;
run;
That seemed to have worked. I found the nodupkey, but adding the second sort just using the docket_id made it work completely. Thanks to everyone.
Paul
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.