BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Paul_NYS
Obsidian | Level 7

Hi

I am trying to merge two SAS datasets with the below data step. However, I would like to keep all the records of the issueJoined data set and not just only those that have a matching record in the issuejoinedevents data set. Is this possible?

Right, I am not getting any records in the resulting data set. However, when I run a proc SQL server and join the same two data sets through the same column, docket_id, I am getting records.

Paul

data s10;

merge work.issueJoined work.issuejoinedevents (keep=edate rename=(edate=issueJoinDate));

by docket_id;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

One way using proc sort would be to use two sorts.  i.e.,

proc sort data=work.issuejoinedevents;

  by docket_id edate;

run;

which would order the file by edates within docket_id, following which you could keep only the earliest date with another sort:

proc sort data=work.issuejoinedevents nodupkey;

  by docket_id;

run;

View solution in original post

6 REPLIES 6
Linlin
Lapis Lazuli | Level 10

Hi,

I added the RED parts. Also both datasets have to be sorted by docket_id.

data s10;

merge work.issueJoined(in=a)  work.issuejoinedevents (keep=docked_id edate rename=(edate=issueJoinDate));

by docket_id;

if a;

run;

Paul_NYS
Obsidian | Level 7

H Linlin

I did as you indicated above and revised the code to reflect the below. It seemed to generally work, however, in the original issueJoined, there are 114,744 records and in the resulting s10 data set, there are 115,608 records. I have not completely looked yet, but I am assuming some records from the issuejoinedevents data set were included that do not have a matching record in the issueJoined data set. Is this possible?

Paul

data s10;

merge work.issueJoined(in=a) work.issuejoinedevents (keep=docket_id edate rename=(edate=issueJoinDate));

by docket_id;

if a;

run;

Astounding
PROC Star

Paul,

No, that's not possible using this code.  What is happening is that some of the docket_id values in IssueJoinedEvents appear multiple times within that data set.  As a result, they appear multiple times in the merged data set as well.


Paul_NYS
Obsidian | Level 7

That was the second option, thanks. I am sorting both data sets prior to merging. Both sorts are by docket_id. Can I add a second sort by a date column (edate) and then just keep the first instance of edate (where it is lowest)? In the below code, do you know how would I only keep the first (lowest) value of edate?

proc sort data=work.issuejoinedevents;

by docket_id edate;

run;

art297
Opal | Level 21

One way using proc sort would be to use two sorts.  i.e.,

proc sort data=work.issuejoinedevents;

  by docket_id edate;

run;

which would order the file by edates within docket_id, following which you could keep only the earliest date with another sort:

proc sort data=work.issuejoinedevents nodupkey;

  by docket_id;

run;

Paul_NYS
Obsidian | Level 7

That seemed to have worked. I found the nodupkey, but adding the second sort just using the docket_id made it work completely. Thanks to everyone.

Paul

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1020 views
  • 3 likes
  • 4 in conversation