BookmarkSubscribeRSS Feed
Paul_NYS
Obsidian | Level 7

Hi Everyone

I have a base data set that contains the info for a group of court cases--there is one record for each case. There is another data set I have that has dockets in it and I need one column in it for the cases in the base data set. The second data set has more than one record for each case--the multiple records are different dockets/dates for each case.

I would just like to grab the docket ID column from the second data set and add it to the first base data set populated with the most recent docket from those in the second docket data set--and keep the same original number of records in the first base data set. However, when I run the merge (using a "by" statement with a shared ID), the merge just creates multiple records in the first data set for each of multiple dockets in the second data set.

Any ideas how to set up a merge that will maintain the same records in the base data set and just add the most recent column (docket) from the second data set?

Also, here is the merge statement I am using:

data s3t1;

merge s3 (in=a) vgcleanD1 (keep=entity_id FileDate Docket);

if a;

by entity_id;

run;

Paul

4 REPLIES 4
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10

Pursue using the IN=<flag_indicator>  on the SET statement as an individual dataset option -- then test for those contributing instances that are for your BASE dataset -- suggest using SAS diagnostic code PUTLOG _ALL_;   which will show evidence of conditions you "want" - suggest reviewing the SAS DOC on DATA step processing with SET and the IN= dataset option.

Also,"most recent" implies you have a way to order your "second" dataset with DESCENDING on a SORT with BY?  Look at using EQUALS and NODUPKEY on a second sort step that has a BY omitting the 'date' variable -- that will remove duplicates other than the most recent. Here again, use PROC PRINT (intermediate) and/or PUTLOG _ALL_;   or PROC FREQ;  against your "ordered" and "de-dup'd" second dataset to prepare it for the final MERGE back with your BASE dataset using a DATA step with a MERGE (along with the IN= dataset parameter), a BY, and again using the IF <flag_indicator> to detect a BASE observation contributed condition for OUTPUT.

Scott Barry
SBBWorks, Inc.

Paul_NYS
Obsidian | Level 7

I tried the sort with nodupkey and it didn't get me what I was looking for.

Hold off the above question, I didn't phrase it correctly.

Paul

Kurt_Bremser
Super User

Sort vgcleanD1 by entity_id descending FileDate;

Then do:

data s3t1;

merge s3 (in=a) vgcleanD1 (keep=entity_id FileDate Docket);

if a;

by entity_id;

if first.entity_id;

run;

This of course assumes that there is only one observation per entity_id in s3; if there are more, you need to make the reduction of vgcleanD1 in an intermediate step after the sort:

data vgcleanD1_int;

set vgcleanD1;

by entity_id;

if first.entity_id;

run;

Then use vgcleanD1_int in the merge, and omit the first.entity_id condition.

Tom
Super User Tom
Super User

Since you are doing a 1 to Many merge all of the records from the "Many" will still be marked as also being in the "one" since they match on the BY variables.

You can add

IF FIRST.ENTITY_ID;

to prevent it from taking more than one record per entity.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 984 views
  • 0 likes
  • 4 in conversation