DATA Step, Macro, Functions and more

NODUPKEY with PROC SORT

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

NODUPKEY with PROC SORT

Hi, i have the table PROC SORT by ID then Descending Date.

 

However ID 033 has 2 instances with the same date and NODUPKEY only picks up the 1st instance.

 

Is it possible to pick up ID 033's 2 instances?

My idea is to pick up each ID's latest date.

 

Thank you.

 

IDDATECODE
00912319503
00910159501
00909099507
03310109507
03310109503
03309099502
03305189401
03301059407
04512219506
04510319505
04510219505

Accepted Solutions
Solution
‎08-14-2016 08:55 AM
Super Contributor
Posts: 251

Re: NODUPKEY with PROC SORT

There are so many ways to skin this particular cat (sorry, Lily), but nodupkey isn't one of them.

 

It could have been done in one pass with SQL, but here's a method with proc sort and two data steps:

 

data nodupkey_data;
infile cards dsd dlm='09'x;
attrib id length=3 format=z3.;
attrib date informat=mmddyy6. format=yymmdd10.;
attrib code length=$ 2;
input id
date
code;
cards;
009 123195 03
009 101595 01
009 090995 07
033 101095 07
033 101095 03
033 090995 02
033 051894 01
033 010594 07
045 122195 06
045 103195 05
045 102195 05
;
run;

proc sort data=nodupkey_data;
by id date;
run;

/*
For each id, get the most recent date */
data maxdate;
set nodupkey_data;
by id;
if last.id;
keep id date;
run;

/*
Merge back in with the main dataset, but only retain those observations
where the date is the same as the most recent.
*/
data id_latest_date;
merge nodupkey_data
maxdate(in=in_maxdate);
by id date;
if in_maxdate;
run;

 

 

View solution in original post


All Replies
Solution
‎08-14-2016 08:55 AM
Super Contributor
Posts: 251

Re: NODUPKEY with PROC SORT

There are so many ways to skin this particular cat (sorry, Lily), but nodupkey isn't one of them.

 

It could have been done in one pass with SQL, but here's a method with proc sort and two data steps:

 

data nodupkey_data;
infile cards dsd dlm='09'x;
attrib id length=3 format=z3.;
attrib date informat=mmddyy6. format=yymmdd10.;
attrib code length=$ 2;
input id
date
code;
cards;
009 123195 03
009 101595 01
009 090995 07
033 101095 07
033 101095 03
033 090995 02
033 051894 01
033 010594 07
045 122195 06
045 103195 05
045 102195 05
;
run;

proc sort data=nodupkey_data;
by id date;
run;

/*
For each id, get the most recent date */
data maxdate;
set nodupkey_data;
by id;
if last.id;
keep id date;
run;

/*
Merge back in with the main dataset, but only retain those observations
where the date is the same as the most recent.
*/
data id_latest_date;
merge nodupkey_data
maxdate(in=in_maxdate);
by id date;
if in_maxdate;
run;

 

 

Occasional Contributor
Posts: 6

Re: NODUPKEY with PROC SORT

Thank you. Appreciated.

 

I had thought of NODUPKEY as a quick solution as I assumed there is no 2 or more IDs with the same date.

Super Contributor
Posts: 251

Re: NODUPKEY with PROC SORT

It comes down to understanding what nodupkey does (and, as I just found out, nouniquekey!).

 

The former detects multiple instances of duplicate keys and only keeps one of them. You have to be careful using this if you want to control which one is kept - if you do care, I recommend not using it; instead do a sort and then control which one in a data step.

 

The latter only keeps duplicates. Running a sort with your data keeps the two observations you want for 033 but gets rid of the rest.

 

So thank you - I learnt something new with nouniquekeySmiley Happy

Super User
Posts: 9,676

Re: NODUPKEY with PROC SORT

For me, I would like to use PROC SQL ,because it is simple.
Actually Data Step also could make it simple as SQL.


data have;
input id date code;
cards;
009 123195 03
009 101595 01
009 090995 07
033 101095 07
033 101095 03
033 090995 02
033 051894 01
033 010594 07
045 122195 06
045 103195 05
045 102195 05
;
run;

proc sort data=nodupkey_data;
by id descending date;
run;
data want;
 set have;
 by id descending date;
 if first.id then n=0;
 n+first.date;
 if n=1;
 drop n;
run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 475 views
  • 0 likes
  • 3 in conversation