- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, i have the table PROC SORT by ID then Descending Date.
However ID 033 has 2 instances with the same date and NODUPKEY only picks up the 1st instance.
Is it possible to pick up ID 033's 2 instances?
My idea is to pick up each ID's latest date.
Thank you.
ID | DATE | CODE |
009 | 123195 | 03 |
009 | 101595 | 01 |
009 | 090995 | 07 |
033 | 101095 | 07 |
033 | 101095 | 03 |
033 | 090995 | 02 |
033 | 051894 | 01 |
033 | 010594 | 07 |
045 | 122195 | 06 |
045 | 103195 | 05 |
045 | 102195 | 05 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There are so many ways to skin this particular cat (sorry, Lily), but nodupkey isn't one of them.
It could have been done in one pass with SQL, but here's a method with proc sort and two data steps:
data nodupkey_data;
infile cards dsd dlm='09'x;
attrib id length=3 format=z3.;
attrib date informat=mmddyy6. format=yymmdd10.;
attrib code length=$ 2;
input id
date
code;
cards;
009 123195 03
009 101595 01
009 090995 07
033 101095 07
033 101095 03
033 090995 02
033 051894 01
033 010594 07
045 122195 06
045 103195 05
045 102195 05
;
run;
proc sort data=nodupkey_data;
by id date;
run;
/*
For each id, get the most recent date */
data maxdate;
set nodupkey_data;
by id;
if last.id;
keep id date;
run;
/*
Merge back in with the main dataset, but only retain those observations
where the date is the same as the most recent.
*/
data id_latest_date;
merge nodupkey_data
maxdate(in=in_maxdate);
by id date;
if in_maxdate;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There are so many ways to skin this particular cat (sorry, Lily), but nodupkey isn't one of them.
It could have been done in one pass with SQL, but here's a method with proc sort and two data steps:
data nodupkey_data;
infile cards dsd dlm='09'x;
attrib id length=3 format=z3.;
attrib date informat=mmddyy6. format=yymmdd10.;
attrib code length=$ 2;
input id
date
code;
cards;
009 123195 03
009 101595 01
009 090995 07
033 101095 07
033 101095 03
033 090995 02
033 051894 01
033 010594 07
045 122195 06
045 103195 05
045 102195 05
;
run;
proc sort data=nodupkey_data;
by id date;
run;
/*
For each id, get the most recent date */
data maxdate;
set nodupkey_data;
by id;
if last.id;
keep id date;
run;
/*
Merge back in with the main dataset, but only retain those observations
where the date is the same as the most recent.
*/
data id_latest_date;
merge nodupkey_data
maxdate(in=in_maxdate);
by id date;
if in_maxdate;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you. Appreciated.
I had thought of NODUPKEY as a quick solution as I assumed there is no 2 or more IDs with the same date.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It comes down to understanding what nodupkey does (and, as I just found out, nouniquekey!).
The former detects multiple instances of duplicate keys and only keeps one of them. You have to be careful using this if you want to control which one is kept - if you do care, I recommend not using it; instead do a sort and then control which one in a data step.
The latter only keeps duplicates. Running a sort with your data keeps the two observations you want for 033 but gets rid of the rest.
So thank you - I learnt something new with nouniquekey.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For me, I would like to use PROC SQL ,because it is simple. Actually Data Step also could make it simple as SQL. data have; input id date code; cards; 009 123195 03 009 101595 01 009 090995 07 033 101095 07 033 101095 03 033 090995 02 033 051894 01 033 010594 07 045 122195 06 045 103195 05 045 102195 05 ; run; proc sort data=nodupkey_data; by id descending date; run; data want; set have; by id descending date; if first.id then n=0; n+first.date; if n=1; drop n; run;