🔒 This topic is solved and locked.
Need further help from the community? Please
sign in and ask a new question.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 01-20-2020 01:08 PM
(572 views)
Hi all,
I have a dataset looks like below. I want to keep the observations with the highest number in "count" variable by county and year.
County year Count
20001 2008 1
20001 2008 2
20001 2008 3
20001 2006 1
20002 2008 1
20002 2008 2
The outcome that I want has to look like...
County year Count
20001 2008 3
20001 2006 1
20002 2008 2
proc sort data=rc out=rc_dup nodupkey;
by county year;
run;
1 ACCEPTED SOLUTION
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sort data=have;
by county year count;
run;
data want;
set have;
by county year;
if last.year;
run;
2 REPLIES 2
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sort data=have;
by county year count;
run;
data want;
set have;
by county year;
if last.year;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1. SORT approach would require 2 sorts:
data have;
input County year Count;
cards;
20001 2008 1
20001 2008 2
20001 2008 3
20001 2006 1
20002 2008 1
20002 2008 2
;
proc sort data=have out=temp;
by county year descending count;
run;
proc sort data=temp out=want nodupkey;
by county year;
run;
2.
data have;
input County year Count;
cards;
20001 2008 1
20001 2008 2
20001 2008 3
20001 2006 1
20002 2008 1
20002 2008 2
;
data want;
set have;
by county year notsorted;
if last.year;
run;
Assuming your dataset order is what it is as shown in your sample, you would need a NOTSORTED option in the BY statement in your datastep
3. Proc SQL grouped filter using HAVING clause
proc sql;
create table want as
select *
from have
group by county,year
having count=max(count);
quit;