Hi all,
I have a dataset looks like below. I want to keep the observations with the highest number in "count" variable by county and year.
County year Count
20001 2008 1
20001 2008 2
20001 2008 3
20001 2006 1
20002 2008 1
20002 2008 2
The outcome that I want has to look like...
County year Count
20001 2008 3
20001 2006 1
20002 2008 2
proc sort data=rc out=rc_dup nodupkey;
by county year;
run;
proc sort data=have;
by county year count;
run;
data want;
set have;
by county year;
if last.year;
run;
proc sort data=have;
by county year count;
run;
data want;
set have;
by county year;
if last.year;
run;
1. SORT approach would require 2 sorts:
data have;
input County year Count;
cards;
20001 2008 1
20001 2008 2
20001 2008 3
20001 2006 1
20002 2008 1
20002 2008 2
;
proc sort data=have out=temp;
by county year descending count;
run;
proc sort data=temp out=want nodupkey;
by county year;
run;
2.
data have;
input County year Count;
cards;
20001 2008 1
20001 2008 2
20001 2008 3
20001 2006 1
20002 2008 1
20002 2008 2
;
data want;
set have;
by county year notsorted;
if last.year;
run;
Assuming your dataset order is what it is as shown in your sample, you would need a NOTSORTED option in the BY statement in your datastep
3. Proc SQL grouped filter using HAVING clause
proc sql;
create table want as
select *
from have
group by county,year
having count=max(count);
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.