BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Agent1592
Pyrite | Level 9

Dear SAS Community:

I have certain data. Basically it is panel data with company_id, year and quarter (qtr). I would like to remove duplicate records. 

I know I can use the following command:

proc sort data=test nodupkey out=test 1;
by year qtr;
run;

However this randomly removes duplicates. I have certain variables (var1, var2, var3, and var4 that are percentages and I would like to keep the maximum values for each year-quarter for var3 and var4. So for example in 1999 I have two different rows- one for which var3 and var4 are equal to 0.5 and another one for which var3 and var4 are equal to 1. How can I keep the one with the higher value? 

 

 

data work.TEST;
  infile datalines dsd truncover;
  input company_id:BEST12. year:32. qtr:32. var1:32. var2:32. var3:32. var4:32.;
  format company_id BEST12.;
datalines4;
11334,1992,2,0,0,0,0
11334,1992,3,0,0,0,0
11334,1992,3,0,0,0,0
11334,1992,3,0,0,0,0
11334,1992,4,0,0,0,0
11334,1994,2,0,0,0,0
11334,1994,3,0,0,0,0
11334,1994,3,0,0,0,0
11334,1994,3,0,0,0,0
11334,1994,4,0,0,0,0
11334,1995,2,0,0,0,0
11334,1995,3,0,0,0,0
11334,1995,3,0,0,0,0
11334,1995,3,0,0,0,0
11334,1995,4,0,0,0,0
11334,1996,2,0,0,0,0
11334,1996,3,0,0,0,0
11334,1996,3,0,0,0,0
11334,1996,3,0,0,0,0
11334,1996,4,0,0,0,0
11334,1997,2,0,0,0,0
11334,1997,3,0,0,0,0
11334,1997,3,0,0,0,0
11334,1997,3,0,0,0,0
11334,1997,4,0,0,0,0
11334,1998,2,0,0,0,0
11334,1998,3,0,0,0,0
11334,1998,3,0,0,0,0
11334,1998,3,0,0,0,0
11334,1998,4,0,0,0,0
11334,1999,2,0.5,0.5,0.5,0.5
11334,1999,2,1,1,1,1
11334,1999,3,0.5,0.5,0.5,0.5
11334,1999,3,1,1,1,1
11334,1999,3,0.5,0.5,0.5,0.5
11334,1999,3,1,1,1,1
11334,1999,3,0.5,0.5,0.5,0.5
11334,1999,3,1,1,1,1
11334,1999,4,0.5,0.5,0.5,0.5
11334,1999,4,1,1,1,1
11334,2000,2,0,0,0,0
11334,2000,2,0,0,0,0
11334,2000,3,0,0,0,0
11334,2000,3,0,0,0,0
11334,2000,3,0,0,0,0
11334,2000,3,0,0,0,0
11334,2000,3,0,0,0,0
11334,2000,3,0,0,0,0
11334,2000,4,0,0,0,0
11334,2000,4,0,0,0,0
11334,2001,2,0,0,0,0
11334,2001,2,0,0,0,0
11334,2001,3,0,0,0,0
11334,2001,3,0,0,0,0
11334,2001,3,0,0,0,0
11334,2001,3,0,0,0,0
11334,2001,3,0,0,0,0
11334,2001,3,0,0,0,0
11334,2001,4,0,0,0,0
11334,2001,4,0,0,0,0
11334,2002,2,0,0,0,0
11334,2002,2,0,0,0,0
11334,2002,3,0,0,0,0
11334,2002,3,0,0,0,0
11334,2002,3,0,0,0,0
11334,2002,3,0,0,0,0
11334,2002,3,0,0,0,0
11334,2002,3,0,0,0,0
11334,2002,4,0,0,0,0
11334,2002,4,0,0,0,0
11334,2003,2,1,1,1,1
11334,2003,3,1,1,1,1
11334,2003,3,1,1,1,1
11334,2003,3,1,1,1,1
11334,2003,4,1,1,1,1
11334,2004,2,0,0,0,0
11334,2004,3,0,0,0,0
11334,2004,3,0,0,0,0
11334,2004,3,0,0,0,0
11334,2004,4,0,0,0,0
11334,2005,2,0,0,0,0
11334,2005,3,0,0,0,0
11334,2005,3,0,0,0,0
11334,2005,3,0,0,0,0
11334,2005,4,0,0,0,0
11334,2006,2,0,0,0,0
11334,2006,3,0,0,0,0
11334,2006,3,0,0,0,0
11334,2006,3,0,0,0,0
11334,2006,4,0,0,0,0
11334,2007,2,0,0,0,0
11334,2007,3,0,0,0,0
11334,2007,3,0,0,0,0
11334,2007,3,0,0,0,0
11334,2007,4,0,0,0,0
11334,2008,2,0,0,0,0
11334,2008,3,0,0,0,0
11334,2008,3,0,0,0,0
11334,2008,3,0,0,0,0
11334,2008,4,0,0,0,0
11334,2009,2,0,0,0,0
11334,2009,3,0,0,0,0
11334,2009,3,0,0,0,0
11334,2009,3,0,0,0,0
11334,2009,4,0,0,0,0
11334,2011,2,0,0.5,0.5,0.5
11334,2011,3,0,0.5,0.5,0.5
11334,2011,3,0,0.5,0.5,0.5
11334,2011,3,0,0.5,0.5,0.5
11334,2011,4,0,0.5,0.5,0.5
11334,2013,2,0.5,0.5,0.5,0.5
11334,2013,3,0.5,0.5,0.5,0.5
11334,2013,3,0.5,0.5,0.5,0.5
11334,2013,3,0.5,0.5,0.5,0.5
11334,2013,4,0.5,0.5,0.5,0.5
11334,2014,2,0,0,0,0
11334,2014,3,0,0,0,0
11334,2014,3,0,0,0,0
11334,2014,3,0,0,0,0
11334,2014,4,0,0,0,0
11334,2016,2,0,0,0,0
11334,2016,3,0,0,0,0
11334,2016,3,0,0,0,0
11334,2016,3,0,0,0,0
11334,2016,4,0,0,0,0
;;;;
1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Here is a hash approach that does not require sorting.

 

data _null_;
   if _N_ = 1 then do;
      declare hash h(dataset:'test(obs=0)', ordered:'A');
      h.defineKey('company_id', 'year', 'qtr');
      h.defineData(all:'Y');
      h.defineDone();
   end;

   set test end=eof;

   if h.check() ne 0 then h.add();
   else do;
      _var1=var1;_var2=var2;_var3=var3;_var4=var4;
      rc=h.find();
      if var3 <= _var3 then do;
         var1=_var1;var2=_var2;var3=_var3;var4=_var4;
         h.replace();
      end;
   end;

   if eof then h.output(dataset:'want');
run;

@Kurt_Bremsers and my solution generates the exact same data set though 🙂 

View solution in original post

6 REPLIES 6
Kurt_Bremser
Super User

Sort by company_id, year, quarter and the selective variable, then use last. to extract the higher value:

proc sort data=have;
by company_id year quarter var3;
run;

data want;
set have;
by company_id year quarter;
if last.quarter;
run;
PeterClemmensen
Tourmaline | Level 20

Here is a hash approach that does not require sorting.

 

data _null_;
   if _N_ = 1 then do;
      declare hash h(dataset:'test(obs=0)', ordered:'A');
      h.defineKey('company_id', 'year', 'qtr');
      h.defineData(all:'Y');
      h.defineDone();
   end;

   set test end=eof;

   if h.check() ne 0 then h.add();
   else do;
      _var1=var1;_var2=var2;_var3=var3;_var4=var4;
      rc=h.find();
      if var3 <= _var3 then do;
         var1=_var1;var2=_var2;var3=_var3;var4=_var4;
         h.replace();
      end;
   end;

   if eof then h.output(dataset:'want');
run;

@Kurt_Bremsers and my solution generates the exact same data set though 🙂 

Sajid01
Meteorite | Level 14

SAS gives multiple approaches to the same issue.

This is mine

proc sql;
create table work.test2 as
select year,qtr , Max(var3) as var3, Max(var4) as var4 from work.test
group by year, qtr
order by year, qtr;
quit;

Reeza
Super User
If you want the MAX for each value for a particular grouping, PROC MEANS/SUMMARY is the most efficient method.

proc means data=have noprint Nway;
class company year qtr;
output out=want max = / autoname autolabel;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1940 views
  • 5 likes
  • 5 in conversation