DATA Step, Macro, Functions and more

delete corresponding rows based on conditions

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

delete corresponding rows based on conditions

I want to delete duplicates where ID and program_id are matching. But I want to keep the duplicate with the most recent collection_yr. For example, if two observations have the same ID and program_id but the completion_yr is 2009 and 2010, I want to keep the observation with the 2010 completion_yr. I have attached an example data set with duplicates. I want to delete the observation with ID=4 and program_id=9034 but collection_yr=2009 as that is not the most recent. However, I want to do this on a larger scale where there are more observations

Cheers

 

IDprogram_idCOLLECTION_YR
19992009
24442010
3SRC2009
490342009
490342010
490692010
577672006
6309992009

Accepted Solutions
Solution
‎05-29-2017 03:18 AM
Super User
Posts: 17,836

Re: delete corresponding rows based on conditions

[ Edited ]

Sort with the NODUPKEY option will get you there. 

 

Proc sort data = have out=have_sorted;
By Id programID descending year; *puts year as first;
Run;

Proc sort data = have nodupkey out=unique1;
By Id programID;
Run;

Or a SQL solution 

 

proc sql:
Create table unique2 as
Select Id, programID, max(year) as year
From have
Group by Id, programID;
Quit;

And one more - data step - note this does require the sort from the first piece of code. 

Data unique3;
Set have_sorted;
By Id programID;

If first.programID;

Run;

 


BenBrady wrote:

I want to delete duplicates where ID and program_id are matching. But I want to keep the duplicate with the most recent collection_yr. For example, if two observations have the same ID and program_id but the completion_yr is 2009 and 2010, I want to keep the observation with the 2010 completion_yr. I have attached an example data set with duplicates. I want to delete the observation with ID=4 and program_id=9034 but collection_yr=2009 as that is not the most recent. However, I want to do this on a larger scale where there are more observations

Cheers

 

ID program_id COLLECTION_YR
1 999 2009
2 444 2010
3 SRC 2009
4 9034 2009
4 9034 2010
4 9069 2010
5 7767 2006
6 30999 2009


 

View solution in original post


All Replies
Solution
‎05-29-2017 03:18 AM
Super User
Posts: 17,836

Re: delete corresponding rows based on conditions

[ Edited ]

Sort with the NODUPKEY option will get you there. 

 

Proc sort data = have out=have_sorted;
By Id programID descending year; *puts year as first;
Run;

Proc sort data = have nodupkey out=unique1;
By Id programID;
Run;

Or a SQL solution 

 

proc sql:
Create table unique2 as
Select Id, programID, max(year) as year
From have
Group by Id, programID;
Quit;

And one more - data step - note this does require the sort from the first piece of code. 

Data unique3;
Set have_sorted;
By Id programID;

If first.programID;

Run;

 


BenBrady wrote:

I want to delete duplicates where ID and program_id are matching. But I want to keep the duplicate with the most recent collection_yr. For example, if two observations have the same ID and program_id but the completion_yr is 2009 and 2010, I want to keep the observation with the 2010 completion_yr. I have attached an example data set with duplicates. I want to delete the observation with ID=4 and program_id=9034 but collection_yr=2009 as that is not the most recent. However, I want to do this on a larger scale where there are more observations

Cheers

 

ID program_id COLLECTION_YR
1 999 2009
2 444 2010
3 SRC 2009
4 9034 2009
4 9034 2010
4 9069 2010
5 7767 2006
6 30999 2009


 

PROC Star
Posts: 7,363

Re: delete corresponding rows based on conditions

One way is to use two proc sorts. e.g.:

proc sort data=have;
  by ID program_id descending COLLECTION_YR;
run;

proc sort data=have out=want nodupkey;
  by ID program_id;
run;

Art, CEO, AnalystFinder.com

 

Super User
Posts: 10,500

Re: delete corresponding rows based on conditions

In a pedantic feeling mode: If two records have some values the same and at least one value different they are not "duplicates" they are similar.

 

So you are selecting between similar records.

Valued Guide
Posts: 797

Re: delete corresponding rows based on conditions

If your data are aleady sorted by id /program_id/collection_yr:

 

data want;
  set have;
  by id program_id collection_yr;
  if last.program_id;
run;

 

If your data are not sorted, and sorting is expensive (i.e. it's a big data set), there is always proc summary:

proc summary data=have nway;
  class id program_id;
  var collection_yr;
  output out=want2 (drop=_type_ _freq_) max=;
run;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 243 views
  • 3 likes
  • 5 in conversation