DATA Step, Macro, Functions and more

Delete duplicate rows if two variables are not identical

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 15
Accepted Solution

Delete duplicate rows if two variables are not identical

I want to keep only the duplicates where there is an identical match betwwen sport1 and sport2. So in the following table I only want to keep Person A where sport1 and Sport2 are both soccer and Person D where Sport1 and Sport2 are both Tennis

 

 

 

PersonSport1Sport2
ASoccerSoccer
ASoccerTable Tennis
ASoccerRugby
BFootyFooty
CBaseballBaseball
DTennisBadmington
DTennisTennis

Accepted Solutions
Solution
‎06-20-2017 09:23 PM
Super User
Posts: 6,946

Re: Delete duplicate rows if two variables are not identical

data want;
set have;
if sport1 = sport2;
run;

This is called a subsetting if.

You can do it in a where condition, if you want only one entry per person:

proc sort
  data=have (
    where=(sport1 = sport2)
  )
  out=want (keep=person)
  nodupkey
;
by person;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Solution
‎06-20-2017 09:23 PM
Super User
Posts: 6,946

Re: Delete duplicate rows if two variables are not identical

data want;
set have;
if sport1 = sport2;
run;

This is called a subsetting if.

You can do it in a where condition, if you want only one entry per person:

proc sort
  data=have (
    where=(sport1 = sport2)
  )
  out=want (keep=person)
  nodupkey
;
by person;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 9,682

Re: Delete duplicate rows if two variables are not identical

To be honest. I don't understand your question.

data have;
input (Person Sport1 Sport2) ($);
cards;
A Soccer Soccer 
A Soccer Table Tennis 
A Soccer Rugby 
B Footy Footy 
C Baseball Baseball 
D Tennis Badmington 
D Tennis Tennis 
;
run;
proc sql;
select *
 from have
  group by person
   having sum(sport1=sport2) ne count(*) and  sum(sport1=sport2) ne 0;
quit;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 115 views
  • 1 like
  • 3 in conversation