DATA Step, Macro, Functions and more

need help to remove duplicates

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 115
Accepted Solution

need help to remove duplicates

subject id text start date time  STOP DATE
39 Only in 1A nature 09 Oct 2014 07:15  
39 Only in 1B beauty UN Mar 2015   10 Sep 2015
45 Only in 1A HUMAN 09 Oct 2014    

 

I need to remove subjects which has stop date in "Only in 1B" if the same subject has "Only in 1A" observation also.

 

 


Accepted Solutions
Solution
‎01-18-2017 07:25 AM
Super User
Posts: 6,936

Re: need help to remove duplicates

[ Edited ]
data delfile (keep=subject);
set have;
by subject;
retain delflag1 delflag2;
if first.subject
then do;
  delflag1 = 0;
  delflag2 = 0;
end;
if id = 'Only in 1A' then delflag1 = 1;
if id = 'Only in 1B' and stop_date ne . then delflag2 = 1;
if last.subject and delflag1 and delflag2 then output;
run;

data want;
merge
  have (in=a)
  delfile (in=b)
;
by subject;
if a and not b;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Trusted Advisor
Posts: 1,374

Re: need help to remove duplicates

Assumed data is sorted by subject id:

 

data want;

  set have;

       by suject id;

            retain flag_remove;

           if first.id then flag_remove=0;

           if id = '1A' than flag_remove = 1;

           if id = '1B' and flag_remove=1 then delete;

run;

Frequent Contributor
Posts: 115

Re: need help to remove duplicates

should it be if first.subject then flag_remove=0; ??

Frequent Contributor
Posts: 115

Re: need help to remove duplicates

But in this case i need to check if stop date is present for only 1B and then delete.

Super User
Posts: 5,081

Re: need help to remove duplicates

Do you need to remove ALL the observations for that subject, or just the 1B observations?

Frequent Contributor
Posts: 115

Re: need help to remove duplicates

If for the same subject i have both 1A and 1B and i have stop date for 1B  and missing stop date for 1A then only i need to remove 1B obs. else keep as it is.

Super User
Posts: 6,936

Re: need help to remove duplicates


vraj1 wrote:

If for the same subject i have both 1A and 1B and i have stop date for 1B  and missing stop date for 1A then only i need to remove 1B obs. else keep as it is.


So you just moved the goalposts.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Solution
‎01-18-2017 07:25 AM
Super User
Posts: 6,936

Re: need help to remove duplicates

[ Edited ]
data delfile (keep=subject);
set have;
by subject;
retain delflag1 delflag2;
if first.subject
then do;
  delflag1 = 0;
  delflag2 = 0;
end;
if id = 'Only in 1A' then delflag1 = 1;
if id = 'Only in 1B' and stop_date ne . then delflag2 = 1;
if last.subject and delflag1 and delflag2 then output;
run;

data want;
merge
  have (in=a)
  delfile (in=b)
;
by subject;
if a and not b;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 115

Re: need help to remove duplicates

i get NOTE: Variable last.id is uninitialized.

Super User
Posts: 6,936

Re: need help to remove duplicates


vraj1 wrote:

i get NOTE: Variable last.id is uninitialized.


Corrected my code from last.id to last.subject.

Happens when one has to work without test data.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 115

Re: need help to remove duplicates

Thanks a lot Kurt
Frequent Contributor
Posts: 115

Re: need help to remove duplicates

Sorry for asking but in this case i am loosing those subjects whereas i need the subject with only 1A observation and remove 1B obs in the case of stop date present in 1B and not in 1A. Sorry for all the confusion
Super User
Posts: 6,936

Re: need help to remove duplicates

So, for a given subject, you want to remove the "1B" observation if

- the "1B" observation has a stop_date

- a previous "1A" observation had no stop_date

?

OTOH, the observation would stay when either

- no previous "1A" observation is present

- a previous "1A" observation does not have a stop_date

- the "1B" observation has no stop_date

?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 115

Re: need help to remove duplicates

remove the "1B" observation if

- the "1B" observation has a stop_date

- a previous "1A" observation had no stop_date

 

If there is only one onservation for a subject with either 1A or 1B it stays as it is.

if there is stop date for 1A and stop date for 1B then both stays.

If there is no stop date for 1B then also it stays

Super User
Posts: 6,936

Re: need help to remove duplicates

data want;
set have;
by subject;
retain delflag;
if first.subject then delflag = 0;
if id = 'Only in 1A' and stop_date = . then delflag = 1;
if id = 'Only in 1B' and stop_date ne . and delflag then delete;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 200 views
  • 1 like
  • 4 in conversation