Remove ID's with duplicate observations on the first date

 

Hello all,

 

I need to exclude every ID that has more than one sale on its first sold date. The condition applies only to the first date for each unique ID; any number of sales on the same date is fine as long as it is not the first date. For example, the data below has multiple observations for each ID, and I want to delete only those IDs that have duplicate observations on their first date.

 

Id  sales       Date sold
1   car         1/1/2001
1   car         1/1/2001
1   truck       1/3/2001
2   motorcycle  1/5/2001
2   truck       1/8/2001
3   bike        1/4/2003
3   motorcycle  1/5/2003
3   truck       1/6/2003
3   bike        1/6/2003

 

 

The output should look like this: 

 

ID  sales       date
2   motorcycle  1/5/2001
2   truck       1/8/2001
3   bike        1/4/2003
3   motorcycle  1/5/2003
3   truck       1/6/2003
3   bike        1/6/2003

 

ID 1 is deleted because it had more than one sale (of the same or a different sales type) on its first date. ID 3 also had more than one sale on a single date, but it is not deleted because those sales were not on its first date.


Accepted Solutions
Solution
08-25-2016 05:20 PM
Super User
Posts: 9,682

Re: Remove ID's with duplicate observations on the first date

OK. Assuming the data has been sorted as you posted.

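/* Sample data; the columns are tab-delimited */
data have;
infile cards expandtabs truncover;
/* Datesold is read as character, so BY processing relies on the rows already being in order */
input Id (sales Datesold) (:$40.);
cards;
1	car	1/1/2001
1	car	1/1/2001
1	truck	1/3/2001
2	motorcycle	1/5/2001
2	truck	1/8/2001
3	bike	1/4/2003
3	motorcycle	1/5/2003
3	truck	1/6/2003
3	bike	1/6/2003
;
run;

data want;
n=0;count=0;
/* First pass over one ID: count the observations on its first date */
do until(last.id);
 set have;
 by id Datesold;
 if first.Datesold then n+1;  /* n = position of the current date within the ID */
 if n=1 then count+1;         /* count = number of observations on the first date */
end;
/* Second pass over the same ID: keep its rows only if the first date had a single observation */
do until(last.id);
 set have;
 by id Datesold;
 if count=1 then output;
end;
drop n count;
run;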
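
As a cross-check on the double DO loop above, the same rule can also be sketched in PROC SQL. This is only an illustrative alternative, not part of the posted solution. It assumes the HAVE data set created above, where Datesold is a character value in m/d/yyyy form; the names per_date, dt, n_obs and want_sql are made up for this sketch.

```sas
proc sql;
  /* One row per Id and date, with the number of observations on that date. */
  /* Datesold is character in HAVE, so convert it to a SAS date first.      */
  create table per_date as
  select Id,
         input(Datesold, mmddyy10.) as dt format=mmddyy10.,
         count(*) as n_obs
  from have
  group by Id, calculated dt;

  /* Keep every row of HAVE whose Id is not flagged, where an Id is       */
  /* flagged if its earliest date carries more than one observation.      */
  create table want_sql as
  select h.*
  from have as h
  where h.Id not in
        (select p.Id
         from per_date as p
         where p.n_obs > 1
           and p.dt = (select min(q.dt)
                       from per_date as q
                       where q.Id = p.Id));
quit;
```

Unlike the DATA step, this version does not need the input to be sorted, but PROC SQL does not guarantee the row order of want_sql, so a PROC SORT may be needed afterwards if the original order matters.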


All Replies
Super User
Posts: 5,085

Re: Remove ID's with duplicate observations on the first date

It's not 100% clear what you are asking for, but it seems this is what you are after:

 

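proc sort data=have;
by id date_sold;
run;

data want;
set have;
by id date_sold;
/* On the first row of each ID: if that row is not also the last row */
/* for its date, the first date has more than one observation.       */
if first.id then do;
   if last.date_sold=0 then delete_me='Y';
   else delete_me='N';
end;
retain delete_me;  /* carry the flag across the remaining rows of the ID */
if delete_me='Y' then delete;
drop delete_me;    /* the flag is not needed in the output */
run;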

 

Even if this doesn't give exactly the result you want, BY-group processing with the FIRST. and LAST. variables is likely the right tool for getting to a solution.
