keep rows for an ID based on a condition

d0816 · Posted 08-28-2019 04:44 PM

Hi,

I have following dataset that I sorted by first Begindate and then Entrydate. If ID has same Begin Date in multiple rows, I want to keep only first row with that begin date. for e.g. for ID 1, I want to keep first and third row. For ID 2, keep both rows because Begin date is not same. For ID 3, keep first row because Begin date is same in both rows. For ID 4, keep the row, because its the only row. (EDIT: There are other variables besides these.)

Dataset Have:

ID	BeginDate_1	EntryDate_2
1	4/8/2019	4/15/2019
1	4/8/2019	5/14/2019
1	4/30/2019	8/7/2019
2	5/20/2019	8/13/2019
2	6/3/2019	6/4/2019
3	4/15/2019	5/28/2019
3	4/15/2019	7/17/2019
4	5/20/2019	6/20/2019

So the resulting dataset will look like this:

ID	BeginDate_1	EntryDate_2
1	4/8/2019	4/15/2019
1	4/30/2019	8/7/2019
2	5/20/2019	8/13/2019
2	6/3/2019	6/4/2019
3	4/15/2019	5/28/2019
4	5/20/2019	6/20/2019

Can anyone help me with this?

Thank you.

Patrick · Posted 08-28-2019 04:50 PM

Your source data needs to be sorted by ID, BeginDate_1, EntryDate_2.

You then can get your desired result via a data step like:

data want;
   set have;
   by id BeginDate_1;
   if first.BeginDate_1;
......
run;

Jagadishkatam · Posted 08-28-2019 04:54 PM

Alternatively use proc sort

proc sort data=have out=want nodupkey;
by id begindate_1;
run;

Thanks,
Jag

r_behata · Posted 08-28-2019 05:07 PM

data have;
input ID $	BeginDate_1	anydtdte. EntryDate_2 mmddyy10.;
infile cards truncover;
format BeginDate_1 EntryDate_2 date9.;
cards;
1 4/8/2019 4/15/2019
1 4/8/2019 5/14/2019
1 4/30/2019 8/7/2019
2 5/20/2019 8/13/2019
2 6/3/2019 6/4/2019
3 4/15/2019 5/28/2019
3 4/15/2019 7/17/2019
4 5/20/2019 6/20/2019
run;

data want;
	do until(last.BeginDate_1);
		set have;
			by id BeginDate_1;

	end;
run;

d0816 · Posted 08-29-2019 10:14 AM

Made a minor change: first.BeginDate_1 instead of last.BeginDate_1 in the following code. Code worked in the sample dataset.

data want;
do until(last.BeginDate_1);
set have;
by id BeginDate_1;
end;
run;

Thank you.

novinosrin · Posted 08-28-2019 05:18 PM

data have;
input ID $	BeginDate_1	anydtdte. EntryDate_2 mmddyy10.;
infile cards truncover;
format BeginDate_1 EntryDate_2 date9.;
cards;
1 4/8/2019 4/15/2019
1 4/8/2019 5/14/2019
1 4/30/2019 8/7/2019
2 5/20/2019 8/13/2019
2 6/3/2019 6/4/2019
3 4/15/2019 5/28/2019
3 4/15/2019 7/17/2019
4 5/20/2019 6/20/2019
run;

proc sql;
create table want as
select distinct *
from have
group by id,BeginDate_1
having EntryDate_2=min(EntryDate_2);
quit;

d0816 · Posted 08-29-2019 10:10 AM

@novinosrin @r_behata @Jagadishkatam @Patrick

All codes worked in the sample dataset.

But when I applied this in my original dataset, the resulting no. of obs with codes from @r_behata @Jagadishkatam @Patrick were 3187 obs but @novinosrin's code gave 3198 obs. (11 extra obs.)

How can I check which extra observations are in resulting dataset from @novinosrin, so that I can check for the extra IDs in these extra observations?

Thank you.

novinosrin · Posted 08-29-2019 10:35 AM

Possibly you have duplicates of

EntryDate_2

and the logic also is dependent upon other variables in the dataset that's causing even the distinct to not work

I recommend go with a Datastep option posted by others. The reason I participated in the thread is to see another method of doing that wasn't posted.

d0816 · Posted 08-29-2019 10:47 AM

@novinosrin I may not have been able to create sample dataset that completely represented original dataset and there may be cases like the one you pointed out (also both dates did not have missing values). So wanted to check the particular IDs and Obs. that showed up with Proc SQL. I will go with your suggestion.

Thank you so much.

novinosrin · Posted 08-29-2019 10:53 AM

Hi @d0816 I generally tend to use MIN, MAX operator and functions when I know there are Date/Datetime values that I have to deal with as opposed to FIRST and LAST if and when possible giving some precedence to this approach(personal) of mine. The reason is that it gives me greater control and I wanna know what the heck is my code doing in my imagination. I guess this practice is a thing that stuck to me because I work with Transaction data set for the most part.

chihab · Posted 07-28-2020 08:18 AM

proc sort data = Have nodupkey out= HAVE1 ; by ID begindate1 ; run ;

keep rows for an ID based on a condition

Re: keep rows for an ID based on a condition

Re: keep rows for an ID based on a condition

Re: keep rows for an ID based on a condition

Re: keep rows for an ID based on a condition

Re: keep rows for an ID based on a condition

Re: keep rows for an ID based on a condition

Re: keep rows for an ID based on a condition

Re: keep rows for an ID based on a condition

Re: keep rows for an ID based on a condition

Re: keep rows for an ID based on a condition

SAS Innovate 2025: Call for Content

Classroom Training Available!