BookmarkSubscribeRSS Feed
d0816
Quartz | Level 8

Hi,

I have following dataset that I sorted by first Begindate and then Entrydate. If ID has same Begin Date in multiple rows, I want to keep only first row with that begin date. for e.g. for ID 1, I want to keep first and third row. For ID 2, keep both rows because Begin date is not same. For ID 3, keep first row because Begin date is same in both rows. For ID 4, keep the row, because its the only row. (EDIT: There are other variables besides these.)

Dataset Have:

IDBeginDate_1EntryDate_2
14/8/20194/15/2019
14/8/20195/14/2019
14/30/20198/7/2019
25/20/20198/13/2019
26/3/20196/4/2019
34/15/20195/28/2019
34/15/20197/17/2019
45/20/20196/20/2019

 

So the resulting dataset will look like this:

IDBeginDate_1EntryDate_2
14/8/20194/15/2019
14/30/20198/7/2019
25/20/20198/13/2019
26/3/20196/4/2019
34/15/20195/28/2019
45/20/20196/20/2019

 

Can anyone help me with this?

 

Thank you.

10 REPLIES 10
Patrick
Opal | Level 21

Your source data needs to be sorted by ID, BeginDate_1, EntryDate_2.

You then can get your desired result via a data step like:

data want;
   set have;
   by id BeginDate_1;
   if first.BeginDate_1;
......
run;
Jagadishkatam
Amethyst | Level 16

Alternatively use proc sort

 

proc sort data=have out=want nodupkey;
by id begindate_1;
run;
Thanks,
Jag
r_behata
Barite | Level 11
data have;
input ID $	BeginDate_1	anydtdte. EntryDate_2 mmddyy10.;
infile cards truncover;
format BeginDate_1 EntryDate_2 date9.;
cards;
1 4/8/2019 4/15/2019
1 4/8/2019 5/14/2019
1 4/30/2019 8/7/2019
2 5/20/2019 8/13/2019
2 6/3/2019 6/4/2019
3 4/15/2019 5/28/2019
3 4/15/2019 7/17/2019
4 5/20/2019 6/20/2019
run;

data want;
	do until(last.BeginDate_1);
		set have;
			by id BeginDate_1;

	end;
run;
d0816
Quartz | Level 8

Made a minor change: first.BeginDate_1  instead of last.BeginDate_1 in the following code. Code worked in the sample dataset.

 

data want;
do until(last.BeginDate_1);
set have;
by id BeginDate_1;
end;
run;

 

Thank you.

novinosrin
Tourmaline | Level 20
data have;
input ID $	BeginDate_1	anydtdte. EntryDate_2 mmddyy10.;
infile cards truncover;
format BeginDate_1 EntryDate_2 date9.;
cards;
1 4/8/2019 4/15/2019
1 4/8/2019 5/14/2019
1 4/30/2019 8/7/2019
2 5/20/2019 8/13/2019
2 6/3/2019 6/4/2019
3 4/15/2019 5/28/2019
3 4/15/2019 7/17/2019
4 5/20/2019 6/20/2019
run;

proc sql;
create table want as
select distinct *
from have
group by id,BeginDate_1
having EntryDate_2=min(EntryDate_2);
quit;
d0816
Quartz | Level 8

@novinosrin @r_behata @Jagadishkatam @Patrick 

All codes worked in the sample dataset.

 

But when I applied this in my original dataset, the resulting no. of obs with codes from @r_behata @Jagadishkatam @Patrick were 3187 obs but @novinosrin's code gave 3198 obs. (11 extra obs.)

 

How can I check which extra observations are in resulting dataset from @novinosrin, so that I can check for the extra IDs in these extra observations?

 

Thank you.

 

 

novinosrin
Tourmaline | Level 20

Possibly you have duplicates of 

EntryDate_2 

and the logic also is dependent upon other variables in the dataset that's causing even the distinct to not work

 

I recommend go with a Datastep option posted by others. The reason I participated in the thread is to see another method of doing that wasn't posted.  

d0816
Quartz | Level 8

@novinosrin I may not have been able to create sample dataset that completely represented original dataset and there may be cases like the one you pointed out (also both dates did not have missing values). So wanted to check the particular IDs and Obs. that showed up with Proc SQL. I will go with your suggestion.

 

Thank you so much.

novinosrin
Tourmaline | Level 20

Hi @d0816  I generally tend to use MIN, MAX operator and functions when I know there are Date/Datetime values that I have to deal with as opposed to FIRST and LAST if and when possible giving some precedence to this approach(personal) of mine. The reason is that it gives me greater control and I wanna know what the heck is my code doing in my imagination. I guess this practice is a thing that stuck to me because I work with Transaction data set for the most part. 

chihab
Calcite | Level 5

proc sort data = Have nodupkey out= HAVE1 ; by ID  begindate1 ; run ; 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3505 views
  • 2 likes
  • 6 in conversation