Hi,
I have following dataset that I sorted by first Begindate and then Entrydate. If ID has same Begin Date in multiple rows, I want to keep only first row with that begin date. for e.g. for ID 1, I want to keep first and third row. For ID 2, keep both rows because Begin date is not same. For ID 3, keep first row because Begin date is same in both rows. For ID 4, keep the row, because its the only row. (EDIT: There are other variables besides these.)
Dataset Have:
ID | BeginDate_1 | EntryDate_2 |
1 | 4/8/2019 | 4/15/2019 |
1 | 4/8/2019 | 5/14/2019 |
1 | 4/30/2019 | 8/7/2019 |
2 | 5/20/2019 | 8/13/2019 |
2 | 6/3/2019 | 6/4/2019 |
3 | 4/15/2019 | 5/28/2019 |
3 | 4/15/2019 | 7/17/2019 |
4 | 5/20/2019 | 6/20/2019 |
So the resulting dataset will look like this:
ID | BeginDate_1 | EntryDate_2 |
1 | 4/8/2019 | 4/15/2019 |
1 | 4/30/2019 | 8/7/2019 |
2 | 5/20/2019 | 8/13/2019 |
2 | 6/3/2019 | 6/4/2019 |
3 | 4/15/2019 | 5/28/2019 |
4 | 5/20/2019 | 6/20/2019 |
Can anyone help me with this?
Thank you.
Your source data needs to be sorted by ID, BeginDate_1, EntryDate_2.
You then can get your desired result via a data step like:
data want;
set have;
by id BeginDate_1;
if first.BeginDate_1;
......
run;
Alternatively use proc sort
proc sort data=have out=want nodupkey;
by id begindate_1;
run;
data have;
input ID $ BeginDate_1 anydtdte. EntryDate_2 mmddyy10.;
infile cards truncover;
format BeginDate_1 EntryDate_2 date9.;
cards;
1 4/8/2019 4/15/2019
1 4/8/2019 5/14/2019
1 4/30/2019 8/7/2019
2 5/20/2019 8/13/2019
2 6/3/2019 6/4/2019
3 4/15/2019 5/28/2019
3 4/15/2019 7/17/2019
4 5/20/2019 6/20/2019
run;
data want;
do until(last.BeginDate_1);
set have;
by id BeginDate_1;
end;
run;
Made a minor change: first.BeginDate_1 instead of last.BeginDate_1 in the following code. Code worked in the sample dataset.
data want;
do until(last.BeginDate_1);
set have;
by id BeginDate_1;
end;
run;
Thank you.
data have;
input ID $ BeginDate_1 anydtdte. EntryDate_2 mmddyy10.;
infile cards truncover;
format BeginDate_1 EntryDate_2 date9.;
cards;
1 4/8/2019 4/15/2019
1 4/8/2019 5/14/2019
1 4/30/2019 8/7/2019
2 5/20/2019 8/13/2019
2 6/3/2019 6/4/2019
3 4/15/2019 5/28/2019
3 4/15/2019 7/17/2019
4 5/20/2019 6/20/2019
run;
proc sql;
create table want as
select distinct *
from have
group by id,BeginDate_1
having EntryDate_2=min(EntryDate_2);
quit;
@novinosrin @r_behata @Jagadishkatam @Patrick
All codes worked in the sample dataset.
But when I applied this in my original dataset, the resulting no. of obs with codes from @r_behata @Jagadishkatam @Patrick were 3187 obs but @novinosrin's code gave 3198 obs. (11 extra obs.)
How can I check which extra observations are in resulting dataset from @novinosrin, so that I can check for the extra IDs in these extra observations?
Thank you.
Possibly you have duplicates of
EntryDate_2
and the logic also is dependent upon other variables in the dataset that's causing even the distinct to not work
I recommend go with a Datastep option posted by others. The reason I participated in the thread is to see another method of doing that wasn't posted.
@novinosrin I may not have been able to create sample dataset that completely represented original dataset and there may be cases like the one you pointed out (also both dates did not have missing values). So wanted to check the particular IDs and Obs. that showed up with Proc SQL. I will go with your suggestion.
Thank you so much.
Hi @d0816 I generally tend to use MIN, MAX operator and functions when I know there are Date/Datetime values that I have to deal with as opposed to FIRST and LAST if and when possible giving some precedence to this approach(personal) of mine. The reason is that it gives me greater control and I wanna know what the heck is my code doing in my imagination. I guess this practice is a thing that stuck to me because I work with Transaction data set for the most part.
proc sort data = Have nodupkey out= HAVE1 ; by ID begindate1 ; run ;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.