Hi all SAS experts!
My dataset looks something like this:
Unique ID Bacteria date:
110 1 s.aureus 1/12 2000
111 1 e.coli 1/12 2000
112 1 e.coli 1/9 2000
113 1 strep 3/10 2004
114 2 s.aureus 1/3 2010
115 2 e.coli 5/3 2010
116 2 pseudomonas 7/3 2010
170 3 pseudomonas 5/10 2005
175 3 e.coli 5/10 2005
176 3 enterobac 5/10 2005
180 3 s.aureus 1/10 2006
190 3 s.aureus 6/12 2006
How can I make SAS to select the first observation from the same ID of the same date only? With this example I would like to for an example keep for ID=1 the unique variable 110 and 111 as they are from the same date but get rid of the other samples with the same ID=1.
Hope you guys can help!
Kind regards
GK
@GK5 Hi and welcome to the SAS Community 🙂
Here is one way
data have;
input Unique ID Bacteria :$20. date :ddmmyy10.;
format date ddmmyy10.;
datalines;
110 1 s.aureus 1/12/2000
111 1 e.coli 1/12/2000
112 1 e.coli 1/9/2000
113 1 strep 3/10/2004
114 2 s.aureus 1/3/2010
115 2 e.coli 5/3/2010
116 2 pseudomonas 7/3/2010
170 3 pseudomonas 5/10/2005
175 3 e.coli 5/10/2005
176 3 enterobac 5/10/2005
180 3 s.aureus 1/10/2006
190 3 s.aureus 6/12/2006
;
data want;
array _ {99999} _temporary_;
set have;
by id;
if first.id then _[id]=date;
if _[id]=date;
run;
Result:
Unique ID Bacteria date 110 1 s.aureus 01/12/2000 111 1 e.coli 01/12/2000 114 2 s.aureus 01/03/2010 170 3 pseudomonas 05/10/2005 175 3 e.coli 05/10/2005 176 3 enterobac 05/10/2005
Thank you for your respons.
Appologies for not mentioning that this dataset is quite large - what I wrote is just an "sample" of the whole set to illustrate what my data looks like. So what should my code in SAS look like? Sorry, Rookie here 🙂
How large? And what is your maximum value of ID?
Max. value of ID in total is 4039199 - so it's a lot. 🙂
My code can handle that, simply add more entries to the array like this
data want;
array _ {9999999} _temporary_;
set have;
by id;
if first.id then _[id]=date;
if _[id]=date;
run;
I think it's actually simpler than that:
data want;
set have;
by id notsorted;
if first.id then _date = date;
retain _date;
if date = _date;
run;
I tried that but it only keeps the observations from the latest date not the earliest.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.