Hi all SAS experts!
My dataset looks something like this:
Unique ID Bacteria date:
110 1 s.aureus 1/12 2000
111 1 e.coli 1/12 2000
112 1 e.coli 1/9 2000
113 1 strep 3/10 2004
114 2 s.aureus 1/3 2010
115 2 e.coli 5/3 2010
116 2 pseudomonas 7/3 2010
170 3 pseudomonas 5/10 2005
175 3 e.coli 5/10 2005
176 3 enterobac 5/10 2005
180 3 s.aureus 1/10 2006
190 3 s.aureus 6/12 2006
How can I make SAS to select the first observation from the same ID of the same date only? With this example I would like to for an example keep for ID=1 the unique variable 110 and 111 as they are from the same date but get rid of the other samples with the same ID=1.
Hope you guys can help!
Kind regards
GK
@GK5 Hi and welcome to the SAS Community 🙂
Here is one way
data have;
input Unique ID Bacteria :$20. date :ddmmyy10.;
format date ddmmyy10.;
datalines;
110 1 s.aureus 1/12/2000
111 1 e.coli 1/12/2000
112 1 e.coli 1/9/2000
113 1 strep 3/10/2004
114 2 s.aureus 1/3/2010
115 2 e.coli 5/3/2010
116 2 pseudomonas 7/3/2010
170 3 pseudomonas 5/10/2005
175 3 e.coli 5/10/2005
176 3 enterobac 5/10/2005
180 3 s.aureus 1/10/2006
190 3 s.aureus 6/12/2006
;
data want;
array _ {99999} _temporary_;
set have;
by id;
if first.id then _[id]=date;
if _[id]=date;
run;
Result:
Unique ID Bacteria date 110 1 s.aureus 01/12/2000 111 1 e.coli 01/12/2000 114 2 s.aureus 01/03/2010 170 3 pseudomonas 05/10/2005 175 3 e.coli 05/10/2005 176 3 enterobac 05/10/2005
Thank you for your respons.
Appologies for not mentioning that this dataset is quite large - what I wrote is just an "sample" of the whole set to illustrate what my data looks like. So what should my code in SAS look like? Sorry, Rookie here 🙂
How large? And what is your maximum value of ID?
Max. value of ID in total is 4039199 - so it's a lot. 🙂
My code can handle that, simply add more entries to the array like this
data want;
array _ {9999999} _temporary_;
set have;
by id;
if first.id then _[id]=date;
if _[id]=date;
run;
I think it's actually simpler than that:
data want;
set have;
by id notsorted;
if first.id then _date = date;
retain _date;
if date = _date;
run;
I tried that but it only keeps the observations from the latest date not the earliest.
Nearly 200 sessions are now available on demand in the Innovate Hub.
Watch Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.