I have a dataset like this:
id | Hospital_dt | T1_date | T2_date | T3_date | T4_date | T5_date |
1 | 3/4/2011 | 1/1/2011 | 3/2/3011 | 3/5/2011 | ||
2 | 1/1/2011 | 1/1/2011 | 2/1/2011 | |||
3 | 2/15/2011 | |||||
4 | 10/1/2010 | 8/1/2010 | 9/1/2010 |
I want to create a new variable which will show the data of the first treatment AFTER hospital date. It will look like this:
id | Hospital_dt | T1_date | T2_date | T3_date | T4_date | T5_date | first_Treatment_dt |
1 | 3/4/2011 | 1/1/2011 | 3/2/3011 | 3/5/2011 | 3/5/2011 | ||
2 | 1/1/2011 | 1/1/2011 | 2/1/2011 | 1/1/2011 | |||
3 | 2/15/2011 | ||||||
4 | 10/1/2010 | 8/1/2010 | 9/1/2010 |
How about using an array?
options ls=80;
data have;
infile cards dlm='|' dsd truncover;
input id (Hospital_dt T1_date T2_date T3_date T4_date T5_date) (:mmddyy10.);
format _all_ yymmdd10. id ;
cards;
1|3/4/2011|1/1/2011|3/2/3011|3/5/2011||
2|1/1/2011|1/1/2011|2/1/2011|||
3|2/15/2011|||||
4|10/1/2010|8/1/2010|9/1/2010|
run;
data want;
set have ;
array tdate t1_date t2_date t3_date t4_date t5_date ;
do over tdate ;
if tdate >= hospital_dt then first_treatment_dt = min(first_treatment_dt,tdate);
end;
format first_treatment_dt yymmdd10.;
run;
f
i
r
s
t
_
t
H r
o e
s a
p t
i T T T T T m
t 1 2 3 4 5 e
a _ _ _ _ _ n
l d d d d d t
O _ a a a a a _
b i d t t t t t d
s d t e e e e e t
1 1 2011-03-04 2011-01-01 3011-03-02 2011-03-05 . . 2011-03-05
2 2 2011-01-01 2011-01-01 2011-02-01 . . . 2011-01-01
3 3 2011-02-15 . . . . . .
4 4 2010-10-01 2010-08-01 2010-09-01 . . . .
That's quick and neat.
One more question, if I know each location where the patience was treated. But I only need the first location AFTER hospital date:
id | Hospital_dt | T1_date | T2_date | T3_date | T4_date | T5_date | Location1 | Location2 | Location3 | first_Treatment_dt | location |
1 | 3/4/2011 | 1/1/2011 | 3/2/3011 | 3/5/2011 | a | b | c | 3/5/2011 | c | ||
2 | 1/1/2011 | 1/1/2011 | 2/1/2011 | b | d | 1/1/2011 | b | ||||
3 | 2/15/2011 | ||||||||||
4 | 10/1/2010 | 8/1/2010 | 9/1/2010 | a | b |
you can add an array for locations to Art's code.
data want (drop=i);
set have;
array dates(*) T1_date--T5_date;
array location(*) location1--location5;
format first_Treatment_dt mmddyy10.;
do i=1 to dim(dates);
if dates(i) ge Hospital_dt then do;
first_Treatment_dt=dates(i);
first_location=location(i);
leave;
end;
end;
run;
Linlin, Agreed! I was about to propose the exact same code, but didn't, as the results didn't come out as expected. Then I realized that it wasn't working as expected because the second date, in the example data, was 3/2/3011. Doubt if I'll be alive in 3011, but I couldn't figure out why 3/2 was being selected.
Same suggestion as Tom's, but I like to leave loops once a condition is met. Conversely, if your data isn't in date order, then I'd definitely go with Tom's suggested code:
data want (drop=i);
set have;
array dates(*) T1_date--T5_date;
format first_Treatment_dt mmddyy10.;
do i=1 to dim(dates);
if dates(i) ge Hospital_dt then do;
first_Treatment_dt=dates(i);
leave;
end;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.