I have a dataset like this:
id | Hospital_dt | T1_date | T2_date | T3_date | T4_date | T5_date |
1 | 3/4/2011 | 1/1/2011 | 3/2/3011 | 3/5/2011 | ||
2 | 1/1/2011 | 1/1/2011 | 2/1/2011 | |||
3 | 2/15/2011 | |||||
4 | 10/1/2010 | 8/1/2010 | 9/1/2010 |
I want to create a new variable which will show the data of the first treatment AFTER hospital date. It will look like this:
id | Hospital_dt | T1_date | T2_date | T3_date | T4_date | T5_date | first_Treatment_dt |
1 | 3/4/2011 | 1/1/2011 | 3/2/3011 | 3/5/2011 | 3/5/2011 | ||
2 | 1/1/2011 | 1/1/2011 | 2/1/2011 | 1/1/2011 | |||
3 | 2/15/2011 | ||||||
4 | 10/1/2010 | 8/1/2010 | 9/1/2010 |
How about using an array?
options ls=80;
data have;
infile cards dlm='|' dsd truncover;
input id (Hospital_dt T1_date T2_date T3_date T4_date T5_date) (:mmddyy10.);
format _all_ yymmdd10. id ;
cards;
1|3/4/2011|1/1/2011|3/2/3011|3/5/2011||
2|1/1/2011|1/1/2011|2/1/2011|||
3|2/15/2011|||||
4|10/1/2010|8/1/2010|9/1/2010|
run;
data want;
set have ;
array tdate t1_date t2_date t3_date t4_date t5_date ;
do over tdate ;
if tdate >= hospital_dt then first_treatment_dt = min(first_treatment_dt,tdate);
end;
format first_treatment_dt yymmdd10.;
run;
f
i
r
s
t
_
t
H r
o e
s a
p t
i T T T T T m
t 1 2 3 4 5 e
a _ _ _ _ _ n
l d d d d d t
O _ a a a a a _
b i d t t t t t d
s d t e e e e e t
1 1 2011-03-04 2011-01-01 3011-03-02 2011-03-05 . . 2011-03-05
2 2 2011-01-01 2011-01-01 2011-02-01 . . . 2011-01-01
3 3 2011-02-15 . . . . . .
4 4 2010-10-01 2010-08-01 2010-09-01 . . . .
That's quick and neat.
One more question, if I know each location where the patience was treated. But I only need the first location AFTER hospital date:
id | Hospital_dt | T1_date | T2_date | T3_date | T4_date | T5_date | Location1 | Location2 | Location3 | first_Treatment_dt | location |
1 | 3/4/2011 | 1/1/2011 | 3/2/3011 | 3/5/2011 | a | b | c | 3/5/2011 | c | ||
2 | 1/1/2011 | 1/1/2011 | 2/1/2011 | b | d | 1/1/2011 | b | ||||
3 | 2/15/2011 | ||||||||||
4 | 10/1/2010 | 8/1/2010 | 9/1/2010 | a | b |
you can add an array for locations to Art's code.
data want (drop=i);
set have;
array dates(*) T1_date--T5_date;
array location(*) location1--location5;
format first_Treatment_dt mmddyy10.;
do i=1 to dim(dates);
if dates(i) ge Hospital_dt then do;
first_Treatment_dt=dates(i);
first_location=location(i);
leave;
end;
end;
run;
Linlin, Agreed! I was about to propose the exact same code, but didn't, as the results didn't come out as expected. Then I realized that it wasn't working as expected because the second date, in the example data, was 3/2/3011. Doubt if I'll be alive in 3011, but I couldn't figure out why 3/2 was being selected.
Same suggestion as Tom's, but I like to leave loops once a condition is met. Conversely, if your data isn't in date order, then I'd definitely go with Tom's suggested code:
data want (drop=i);
set have;
array dates(*) T1_date--T5_date;
format first_Treatment_dt mmddyy10.;
do i=1 to dim(dates);
if dates(i) ge Hospital_dt then do;
first_Treatment_dt=dates(i);
leave;
end;
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.