How to create a new variable based on a series of variable?

c8826024 · Posted 12-01-2011 05:09 PM

I have a dataset like this:

id	Hospital_dt	T1_date	T2_date	T3_date	T4_date	T5_date
1	3/4/2011	1/1/2011	3/2/3011	3/5/2011
2	1/1/2011	1/1/2011	2/1/2011
3	2/15/2011
4	10/1/2010	8/1/2010	9/1/2010

I want to create a new variable which will show the data of the first treatment AFTER hospital date. It will look like this:

id	Hospital_dt	T1_date	T2_date	T3_date	T4_date	T5_date	first_Treatment_dt
1	3/4/2011	1/1/2011	3/2/3011	3/5/2011			3/5/2011
2	1/1/2011	1/1/2011	2/1/2011				1/1/2011
3	2/15/2011
4	10/1/2010	8/1/2010	9/1/2010

Tom · Posted 12-01-2011 05:19 PM

How about using an array?

options ls=80;
data have;
  infile cards dlm='|' dsd truncover;
  input id  (Hospital_dt T1_date T2_date T3_date T4_date T5_date) (:mmddyy10.);
  format _all_ yymmdd10. id ;
cards;
1|3/4/2011|1/1/2011|3/2/3011|3/5/2011||
2|1/1/2011|1/1/2011|2/1/2011|||
3|2/15/2011|||||
4|10/1/2010|8/1/2010|9/1/2010|
run;
data want;
  set have ;
  array tdate t1_date t2_date t3_date t4_date t5_date ;
  do over tdate ;
    if tdate >= hospital_dt then first_treatment_dt = min(first_treatment_dt,tdate);
  end;
  format first_treatment_dt yymmdd10.;
run;

f

i

r

s

t

_

t

H r

o e

s a

p t

i T T T T T m

t 1 2 3 4 5 e

a _ _ _ _ _ n

l d d d d d t

O _ a a a a a _

b i d t t t t t d

s d t e e e e e t

1 1 2011-03-04 2011-01-01 3011-03-02 2011-03-05 . . 2011-03-05

2 2 2011-01-01 2011-01-01 2011-02-01 . . . 2011-01-01

3 3 2011-02-15 . . . . . .

4 4 2010-10-01 2010-08-01 2010-09-01 . . . .

c8826024 · Posted 12-01-2011 05:36 PM

That's quick and neat.

One more question, if I know each location where the patience was treated. But I only need the first location AFTER hospital date:

id	Hospital_dt	T1_date	T2_date	T3_date	T4_date	T5_date	Location1	Location2	Location3	first_Treatment_dt	location
1	3/4/2011	1/1/2011	3/2/3011	3/5/2011			a	b	c	3/5/2011	c
2	1/1/2011	1/1/2011	2/1/2011				b	d		1/1/2011	b
3	2/15/2011
4	10/1/2010	8/1/2010	9/1/2010				a	b

Linlin · Posted 12-01-2011 06:03 PM

you can add an array for locations to Art's code.

data want (drop=i);

set have;

array dates(*) T1_date--T5_date;

array location(*) location1--location5;

format first_Treatment_dt mmddyy10.;

do i=1 to dim(dates);

if dates(i) ge Hospital_dt then do;

first_Treatment_dt=dates(i);

first_location=location(i);

leave;

end;

run;

art297 · Posted 12-01-2011 06:13 PM

Linlin, Agreed! I was about to propose the exact same code, but didn't, as the results didn't come out as expected. Then I realized that it wasn't working as expected because the second date, in the example data, was 3/2/3011. Doubt if I'll be alive in 3011, but I couldn't figure out why 3/2 was being selected.

art297 · Posted 12-01-2011 05:27 PM

Same suggestion as Tom's, but I like to leave loops once a condition is met. Conversely, if your data isn't in date order, then I'd definitely go with Tom's suggested code: