Matching later values of a variable to values as of a given date

Walternate · Posted 01-19-2016 03:00 PM

Hi,

I have a dataset at the person-month level (however, each person can have more than one row in one month). It has an ID variable, three date variables, and two categorical variables. If a person has more than one row per month, the values of Date2, Categ 1, and Categ2 will be different in each row. Date1 shows which month/year the variables are "as of".

ID Date1 Date2 Date3 Categ1 Categ2

1 201001 1/1/2009 . abc a

1 201001 1/25/2010 . def b

1 201002 1/1/2009 . abc a

1 201002 1/25/2010 2/3/2010 def b

1 201003 1/1/2009 . abc a

1 201003 3/12/2010 3/12/2010 ghi b

What I want is to do the following:

1. Create an indicator variable for everyone that has Categ1=def, and a variable that says the earliest Date2 for which they have that value.

2. Take the value of Categ2 at the earliest Date2 where Categ1=def (in this case, b). Find the minimum value of Date3 within the rows where Categ2=b.

3. Additionally, also within the rows where Categ2=b, see whether there are any values of ghi in Categ1; if so, create an indicator variable showing that the person had that value and a date variable for when they had it (using Date2).

I can do most of this by myself; the issue I'm having is systematizing only looking in the rows with the Categ2 value that corresponds with the earliest instance of Categ1=def.

Any help is much appreciated.

rivieralad · Posted 01-20-2016 08:37 AM

Hi there.

Assuming I have understood your problem correctly you will probably want something along these lines (I added some extra ID records to test it).

Will need tidying up but this gives you what you asked for in the way of indicators and dates.

Cheers

Chris

data have;
infile datalines;
input
@01 id $1.
@08 dt1 yymmn6.
@17 dt2 mmddyy10.
@33 dt3 mmddyy10.
@43 cat1 $3.
@54 cat2 $1.
;
datalines;
1 201001 1/1/2009 . abc a
1 201001 1/25/2010 . def b
1 201002 1/1/2009 . abc a
1 201002 1/25/2010 2/3/2010 def b
1 201003 1/1/2009 . abc a
1 201003 3/12/2010 3/12/2010 ghi b
2 201001 1/1/2009 . abc a
2 201001 1/25/2010 . def b
2 201002 1/1/2009 . abc a
2 201002 1/25/2010 2/3/2010 def b
2 201003 1/1/2009 . abc a
3 201001 1/1/2009 . abc a
3 201001 1/25/2010 . def c
3 201002 1/1/2009 . abc a
3 201002 1/25/2010 2/3/2010 def b
3 201003 1/1/2009 . abc a
3 201003 3/12/2010 3/12/2010 ghi b
4 201001 1/1/2009 . abc a
4 201001 1/25/2010 . def b
4 201002 1/1/2009 . abc a
4 201002 1/25/2010 2/3/2010 def b
4 201003 1/1/2009 . abc a
4 201003 3/12/2010 3/12/2010 ghi b
;
run;

* sort into each date column order within id (probably don't need to sort by the cat columns);

proc sort data = have;
by id dt1 dt2 dt3 cat1 cat2;
run;

* set flags to meet requirements;

data want;
set have;
by id dt1 dt2 dt3 cat1 cat2;

* retain the indicators and earliest dates;

retain
def_ind
dt2_first
cat2_val
dt3_first
ghi_ind
dt2_ghi
;

* initialise indicators and earliest dates;;
if first.id then do;
def_ind = 0;
dt2_first = .;
cat2_val = '';
dt3_first = .;
ghi_ind = 0;
dt2_ghi = .;
end;

* set def and earliest dt2 flag if cat1 set to def and indicator not already set ;
if cat1 eq 'def' then do;
if def_ind eq 0 then do;
def_ind = 1;
dt2_first = dt2;
cat2_val = cat2;
end;
end;

* if cat2 same as when cat1 set to def and dt3 set and dt3_first flag not set then set earliest dt3 flag;
if cat2_val eq cat2 and dt3_first eq . and dt3 ne . then do;
dt3_first = dt3;
end;

* if cat2 same as when cat1 set to def and cat1 set to ghi set ghi indicator and dt2 flag for ghi;;
if cat2_val eq cat2 and ghi_ind eq 0 and cat1 eq 'ghi' then do;
ghi_ind = 1;
dt2_ghi = dt2;
end;

run;

Output looks like (sorry, needs aligning):

Obs id dt1 dt2 dt3 cat1 cat2 def_ind dt2_first cat2_val dt3_first ghi_ind dt2_ghi

1 1 18263 17898 . abc a 0 . . 0 .
2 1 18263 18287 . def b 1 18287 b . 0 .
3 1 18294 17898 . abc a 1 18287 b . 0 .
4 1 18294 18287 18296 def b 1 18287 b 18296 0 .
5 1 18322 17898 . abc a 1 18287 b 18296 0 .
6 1 18322 18333 18333 ghi b 1 18287 b 18296 1 18333
7 2 18263 17898 . abc a 0 . . 0 .
8 2 18263 18287 . def b 1 18287 b . 0 .
9 2 18294 17898 . abc a 1 18287 b . 0 .
10 2 18294 18287 18296 def b 1 18287 b 18296 0 .
11 2 18322 17898 . abc a 1 18287 b 18296 0 .
12 3 18263 17898 . abc a 0 . . 0 .
13 3 18263 18287 . def c 1 18287 c . 0 .
14 3 18294 17898 . abc a 1 18287 c . 0 .
15 3 18294 18287 18296 def b 1 18287 c . 0 .
16 3 18322 17898 . abc a 1 18287 c . 0 .
17 3 18322 18333 18333 ghi b 1 18287 c . 0 .
18 4 18263 17898 . abc a 0 . . 0 .
19 4 18263 18287 . def b 1 18287 b . 0 .
20 4 18294 17898 . abc a 1 18287 b . 0 .
21 4 18294 18287 18296 def b 1 18287 b 18296 0 .
22 4 18322 17898 . abc a 1 18287 b 18296 0 .
23 4 18322 18333 18333 ghi b 1 18287 b 18296 1 18333

Matching later values of a variable to values as of a given date

Re: Matching later values of a variable to values as of a given date

Click image to register for webinar

Classroom Training Available!