Programming the statistical procedures from SAS

lagged variable controlling for dates

Accepted Solution Solved
Reply
Contributor
Posts: 43
Accepted Solution

lagged variable controlling for dates

dear all,

I have a dataset with several monthly occurrences for each individual.

since I want to regress x(t) on x(t-1) and x(t-12) I have to create these lagged variables for each individual.

the code that I'am using is:

proc sort data=ds;

by code date;

run;

proc panel data=ds;

id code date;

lag x(1 12) / out=ds1;

run;

(I might use proc expand instead of proc panel)

however since there may be missing observations not for every individual the series is uninterrupted.

for instance in the following example  for individual a  the months of january and july 2007 are missing. in this case I would like the lagged value for x is missing for february 2007 and august 2007

code date x x_1

a dec06 1 .

a feb07 2 .

a mar07 3 2

a apr07 4 3

a may07 5 4

a jun07 6 5

a aug07 7 .

a sep07 8 7

a oct07 9 8

a nov07 10 9

a dec07 11 10

a jan08 12 11

the code above instead fills in the value with the previous observation within the same code regardless of the date.

should I merge the data with the list of uninterrpted dates in a way to obtain beforehand the missing for the lacking observation or is there any way to get the results?

by the way I'm working with hundred of thousands of observation and proc panel takes a while to get the results, so I'm looking possibly for a more efficient solution.

I would appreciate any suggestions. thank you very much in advance


Accepted Solutions
Solution
‎10-21-2011 05:13 AM
Respected Advisor
Posts: 3,780

Re: lagged variable controlling for dates

Are you trying to do this?

proc expand data=ds from=month method=none;

   by code;

   id date;

   convert x;

   convert x=x1 / transformout=(lag 1);

   convert x=x12 / transformout=(lag 12);

   run;

Obs    code    date      x    x1    x12

  1     a      DEC06     1     .     .

  2     a      JAN07     .     1     .

  3     a      FEB07     2     .     .

  4     a      MAR07     3     2     .

  5     a      APR07     4     3     .

  6     a      MAY07     5     4     .

  7     a      JUN07     6     5     .

  8     a      JUL07     .     6     .

  9     a      AUG07     7     .     .

10     a      SEP07     8     7     .

11     a      OCT07     9     8     .

12     a      NOV07    10     9     .

13     a      DEC07    11    10     1

14     a      JAN08    12    11     .

View solution in original post


All Replies
Contributor
Posts: 43

lagged variable controlling for dates

through a better search I have found a previous post addressing the same problem: http://communities.sas.com/message/42101#42101

I will look into it. however it seems the issue is resolved through a datastep solution. so maybe there is no lag function/statement that condition on the dates

Solution
‎10-21-2011 05:13 AM
Respected Advisor
Posts: 3,780

Re: lagged variable controlling for dates

Are you trying to do this?

proc expand data=ds from=month method=none;

   by code;

   id date;

   convert x;

   convert x=x1 / transformout=(lag 1);

   convert x=x12 / transformout=(lag 12);

   run;

Obs    code    date      x    x1    x12

  1     a      DEC06     1     .     .

  2     a      JAN07     .     1     .

  3     a      FEB07     2     .     .

  4     a      MAR07     3     2     .

  5     a      APR07     4     3     .

  6     a      MAY07     5     4     .

  7     a      JUN07     6     5     .

  8     a      JUL07     .     6     .

  9     a      AUG07     7     .     .

10     a      SEP07     8     7     .

11     a      OCT07     9     8     .

12     a      NOV07    10     9     .

13     a      DEC07    11    10     1

14     a      JAN08    12    11     .

Super User
Posts: 9,769

lagged variable controlling for dates

How about:

data temp;
input code $ date : monyy7. x ;
format date date9.;
cards;
a dec06 1 
a feb07 2 
a mar07 3 
a apr07 4 
a may07 5 
a jun07 6
a aug07 7 
a sep07 8 
a oct07 9 
a nov07 10 
a dec07 11
a jan08 12
;
run;
data want;
 set temp;
 x_1=lag(x);
 if intck('month',lag(date),date) ne 1 then call missing(x_1);
 if code ne lag(code) then call missing(x_1);
run;

Ksharp

Contributor
Posts: 43

lagged variable controlling for dates

Dear Ksharp, thank you very much.  I found your solution simple and elegant.  I liked a lot and it works well for lag 1.

Unfortunately this does not seem to solve the problem for lag 12 (or any other lag).

Consider the following:

data temp;

input code $ date : monyy7. x ;

format date date9.;

cards;

a nov06 0

a dec06 1

a jan07 2

a feb07 3

a mar07 4

a apr07 5

a may07 6

a jun07 7

a jul07 8

a aug07 9

a sep07 10

a oct07 11

a nov07 12

a dec07 13

a jan08 14

b nov06 20

b dec06 21

b feb07 22

b mar07 23

b apr07 24

b may07 25

b jun07 26

b aug07 27

b sep07 28

b oct07 29

b nov07 30

b dec07 31

b jan08 32

  ;

run;

data want;

set temp;

x_1=lag(x);

if intck('month',lag(date),date) ne 1 then call missing(x_1);

if code ne lag(code) then call missing(x_1);

x_12=lag12(x);

if intck('month',lag12(date),date) ne 12 then call missing(x_12);

if code ne lag12(code) then call missing(x_12);

run;

For code b where there are missing observations I don't get the value of x_12 for december 2007 although there exists an observation for december 2006. I imagine that I have to loop your code from lag12 to lag1 until I find the correct corresponding month, if there is one, but I don't know how to code it to get this result.

any suggestion?

PROC Star
Posts: 7,416

lagged variable controlling for dates

Two options that immediately come to mind are either building your own lag equivalents using retain, or preprocessing your data to fill in the missing dates.  I would go for the latter.

Lags are simply queues and your code doesn't take missing from the queue into account.

How about something like:

data complete (keep=code date x);

  set temp (rename=(

   date=in_date

   x=in_x));

  by code;

  retain last_date;

  if first.code then do;

    date=in_date;

    x=in_x;

    output;

  end;

  else do i=1 to intck('month',last_date,in_date);

    if i=intck('month',last_date,in_date) then do;

      date=in_date;

      x=in_x;

    end;

    else do;

      date=intnx('month',last_date,i,'sameday');

      call missing(x);

    end;

    output;

  end;

  last_date=date;

run;

data want;

  set complete;

  x_1=lag(x);

  if intck('month',lag(date),date) ne 1 then call missing(x_1);

  if code ne lag(code) then call missing(x_1);

  x_12=lag12(x);

  if intck('month',lag12(date),date) ne 12 then call missing(x_12);

  if code ne lag12(code) then call missing(x_12);

run;

Super User
Posts: 9,769

lagged variable controlling for dates

Hi.

What you need is more like a querying.

data temp;
input code $ date : monyy7. x ;
format date date9.;
cards;
a nov06 0
a dec06 1
a jan07 2
a feb07 3
a mar07 4
a apr07 5
a may07 6
a jun07 7
a jul07 8
a aug07 9
a sep07 10
a oct07 11
a nov07 12
a dec07 13
a jan08 14
b nov06 20
b dec06 21
b feb07 22
b mar07 23
b apr07 24
b may07 25
b jun07 26
b aug07 27
b sep07 28
b oct07 29
b nov07 30
b dec07 31
b jan08 32
;
run;

data want;
if _n_ eq 1 then do;
 if 0 then set temp(rename=(date=_date x=x_12));
 declare hash ha(hashexp : 10,dataset : 'temp(rename=(date=_date x=x_12))');
  ha.definekey('code','_date');
  ha.definedata('x_12');
  ha.definedone();
                  end;
set temp;
 _date=intnx('month',date,-12);
 rc=ha.find();
 if rc then call missing(x_12);
 drop _date rc;
run; 









Ksharp

Contributor
Posts: 43

Re: lagged variable controlling for dates

Dear data _null_, art297 and ksharp, thank you very much. all your solution works well. to be honest ksharp's hashing is beyond my current understanding.

For some reason I didn't see few days ago the answer of data _null_ and only now I learn that proc expand can manage my data situation. That maybe is the simplest solution although it might require more time to run on my data.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 189 views
  • 6 likes
  • 4 in conversation