BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ciro
Quartz | Level 8

dear all,

I have a dataset with several monthly occurrences for each individual.

since I want to regress x(t) on x(t-1) and x(t-12) I have to create these lagged variables for each individual.

the code that I'am using is:

proc sort data=ds;

by code date;

run;

proc panel data=ds;

id code date;

lag x(1 12) / out=ds1;

run;

(I might use proc expand instead of proc panel)

however since there may be missing observations not for every individual the series is uninterrupted.

for instance in the following example  for individual a  the months of january and july 2007 are missing. in this case I would like the lagged value for x is missing for february 2007 and august 2007

code date x x_1

a dec06 1 .

a feb07 2 .

a mar07 3 2

a apr07 4 3

a may07 5 4

a jun07 6 5

a aug07 7 .

a sep07 8 7

a oct07 9 8

a nov07 10 9

a dec07 11 10

a jan08 12 11

the code above instead fills in the value with the previous observation within the same code regardless of the date.

should I merge the data with the list of uninterrpted dates in a way to obtain beforehand the missing for the lacking observation or is there any way to get the results?

by the way I'm working with hundred of thousands of observation and proc panel takes a while to get the results, so I'm looking possibly for a more efficient solution.

I would appreciate any suggestions. thank you very much in advance

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

Are you trying to do this?

proc expand data=ds from=month method=none;

   by code;

   id date;

   convert x;

   convert x=x1 / transformout=(lag 1);

   convert x=x12 / transformout=(lag 12);

   run;

Obs    code    date      x    x1    x12

  1     a      DEC06     1     .     .

  2     a      JAN07     .     1     .

  3     a      FEB07     2     .     .

  4     a      MAR07     3     2     .

  5     a      APR07     4     3     .

  6     a      MAY07     5     4     .

  7     a      JUN07     6     5     .

  8     a      JUL07     .     6     .

  9     a      AUG07     7     .     .

10     a      SEP07     8     7     .

11     a      OCT07     9     8     .

12     a      NOV07    10     9     .

13     a      DEC07    11    10     1

14     a      JAN08    12    11     .

View solution in original post

7 REPLIES 7
ciro
Quartz | Level 8

through a better search I have found a previous post addressing the same problem: http://communities.sas.com/message/42101#42101

I will look into it. however it seems the issue is resolved through a datastep solution. so maybe there is no lag function/statement that condition on the dates

data_null__
Jade | Level 19

Are you trying to do this?

proc expand data=ds from=month method=none;

   by code;

   id date;

   convert x;

   convert x=x1 / transformout=(lag 1);

   convert x=x12 / transformout=(lag 12);

   run;

Obs    code    date      x    x1    x12

  1     a      DEC06     1     .     .

  2     a      JAN07     .     1     .

  3     a      FEB07     2     .     .

  4     a      MAR07     3     2     .

  5     a      APR07     4     3     .

  6     a      MAY07     5     4     .

  7     a      JUN07     6     5     .

  8     a      JUL07     .     6     .

  9     a      AUG07     7     .     .

10     a      SEP07     8     7     .

11     a      OCT07     9     8     .

12     a      NOV07    10     9     .

13     a      DEC07    11    10     1

14     a      JAN08    12    11     .

Ksharp
Super User

How about:

data temp;
input code $ date : monyy7. x ;
format date date9.;
cards;
a dec06 1 
a feb07 2 
a mar07 3 
a apr07 4 
a may07 5 
a jun07 6
a aug07 7 
a sep07 8 
a oct07 9 
a nov07 10 
a dec07 11
a jan08 12
;
run;
data want;
 set temp;
 x_1=lag(x);
 if intck('month',lag(date),date) ne 1 then call missing(x_1);
 if code ne lag(code) then call missing(x_1);
run;

Ksharp

ciro
Quartz | Level 8

Dear Ksharp, thank you very much.  I found your solution simple and elegant.  I liked a lot and it works well for lag 1.

Unfortunately this does not seem to solve the problem for lag 12 (or any other lag).

Consider the following:

data temp;

input code $ date : monyy7. x ;

format date date9.;

cards;

a nov06 0

a dec06 1

a jan07 2

a feb07 3

a mar07 4

a apr07 5

a may07 6

a jun07 7

a jul07 8

a aug07 9

a sep07 10

a oct07 11

a nov07 12

a dec07 13

a jan08 14

b nov06 20

b dec06 21

b feb07 22

b mar07 23

b apr07 24

b may07 25

b jun07 26

b aug07 27

b sep07 28

b oct07 29

b nov07 30

b dec07 31

b jan08 32

  ;

run;

data want;

set temp;

x_1=lag(x);

if intck('month',lag(date),date) ne 1 then call missing(x_1);

if code ne lag(code) then call missing(x_1);

x_12=lag12(x);

if intck('month',lag12(date),date) ne 12 then call missing(x_12);

if code ne lag12(code) then call missing(x_12);

run;

For code b where there are missing observations I don't get the value of x_12 for december 2007 although there exists an observation for december 2006. I imagine that I have to loop your code from lag12 to lag1 until I find the correct corresponding month, if there is one, but I don't know how to code it to get this result.

any suggestion?

art297
Opal | Level 21

Two options that immediately come to mind are either building your own lag equivalents using retain, or preprocessing your data to fill in the missing dates.  I would go for the latter.

Lags are simply queues and your code doesn't take missing from the queue into account.

How about something like:

data complete (keep=code date x);

  set temp (rename=(

   date=in_date

   x=in_x));

  by code;

  retain last_date;

  if first.code then do;

    date=in_date;

    x=in_x;

    output;

  end;

  else do i=1 to intck('month',last_date,in_date);

    if i=intck('month',last_date,in_date) then do;

      date=in_date;

      x=in_x;

    end;

    else do;

      date=intnx('month',last_date,i,'sameday');

      call missing(x);

    end;

    output;

  end;

  last_date=date;

run;

data want;

  set complete;

  x_1=lag(x);

  if intck('month',lag(date),date) ne 1 then call missing(x_1);

  if code ne lag(code) then call missing(x_1);

  x_12=lag12(x);

  if intck('month',lag12(date),date) ne 12 then call missing(x_12);

  if code ne lag12(code) then call missing(x_12);

run;

Ksharp
Super User

Hi.

What you need is more like a querying.

data temp;
input code $ date : monyy7. x ;
format date date9.;
cards;
a nov06 0
a dec06 1
a jan07 2
a feb07 3
a mar07 4
a apr07 5
a may07 6
a jun07 7
a jul07 8
a aug07 9
a sep07 10
a oct07 11
a nov07 12
a dec07 13
a jan08 14
b nov06 20
b dec06 21
b feb07 22
b mar07 23
b apr07 24
b may07 25
b jun07 26
b aug07 27
b sep07 28
b oct07 29
b nov07 30
b dec07 31
b jan08 32
;
run;

data want;
if _n_ eq 1 then do;
 if 0 then set temp(rename=(date=_date x=x_12));
 declare hash ha(hashexp : 10,dataset : 'temp(rename=(date=_date x=x_12))');
  ha.definekey('code','_date');
  ha.definedata('x_12');
  ha.definedone();
                  end;
set temp;
 _date=intnx('month',date,-12);
 rc=ha.find();
 if rc then call missing(x_12);
 drop _date rc;
run; 









Ksharp

ciro
Quartz | Level 8

Dear data _null_, art297 and ksharp, thank you very much. all your solution works well. to be honest ksharp's hashing is beyond my current understanding.

For some reason I didn't see few days ago the answer of data _null_ and only now I learn that proc expand can manage my data situation. That maybe is the simplest solution although it might require more time to run on my data.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1975 views
  • 6 likes
  • 4 in conversation