BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
krueg314
Calcite | Level 5

So basically I want to use the prior and prior prior values as my features for my label which is the same value, y. I've changed the data into what I think is easiest to understand. I done something like this in Python, but not sure how to in SAS. Any code on how to change the have dataset into the want dataset would be much appreciated!

 

data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
;

 

data want;
infile datalines delimiter=',';
input id $ y year lag_1_y lag_2_y 8.;
datalines;
1,20,2016,10,5
1,10,2015,5,.
1,5,2015,.,.
2,60,2016,30,15
2,30,2015,15,.
2,15,2014,.,.
3,36,2016,18,9
3,18,2015,9,.
3,9,2014,.,.
;

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You can use the lag functions, but of course you don't want to contaminate the start of any ID with lagged values of Y from the prior ID.  You can do this by examining the corresponding lagged values of ID:

 

data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
run;
data want;
  set have;
  lag_1_y=lag(y);
  if lag(id)^=id then lag_1_y=.;
  lag_2_y=lag2(y);
  if lag2(id)^=id then lag_2_y=.;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

You can use the lag functions, but of course you don't want to contaminate the start of any ID with lagged values of Y from the prior ID.  You can do this by examining the corresponding lagged values of ID:

 

data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
run;
data want;
  set have;
  lag_1_y=lag(y);
  if lag(id)^=id then lag_1_y=.;
  lag_2_y=lag2(y);
  if lag2(id)^=id then lag_2_y=.;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
krueg314
Calcite | Level 5

I changed it a little for it to work as time series needs to be descending, but it worked after adding this before your code since order doesn't matter to me:

 

proc sort data=have;
by descending id year;
run;
Reeza
Super User
LAG() and LAG2() functions.
Remember to reset the lag values at the new ID.
Ksharp
Super User

That is not called LAG, should call NEXT or FORWARD.

 

data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
run;
data want;
  merge have
  have(firstobs=2 keep=id y rename=(id=id1 y=lag_1_y))
  have(firstobs=3 keep=id y rename=(id=id2 y=lag_2_y));
  if id ne id1 then call missing(lag_1_y);
  if id ne id2 then call missing(lag_2_y);
  drop id1 id2;
run;