So basically I want to use the prior and prior prior values as my features for my label which is the same value, y. I've changed the data into what I think is easiest to understand. I done something like this in Python, but not sure how to in SAS. Any code on how to change the have dataset into the want dataset would be much appreciated!
data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
;
data want;
infile datalines delimiter=',';
input id $ y year lag_1_y lag_2_y 8.;
datalines;
1,20,2016,10,5
1,10,2015,5,.
1,5,2015,.,.
2,60,2016,30,15
2,30,2015,15,.
2,15,2014,.,.
3,36,2016,18,9
3,18,2015,9,.
3,9,2014,.,.
;
You can use the lag functions, but of course you don't want to contaminate the start of any ID with lagged values of Y from the prior ID. You can do this by examining the corresponding lagged values of ID:
data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
run;
data want;
set have;
lag_1_y=lag(y);
if lag(id)^=id then lag_1_y=.;
lag_2_y=lag2(y);
if lag2(id)^=id then lag_2_y=.;
run;
You can use the lag functions, but of course you don't want to contaminate the start of any ID with lagged values of Y from the prior ID. You can do this by examining the corresponding lagged values of ID:
data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
run;
data want;
set have;
lag_1_y=lag(y);
if lag(id)^=id then lag_1_y=.;
lag_2_y=lag2(y);
if lag2(id)^=id then lag_2_y=.;
run;
I changed it a little for it to work as time series needs to be descending, but it worked after adding this before your code since order doesn't matter to me:
proc sort data=have;
by descending id year;
run;
That is not called LAG, should call NEXT or FORWARD.
data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
run;
data want;
merge have
have(firstobs=2 keep=id y rename=(id=id1 y=lag_1_y))
have(firstobs=3 keep=id y rename=(id=id2 y=lag_2_y));
if id ne id1 then call missing(lag_1_y);
if id ne id2 then call missing(lag_2_y);
drop id1 id2;
run;
This is a knowledge-sharing community for learners in the Academy. Find answers to your questions or post here for a reply.
To ensure your success, use these getting-started resources:
Estimating Your Study Time
Reserving Software Lab Time
Most Commonly Asked Questions
Troubleshooting Your SAS-Hadoop Training Environment