Solved: Re: Not Sure how To Format into Longitudinal Dataset for ML

krueg314 · Posted 02-16-2019 11:13 AM

So basically I want to use the prior and prior prior values as my features for my label which is the same value, y. I've changed the data into what I think is easiest to understand. I done something like this in Python, but not sure how to in SAS. Any code on how to change the have dataset into the want dataset would be much appreciated!

data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
;

data want;
infile datalines delimiter=',';
input id $ y year lag_1_y lag_2_y 8.;
datalines;
1,20,2016,10,5
1,10,2015,5,.
1,5,2015,.,.
2,60,2016,30,15
2,30,2015,15,.
2,15,2014,.,.
3,36,2016,18,9
3,18,2015,9,.
3,9,2014,.,.
;

mkeintz · Posted 02-16-2019 01:13 PM

You can use the lag functions, but of course you don't want to contaminate the start of any ID with lagged values of Y from the prior ID. You can do this by examining the corresponding lagged values of ID:

data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
run;
data want;
  set have;
  lag_1_y=lag(y);
  if lag(id)^=id then lag_1_y=.;
  lag_2_y=lag2(y);
  if lag2(id)^=id then lag_2_y=.;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 02-16-2019 01:13 PM

You can use the lag functions, but of course you don't want to contaminate the start of any ID with lagged values of Y from the prior ID. You can do this by examining the corresponding lagged values of ID:

data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
run;
data want;
  set have;
  lag_1_y=lag(y);
  if lag(id)^=id then lag_1_y=.;
  lag_2_y=lag2(y);
  if lag2(id)^=id then lag_2_y=.;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

krueg314 · Posted 02-19-2019 02:59 PM

I changed it a little for it to work as time series needs to be descending, but it worked after adding this before your code since order doesn't matter to me:

proc sort data=have;
     by descending id year;
run;

Reeza · Posted 02-16-2019 01:14 PM

LAG() and LAG2() functions.
Remember to reset the lag values at the new ID.

Ksharp · Posted 02-18-2019 06:55 AM

That is not called LAG, should call NEXT or FORWARD.

data have;
infile datalines delimiter=',';
input id $ y year 8.;
datalines;
1,20,2016
1,10,2015
1,5,2014
2,60,2016
2,30,2015
2,15,2014
3,36,2016
3,18,2015
3,9,2014
run;
data want;
  merge have
  have(firstobs=2 keep=id y rename=(id=id1 y=lag_1_y))
  have(firstobs=3 keep=id y rename=(id=id2 y=lag_2_y));
  if id ne id1 then call missing(lag_1_y);
  if id ne id2 then call missing(lag_2_y);
  drop id1 id2;
run;

Not Sure how To Format into Longitudinal Dataset for ML

Re: Not Sure how To Format into Longitudinal Dataset for ML

Re: Not Sure how To Format into Longitudinal Dataset for ML

Re: Not Sure how To Format into Longitudinal Dataset for ML

Re: Not Sure how To Format into Longitudinal Dataset for ML

Re: Not Sure how To Format into Longitudinal Dataset for ML

Click image to register for webinar

Classroom Training Available!