Solved: How to create a forward variable ? - Page 2

FreelanceReinh · Posted 03-08-2021 06:32 PM

@Phil_NZ wrote:

Hi @FreelanceReinh

Thank you for your suggestion. I learn more about the option end=eof. The code runs flawlessly.

But I still confuse how the data want receive the value of weight.lead1.

You can think of the two SET statements as reading from two separate datasets:

sashelp.class (19 obs.)
a dataset consisting of only the weight values of all but the first observation of sashelp.class (i.e. 18 obs.), with weight_lead1 as the name of that single variable.

In the first iteration of the DATA step -- _N_=1, eof=0, hence ~eof=1 -- the first SET statement reads the first observation of the first dataset and the second SET statement reads the first observation of the second dataset (which contains the weight value of the second observation of sashelp.class in variable weight_lead1). So, at the end of the first iteration the program data vector contains the values of the first observation of sashelp.class and in variable weight_lead1 the desired "lead" value of variable weight.

In the second iteration of the DATA step -- _N_=2, still eof=0, hence ~eof=1 -- the first SET statement reads the second observation of the first dataset and the second SET statement reads the second observation of the second dataset (which contains the weight value of the third observation of sashelp.class in variable weight_lead1). So, at the end of the second iteration the program data vector contains the values of the second observation of sashelp.class and, again, in variable weight_lead1 the desired "lead" value of variable weight.

... and so on until the 18th observation.

In the 19th iteration of the DATA step -- _N_=19, at the beginning still eof=0 -- the first SET statement reads the last (=19th) observation of the first dataset, the value of eof is switched to 1, hence ~eof=0, and therefore the second SET statement is not executed, which is good because reading past the end of the dataset (remember that the second dataset has only 18 observations, the last of which has already been read in the previous iteration of the DATA step) would cause the DATA step to terminate immediately. That is, dataset WANT would have only 18 observations (the observation with name='William' would be missing). Now, with ~eof being false (0), the ELSE branch is executed instead and overwrites the retained value of weight_lead1 (=112, from the last observation of the second dataset -- remember that variables read by a SET statement are automatically retained) with a missing value.

In the last (=20th!) iteration of the DATA step -- _N_=20, eof=1 -- the first SET statement reads past the end of the first dataset and this terminates the DATA step. So, the 19th observation of dataset WANT (written at the end of the 19th iteration) is also the last.

Phil_NZ · Posted 03-08-2021 07:40 PM

Hi @FreelanceReinh

Many thanks for your detailed explanation. I really appreciate it.

So, I just want to cross-check with you:

So the code

if ~eof then set have(firstobs=2 keep=weight rename=(weight=weight_lead1));
else weight_lead1=.;

is the one that imports the data from the second dataset have to the dataset want?

Warmest regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

FreelanceReinh · Posted 03-09-2021 03:03 AM

@Phil_NZ wrote:
if ~eof then set have(firstobs=2 keep=weight rename=(weight=weight_lead1));
else weight_lead1=.;
is the one that imports the data from the second dataset have to the dataset want?

Exactly. And have(firstobs=2 keep=weight rename=(weight=weight_lead1)) is the second dataset.

Reeza · Posted 03-08-2021 12:39 PM

PROC EXPAND via SAS/ETS if you have a license for it.

Otherwise the search term you're looking for is "LEAD" and if you search lexjansen.com for LEAD variables you'll find at least a dozen papers on the topic.
https://www.lexjansen.com/search/searchresults.php?q=lead%20variable

Phil_NZ · Posted 03-08-2021 05:37 PM

Hi @Reeza

It is a breath-taking solution to me!

Well ,I found a document, and after applying to my code, I get things easily.

Normally, I need to use

lagx=ifn(first.id,.,lag(x));

but proc convert can do all of things (set the first.id =. automatically), the same with lead (set the last.id=.).

There are two minor questions here:

1. Can you please tell me what do you mean about "if you have a license for it" , it is "for SAS" or for what, please?

2. When I run the code below

data uuu;
set sashelp.class;
run;
proc sort data=uuu;
by sex;
run;

proc expand data=uuu method=none;
	by Sex;
	convert weight=weight_lead1 / transformout=(lead 1);
	convert weight=weight_lead2 / transformout=(lead 2);
	convert weight=weight_lag1 / transformout=(lag 1);
run;

The result is as below

There is column name TIME coming from nowhere, can you please explain to me what it is for and why it exists in my output like that?

Warmest regard and thanks.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

Reeza · Posted 03-08-2021 06:09 PM

1. Can you please tell me what do you mean about "if you have a license for it" , it is "for SAS" or for what, please?

SAS has different modules, EXPAND is part of the ETS module. You can choose which modules to have installed or licensed. ETS is Econometrics and Time Series

SAS/OR is Operations Research for example.

Different modules include different procs. If you're familiar with R/Python, its similar to different packages or libraries for different types of analysis.

PROC EXPAND is designed for time series data. If you think about it, when using LEAD you usually need some kind of variable that represents an order or time, not just the implication of the row position. What happens if you have missing data?

You're currently using a mock example which is unrealistic in that regard. The leading row in that case has no business or logical context, whereas looking at the future or previous value for sales could be informative. If you have a use case that is similar you can either add a mock time variable to get the same functionality or it generates it.

Tom · Posted 03-08-2021 01:23 PM

MERGE without BY could work, but since that is normally a mistake you need to clearly document your code and adjust the options that flag it as an error.

* Turn off MERGENOBY option for this step ;
%let save=%sysfunc(getoption(mergenoby,keyword));
options mergenoby=nowarn;
data want;
  merge have have(firstobs=2 keep=weight rename=(weight=weight_lead)) ;
run;
* Reset MERGENOBY option ;
options &save;

Or just use two separate SET statements. But then you need control for the different number of observations. One way is to append some extra empty observations to match the number of skipped.

data want;
  set have;
  set have(firstobs=2 keep=weight rename=(weight=weight_lead)) 
      have(obs=1 drop=_all_)
  ;
run;

Now if your data is grouped you need a way to not take the value from the next group. You could use LAST. flag to check for that and then clear the value.

data want;
  set have;
  by id;
  set have(firstobs=2 keep=weight rename=(weight=weight_lead)) 
      have(obs=1 drop=_all_)
  ;
  if last.id then call missing(weight_lead);
run;

Tom · Posted 03-08-2021 01:33 PM

If you want to LEAD2, LEAD3, etc then you will need a different method to detect when the values should be cleared because you have reached the end of the group. In that case keep the ID variable also and test it.

data have;
 input id @;
 do row=1 to 5 ; input x @; output; end;
cards;
1 1 2 3 4 5
2 11 12 13 14 15
;

data want;
   set have;
   by id;
   set have(firstobs=2 keep=id x rename=(id=id1 x=x1)) have(obs=1 drop=_all_);
   set have(firstobs=3 keep=id x rename=(id=id2 x=x2)) have(obs=2 drop=_all_);
   set have(firstobs=4 keep=id x rename=(id=id3 x=x3)) have(obs=3 drop=_all_);
   if id1 ne id then call missing(x1);
   if id2 ne id then call missing(x2);
   if id3 ne id then call missing(x3);
   drop id1-id3;
run;

proc print;
run;

Obs    id    row     x    x1    x2    x3

  1     1     1      1     2     3     4
  2     1     2      2     3     4     5
  3     1     3      3     4     5     .
  4     1     4      4     5     .     .
  5     1     5      5     .     .     .
  6     2     1     11    12    13    14
  7     2     2     12    13    14    15
  8     2     3     13    14    15     .
  9     2     4     14    15     .     .
 10     2     5     15     .     .     .

Phil_NZ · Posted 03-09-2021 05:51 AM

Hi everyone @

It may be the time to settle down on this topic.

I just want to say that I really and truly appreciate your contribution and many alternative solutions that really blow my mind. I used my whole day just to read the comment and learned from the document to understand your suggestion and response, it is just an amazing day to me.

I learned a lot from your suggestions. I hope that the next person coming to this topic can read through all comments on this topic (that's why I comment after @Tom , I hope you do not mind about that)

I will choose the comment of @Reeza as a solution because this comment not only answers my question efficiently but also broaden my horizon about a very powerful PROC EXPAND which is available in my uni.

Many thanks and best regards,

Phil

Massey, New Zealand

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

Re: How to create a forward variable ?

Re: How to create a forward variable ?

Re: How to create a forward variable ?

Re: How to create a forward variable ?

Re: How to create a forward variable ?

Re: How to create a forward variable ?

Re: How to create a forward variable ?

Re: How to create a forward variable ?

Re: How to create a forward variable ?

Registration is open

SAS Training: Just a Click Away