how to generate variable for observation N based on information from observations > N?

Frequent Contributor
Posts: 89

how to generate variable for observation N based on information from observations > N?

Hi, all suppose I have folowing data set:

var
1
3
4
8
3
7

I want to generate a new variable NewVar for each observation such that NewVar(of observation n) = Var( of observation n) - Var(of observation n-1). The new dataset should look like:

varNewVar
12
31
44
8-5
34
7

how can I do this?

Respected Advisor
Posts: 3,167

how to generate variable for observation N based on information from observations > N?

Posted in reply to littlestone

One approach could be:

data have;

infile cards;

input var;

cards;

1

3

4

8

3

7

;

data want (drop=_;

merge have have(firstobs=2 rename=var=_var);

newvar=_var-var;

run;

proc print;run;

Have a Nice Weekend!

Haikuo

Frequent Contributor
Posts: 89

how to generate variable for observation N based on information from observations > N?

Thank you, Hai.kuo

I know SAS has functions such as RETAIN, LAG that can return value from previous records; is there any similar fucntions that can return value from forward records?

Frequent Contributor
Posts: 129

how to generate variable for observation N based on information from observations > N?

Posted in reply to littlestone

I do not know of a look-ahead function.

If you sort your data in reverse order,  you can use the lag function to obtain this functionality.  But then you will need to re-sort after doing look-aheads.  I think that may be a more general solution because it would let you use group by processing in situations more complicated than your example.

Contributor
Posts: 33

Re: how to generate variable for observation N based on information from observations > N?

Posted in reply to littlestone

Not a "look ahead" function, but using two set statements where you start reading on the second observation in the second set will do:

data look_ahead;

set my_data;

set my_data(firstobs=2 rename=(my_var = next_var));

dif_ahead = next_var - my_var;

run;

Be careful with the last observation: the second set statement hits end-of-file before the first observation. If you need that last observation of the first set, you have to take some measures.

By the way: in your original post you wrote the formula: NewVar(of observation n) = Var( of observation n) - Var(of observation n-1).

I assume that you mean: NewVar(of observation n) = Var( of observation n+1) - Var(of observation n)

PROC Star
Posts: 8,165

Re: how to generate variable for observation N based on information from observations > N?

Eric, FYI, if you use the two set statement approach, you need to nest the file as a dummy within the second set statement.  e.g.:

```set have ( firstobs = 2 keep = Measure rename = (Measure = Next_Measure) )
have (      obs = 1 drop = _all_                                     );```

That way you won't get the error when you read the last observation.

Contributor
Posts: 33

Re: how to generate variable for observation N based on information from observations > N?

Art,

That is an elegant way to do it. Another way to do it, is by adding a NOBS option:

data test;

do n=1 to 10;

output;

end;

run;

data lookahead;

set test;

if _n_ le nobs then set test(firstobs=2 rename=( n=m) nobs=nobs);

run;

proc print;run;

Super User
Posts: 8,120

Re: how to generate variable for observation N based on information from observations > N?

The data step will stop looping when the first SET statement reads the EOF so it does not matter if you add 1 or N extra records to the second SET statement by using no variable dataset. This eliminates the need for the OBS=1 dataset option.  This really could make a difference if you want to look ahead more than 1 observation.

Respected Advisor
Posts: 3,167

Re: how to generate variable for observation N based on information from observations > N?

Posted in reply to littlestone

I know this is getting silly (I mean my solution), if you want to use LAG() without sorting, here you go:

data want (drop=_;

set have(rename=var=_var) end=last;

newvar=_var-lag(_var);

var=lag(_var);

if _n_>1 then output;

if last then

do;

var=_var;

call missing(newvar);

output;

end;

run;

Kindly Regards,

Haikuo

Respected Advisor
Posts: 3,167

Re: how to generate variable for observation N based on information from observations > N?

Posted in reply to littlestone

Or if you have ETS and don't care about sign, here is another way:

proc expand data=have out=want;

convert var = newvar   / transformout=(reverse MOVRANGE 2 reverse);

run;

Or you need one more step to have your sign,

proc expand data=have out=want1;

convert var = _lead   / transformout=(lead 1);

run;

data  want (drop=time _lead);

set want1;

newvar=_lead-var;

run;

As you have noted, proc expand does have sort of LEAD function.

Kindly Regards,

Haikuo

Frequent Contributor
Posts: 89

how to generate variable for observation N based on information from observations > N?

Posted in reply to littlestone

Hi, All

I apologize for late response. And Thank you all very much for great help.

Discussion stats
• 10 replies
• 330 views
• 0 likes
• 6 in conversation