BookmarkSubscribeRSS Feed
TomiKong
Fluorite | Level 6

Is there one function which can return a value stored in the next observation on the opposite of  LAG function ?For example,

data have;

input x;

datalines;

1

2

3

4

5

6

7

8

9

10

;

run;

I want to assign the average value of the next 5 observations to one new variable. How to achieve this? Thanks in advance.

21 REPLIES 21
SteveDenham
Jade | Level 19

How big is your dataset?  If it is not horribly long, try PROC SORT with a descending option and an out=newdata, and then apply the mean function to the lagged variables in newdata.  I know there was something about "look ahead" functions in a thread recently, but I have made this method work before.

Steve Denham

art297
Opal | Level 21

What do you want to assign to records 6 thru 10?

The look forward method that Steve mentioned may have been the one that uses multiple set statements.  e.g.:

data want;

  set have;

  set have ( firstobs = 2 keep = x rename = (x = _x1) )

      have (      obs = 1 drop = _all_               );

  set have ( firstobs = 3 keep = x rename = (x = _x2) )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               );

  set have ( firstobs = 4 keep = x rename = (x = _x3) )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               );

  set have ( firstobs = 5 keep = x rename = (x = _x4) )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               );

  set have ( firstobs = 6 keep = x rename = (x = _x5) )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               );

  mean_x=mean(of _x:);

run;

I think that would be the most efficient method.  Conversely, if you have proc expand, you could use it to expand the dataset, and then use another datastep to calculate the mean.  e.g.:

proc expand data=have out=want2 (drop=time); 

   convert x=x1  / method=none transformout=(lead 1); 

   convert x=x2  / method=none transformout=(lead 2); 

   convert x=x3  / method=none transformout=(lead 3); 

   convert x=x4  / method=none transformout=(lead 4); 

   convert x=x5  / method=none transformout=(lead 5); 

run;

Astounding
PROC Star

OK, here's my shot at it.  I assumed that when we get to observation 6, and there are only 4 foward-looking observations, that we should take the mean of those 4:

 

data test;

do i=1 to 10;

output;

end;

run;

data test2;

set test nobs=_nobs_;

array forward {0:4} forward1-forward5;

retain forward1-forward5;

if _n_=1 then do _j_=2 to 5;

set test (rename=(i=forward_value)) point=_j_;

forward{mod(_j_,5)}=forward_value;

end;

if _n_+5 <= _nobs_ then do;

_j_=_n_+5;

set test (rename=(i=forward_value)) point=_j_;

forward{mod(_n_,5)}=forward_value;

end;

else forward{mod(_n_,5)}=.;

forward_5_mean = mean(of forward{*});

put _all_;

run;

Linlin
Lapis Lazuli | Level 10

data have;
do var=1 to 10;
output;
end;
run;
data temp;
array v(0:4) _temporary_;
set have;
v(mod(_n_,5))=var;
if _n_>=6 then do; mean_5=mean( of v(*)); output;end;
keep mean_5;
run;
data want;
  merge have temp;
proc print;run;
                                       obs    var    mean_5

                                         1      1       4
                                         2      2       5
                                         3      3       6
                                         4      4       7
                                         5      5       8
                                         6      6       .
                                         7      7       .
                                         8      8       .
                                         9      9       .
                                        10     10       .

Haikuo
Onyx | Level 15

Shamelessly stealing Linlin' idea, but modified to finish in one data step:

data have;

do x=1 to 10;

output;  

end;

run;

data want (drop=x rename=y=x) ;

array nx5(0:5) _temporary_;

set have end=last nobs=nobs;

nx5(mod(_n_,6))=x;

if _n_>5 then do;

   y=nx5(mod(_n_-5,6));

   avg=(sum(of nx5(*))-y)/(n(of nx5(*))-1);

   output;

end;

if last then do ;

       do _n_=nobs+1 to nobs+5;

          y=nx5(mod(_n_-5,6));

       call missing(nx5(mod(_n_-5,6)),nx5(mod(_n_-6,6)));

       avg=mean(of nx5(*));

       output;

       end;

end;

run;

proc print;run;

Haikuo

MikeZdeb
Rhodochrosite | Level 12

hi ... another idea

though it's grossly inefficient to keep reading five new values from the data set over-and-over again,

it took < 1.5 seconds to process a million observations (on my "not so fast" PC)

data want (keep=x mean);

set have nobs=obs;

if (obs - _n_) ge 5 then do;

   do nrec=_n_ + 1 to _n_ + 5;

      set have (rename=(x=y)) point=nrec;

      five = sum(five, y);

   end;

   mean = five / 5;

end;

else call missing (mean);

run;

art297
Opal | Level 21

Mike,

Since this is a more automated version of the code that I originally proposed (okay .. I stole it from Howard's sasCommunity.or Look Ahead Look Back article), I think that you really want to include a keep option in the additional set statements.

MikeZdeb
Rhodochrosite | Level 12

hi Art ... I was assuming the one variable data set that was in the original question

Linlin
Lapis Lazuli | Level 10

Hi Mike,

If you are going to try Astounding's code you have to try my code(I learned from Ksharp) as well:smileysilly:.

Astounding
PROC Star

Mike,

Did you try my version as well?  It looks longer, but it reads even less data.  It recognizes that 4 of the 5 records that you are reading with point= don't have to be read.  They were just read on the previous observation, and can be retained instead.

Just remember to add keep= , and remove the put statement at the end.

Astounding

Astounding
PROC Star

OK, I found some time.  I tested my code vs. Mike's and found that mine took less than 50% of the time.

With apologies, I didn't test other approaches but I know from past experience that MERGE takes much longer than SET.

MikeZdeb
Rhodochrosite | Level 12

hi ... I did say "though it's grossly inefficient ..."

so I modified my posting to do the same thing as Astounding's (his better idea of just remove one value and add one value to the total of X across five observations) ...

data want (keep=x mean);

* add first five values;

do nrec=1 to 5;

   set have nobs=obs;

   five = sum(five,x);

end;

* drop one value, add one value;

do nrec=6 by 1 until(done);

   set have end=done;

   set have (keep=x rename=(x=y)) point=nrec;

   five = five - x + y;

   mean = five / 5;

   if nrec gt obs then call missing(mean);

   output;

end;

stop;

run;

I tried the above versus Astounding's code with various size data sets (up to 1 million obs) and they take the same amount of time (with 1 million observations, each takes about 0.5 seconds of CPU time)

I also tried Linlin's and Haikuo's with the 1 million obs ... both were about the same amount of time as above, 0.5 seconds

last, I tried KSharp's really elegant looking SQL code (that really is clever), stopping at 10,000 obs since even that takes about 11 seconds of CPU time

Astounding
PROC Star

Mike,

I like your new structure ... much easier to read than mine.

A quick question about testing Linlin's code ... it looks like you were able to use this syntax on a temporary array:

mean (of v(*))

I got an error trying that, but then I'm using 9.1.  I guess a later release must have added functions across temporary arrays.

Thanks for testing and posting.

MikeZdeb
Rhodochrosite | Level 12

hi ... correct, I just tried that in V9.1 and got ...

3    data _null_;

4    array v(5) _temporary_ (1 2 3 4 5);

5    mean = mean(of v(*));

ERROR: The ARRAYNAME

  • specification requires a variable based array.
  • 6    put mean=;

    7    run;

    one reason to use that temporary array in Linlin's code is that such arrays are automatically retained

    in V9.1, you can change that temporary array to a variable based array and add a RETAIN statement

    then Linlin's code will work

    data temp;

    array v(0:4);

    retain v;

    set have;

    v(mod(_n_,5))=var;

    if _n_>=6 then do; mean_5=mean( of v(*)); output;end;

    keep mean_5;

    run;

    sas-innovate-2024.png

    Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

    Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    What is Bayesian Analysis?

    Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 21 replies
    • 1364 views
    • 8 likes
    • 9 in conversation