Help using Base SAS procedures

how to program like this?

Reply
Contributor
Posts: 36

how to program like this?

Is there one function which can return a value stored in the next observation on the opposite of  LAG function ?For example,

data have;

input x;

datalines;

1

2

3

4

5

6

7

8

9

10

;

run;

I want to assign the average value of the next 5 observations to one new variable. How to achieve this? Thanks in advance.

Respected Advisor
Posts: 2,655

Re: how to program like this?

How big is your dataset?  If it is not horribly long, try PROC SORT with a descending option and an out=newdata, and then apply the mean function to the lagged variables in newdata.  I know there was something about "look ahead" functions in a thread recently, but I have made this method work before.

Steve Denham

PROC Star
Posts: 7,492

Re: how to program like this?

What do you want to assign to records 6 thru 10?

The look forward method that Steve mentioned may have been the one that uses multiple set statements.  e.g.:

data want;

  set have;

  set have ( firstobs = 2 keep = x rename = (x = _x1) )

      have (      obs = 1 drop = _all_               );

  set have ( firstobs = 3 keep = x rename = (x = _x2) )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               );

  set have ( firstobs = 4 keep = x rename = (x = _x3) )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               );

  set have ( firstobs = 5 keep = x rename = (x = _x4) )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               );

  set have ( firstobs = 6 keep = x rename = (x = _x5) )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               )

      have (      obs = 1 drop = _all_               );

  mean_x=mean(of _xSmiley Happy;

run;

I think that would be the most efficient method.  Conversely, if you have proc expand, you could use it to expand the dataset, and then use another datastep to calculate the mean.  e.g.:

proc expand data=have out=want2 (drop=time); 

   convert x=x1  / method=none transformout=(lead 1); 

   convert x=x2  / method=none transformout=(lead 2); 

   convert x=x3  / method=none transformout=(lead 3); 

   convert x=x4  / method=none transformout=(lead 4); 

   convert x=x5  / method=none transformout=(lead 5); 

run;

Super User
Posts: 5,517

Re: how to program like this?

OK, here's my shot at it.  I assumed that when we get to observation 6, and there are only 4 foward-looking observations, that we should take the mean of those 4:

 

data test;

do i=1 to 10;

output;

end;

run;

data test2;

set test nobs=_nobs_;

array forward {0:4} forward1-forward5;

retain forward1-forward5;

if _n_=1 then do _j_=2 to 5;

set test (rename=(i=forward_value)) point=_j_;

forward{mod(_j_,5)}=forward_value;

end;

if _n_+5 <= _nobs_ then do;

_j_=_n_+5;

set test (rename=(i=forward_value)) point=_j_;

forward{mod(_n_,5)}=forward_value;

end;

else forward{mod(_n_,5)}=.;

forward_5_mean = mean(of forward{*});

put _all_;

run;

Super Contributor
Posts: 1,636

Re: how to program like this?

data have;
do var=1 to 10;
output;
end;
run;
data temp;
array v(0:4) _temporary_;
set have;
v(mod(_n_,5))=var;
if _n_>=6 then do; mean_5=mean( of v(*)); output;end;
keep mean_5;
run;
data want;
  merge have temp;
proc print;run;
                                       obs    var    mean_5

                                         1      1       4
                                         2      2       5
                                         3      3       6
                                         4      4       7
                                         5      5       8
                                         6      6       .
                                         7      7       .
                                         8      8       .
                                         9      9       .
                                        10     10       .

Respected Advisor
Posts: 3,156

Re: how to program like this?

Shamelessly stealing Linlin' idea, but modified to finish in one data step:

data have;

do x=1 to 10;

output;  

end;

run;

data want (drop=x rename=y=x) ;

array nx5(0:5) _temporary_;

set have end=last nobs=nobs;

nx5(mod(_n_,6))=x;

if _n_>5 then do;

   y=nx5(mod(_n_-5,6));

   avg=(sum(of nx5(*))-y)/(n(of nx5(*))-1);

   output;

end;

if last then do ;

       do _n_=nobs+1 to nobs+5;

          y=nx5(mod(_n_-5,6));

       call missing(nx5(mod(_n_-5,6)),nx5(mod(_n_-6,6)));

       avg=mean(of nx5(*));

       output;

       end;

end;

run;

proc print;run;

Haikuo

Valued Guide
Posts: 765

Re: how to program like this?

hi ... another idea

though it's grossly inefficient to keep reading five new values from the data set over-and-over again,

it took < 1.5 seconds to process a million observations (on my "not so fast" PC)

data want (keep=x mean);

set have nobs=obs;

if (obs - _n_) ge 5 then do;

   do nrec=_n_ + 1 to _n_ + 5;

      set have (rename=(x=y)) point=nrec;

      five = sum(five, y);

   end;

   mean = five / 5;

end;

else call missing (mean);

run;

PROC Star
Posts: 7,492

Re: how to program like this?

Mike,

Since this is a more automated version of the code that I originally proposed (okay .. I stole it from Howard's sasCommunity.or Look Ahead Look Back article), I think that you really want to include a keep option in the additional set statements.

Valued Guide
Posts: 765

Re: how to program like this?

hi Art ... I was assuming the one variable data set that was in the original question

Super Contributor
Posts: 1,636

Re: how to program like this?

Hi Mike,

If you are going to try Astounding's code you have to try my code(I learned from Ksharp) as well:smileysilly:.

Super User
Posts: 5,517

Re: how to program like this?

Mike,

Did you try my version as well?  It looks longer, but it reads even less data.  It recognizes that 4 of the 5 records that you are reading with point= don't have to be read.  They were just read on the previous observation, and can be retained instead.

Just remember to add keep= , and remove the put statement at the end.

Astounding

Super User
Posts: 5,517

Re: how to program like this?

Posted in reply to Astounding

OK, I found some time.  I tested my code vs. Mike's and found that mine took less than 50% of the time.

With apologies, I didn't test other approaches but I know from past experience that MERGE takes much longer than SET.

Valued Guide
Posts: 765

Re: how to program like this?

Posted in reply to Astounding

hi ... I did say "though it's grossly inefficient ..."

so I modified my posting to do the same thing as Astounding's (his better idea of just remove one value and add one value to the total of X across five observations) ...

data want (keep=x mean);

* add first five values;

do nrec=1 to 5;

   set have nobs=obs;

   five = sum(five,x);

end;

* drop one value, add one value;

do nrec=6 by 1 until(done);

   set have end=done;

   set have (keep=x rename=(x=y)) point=nrec;

   five = five - x + y;

   mean = five / 5;

   if nrec gt obs then call missing(mean);

   output;

end;

stop;

run;

I tried the above versus Astounding's code with various size data sets (up to 1 million obs) and they take the same amount of time (with 1 million observations, each takes about 0.5 seconds of CPU time)

I also tried Linlin's and Haikuo's with the 1 million obs ... both were about the same amount of time as above, 0.5 seconds

last, I tried KSharp's really elegant looking SQL code (that really is clever), stopping at 10,000 obs since even that takes about 11 seconds of CPU time

Super User
Posts: 5,517

Re: how to program like this?

Mike,

I like your new structure ... much easier to read than mine.

A quick question about testing Linlin's code ... it looks like you were able to use this syntax on a temporary array:

mean (of v(*))

I got an error trying that, but then I'm using 9.1.  I guess a later release must have added functions across temporary arrays.

Thanks for testing and posting.

Valued Guide
Posts: 765

Re: how to program like this?

Posted in reply to Astounding

hi ... correct, I just tried that in V9.1 and got ...

3    data _null_;

4    array v(5) _temporary_ (1 2 3 4 5);

5    mean = mean(of v(*));

ERROR: The ARRAYNAME

  • specification requires a variable based array.
  • 6    put mean=;

    7    run;

    one reason to use that temporary array in Linlin's code is that such arrays are automatically retained

    in V9.1, you can change that temporary array to a variable based array and add a RETAIN statement

    then Linlin's code will work

    data temp;

    array v(0:4);

    retain v;

    set have;

    v(mod(_n_,5))=var;

    if _n_>=6 then do; mean_5=mean( of v(*)); output;end;

    keep mean_5;

    run;

    Ask a Question
    Discussion stats
    • 21 replies
    • 606 views
    • 8 likes
    • 9 in conversation