turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- how to program like this?

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

08-29-2012 07:07 AM

Is there one function which can return a value stored in the next observation on the opposite of LAG function ?For example,

data have;

input x;

datalines;

1

2

3

4

5

6

7

8

9

10

;

run;

I want to assign the average value of the next 5 observations to one new variable. How to achieve this? Thanks in advance.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to TomiKong

08-29-2012 08:00 AM

How big is your dataset? If it is not horribly long, try PROC SORT with a descending option and an out=newdata, and then apply the mean function to the lagged variables in newdata. I know there was something about "look ahead" functions in a thread recently, but I have made this method work before.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to TomiKong

08-29-2012 08:29 AM

What do you want to assign to records 6 thru 10?

The look forward method that Steve mentioned may have been the one that uses multiple set statements. e.g.:

data want;

set have;

set have ( firstobs = 2 keep = x rename = (x = _x1) )

have ( obs = 1 drop = _all_ );

set have ( firstobs = 3 keep = x rename = (x = _x2) )

have ( obs = 1 drop = _all_ )

have ( obs = 1 drop = _all_ );

set have ( firstobs = 4 keep = x rename = (x = _x3) )

have ( obs = 1 drop = _all_ )

have ( obs = 1 drop = _all_ )

have ( obs = 1 drop = _all_ );

set have ( firstobs = 5 keep = x rename = (x = _x4) )

have ( obs = 1 drop = _all_ )

have ( obs = 1 drop = _all_ )

have ( obs = 1 drop = _all_ )

have ( obs = 1 drop = _all_ );

set have ( firstobs = 6 keep = x rename = (x = _x5) )

have ( obs = 1 drop = _all_ )

have ( obs = 1 drop = _all_ )

have ( obs = 1 drop = _all_ )

have ( obs = 1 drop = _all_ )

have ( obs = 1 drop = _all_ );

mean_x=mean(of _x;

run;

I think that would be the most efficient method. Conversely, if you have proc expand, you could use it to expand the dataset, and then use another datastep to calculate the mean. e.g.:

proc expand data=have out=want2 (drop=time);

convert x=x1 / method=none transformout=(lead 1);

convert x=x2 / method=none transformout=(lead 2);

convert x=x3 / method=none transformout=(lead 3);

convert x=x4 / method=none transformout=(lead 4);

convert x=x5 / method=none transformout=(lead 5);

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to TomiKong

08-29-2012 09:24 AM

OK, here's my shot at it. I assumed that when we get to observation 6, and there are only 4 foward-looking observations, that we should take the mean of those 4:

data test;

do i=1 to 10;

output;

end;

run;

data test2;

set test nobs=_nobs_;

array forward {0:4} forward1-forward5;

retain forward1-forward5;

if _n_=1 then do _j_=2 to 5;

set test (rename=(i=forward_value)) point=_j_;

forward{mod(_j_,5)}=forward_value;

end;

if _n_+5 <= _nobs_ then do;

_j_=_n_+5;

set test (rename=(i=forward_value)) point=_j_;

forward{mod(_n_,5)}=forward_value;

end;

else forward{mod(_n_,5)}=.;

forward_5_mean = mean(of forward{*});

put _all_;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to TomiKong

08-29-2012 10:28 AM

data have;

do var=1 to 10;

output;

end;

run;

data temp;

array v(0:4) _temporary_;

set have;

v(mod(_n_,5))=var;

if _n_>=6 then do; mean_5=mean( of v(*)); output;end;

keep mean_5;

run;

data want;

merge have temp;

proc print;run;

obs var mean_5

1 1 4

2 2 5

3 3 6

4 4 7

5 5 8

6 6 .

7 7 .

8 8 .

9 9 .

10 10 .

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to TomiKong

08-29-2012 01:03 PM

Shamelessly stealing Linlin' idea, but modified to finish in one data step:

data have;

do x=1 to 10;

output;

end;

run;

data want (drop=x rename=y=x) ;

array nx5(0:5) _temporary_;

set have end=last nobs=nobs;

nx5(mod(_n_,6))=x;

if _n_>5 then do;

y=nx5(mod(_n_-5,6));

avg=(sum(of nx5(*))-y)/(n(of nx5(*))-1);

output;

end;

if last then do ;

do _n_=nobs+1 to nobs+5;

y=nx5(mod(_n_-5,6));

call missing(nx5(mod(_n_-5,6)),nx5(mod(_n_-6,6)));

avg=mean(of nx5(*));

output;

end;

end;

run;

proc print;run;

Haikuo

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to TomiKong

08-29-2012 06:36 PM

hi ... another idea

though it's grossly inefficient to keep reading five new values from the data set over-and-over again,

it took < 1.5 seconds to process a million observations (on my "not so fast" PC)

**data want (keep=x mean);**

**set have nobs=obs;**

**if (obs - _n_) ge 5 then do;**

** do nrec=_n_ + 1 to _n_ + 5;**

** set have (rename=(x=y)) point=nrec;**

** five = sum(five, y);**

** end;**

** mean = five / 5;**

**end;**

**else call missing (mean);**

**run;**

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MikeZdeb

08-29-2012 06:59 PM

Mike,

Since this is a more automated version of the code that I originally proposed (okay .. I stole it from Howard's sasCommunity.or Look Ahead Look Back article), I think that you really want to include a keep option in the additional set statements.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to art297

08-29-2012 07:05 PM

hi Art ... I was assuming the one variable data set that was in the original question

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MikeZdeb

08-29-2012 09:31 PM

Hi Mike,

If you are going to try Astounding's code you have to try my code(I learned from Ksharp) as well:smileysilly:.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MikeZdeb

08-29-2012 08:20 PM

Mike,

Did you try my version as well? It looks longer, but it reads even less data. It recognizes that 4 of the 5 records that you are reading with point= don't have to be read. They were just read on the previous observation, and can be retained instead.

Just remember to add keep= , and remove the put statement at the end.

Astounding

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Astounding

08-30-2012 02:11 PM

OK, I found some time. I tested my code vs. Mike's and found that mine took less than 50% of the time.

With apologies, I didn't test other approaches but I know from past experience that MERGE takes much longer than SET.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Astounding

08-30-2012 04:20 PM

hi ... I did say "though it's grossly inefficient ..."

so I modified my posting to do the same thing as Astounding's (his better idea of just remove one value and add one value to the total of X across five observations) ...

**data want (keep=x mean);**

*** add first five values;**

**do nrec=1 to 5;**

** set have nobs=obs;**

** five = sum(five,x);**

**end;**

*** drop one value, add one value;**

**do nrec=6 by 1 until(done);**

** set have end=done;**

** set have (keep=x rename=(x=y)) point=nrec;**

** five = five - x + y;**

** mean = five / 5;**

** if nrec gt obs then call missing(mean);**

** output;**

**end;**

**stop;**

**run;**

I tried the above versus Astounding's code with various size data sets (up to 1 million obs) and they take the same amount of time (with 1 million observations, each takes about 0.5 seconds of CPU time)

I also tried Linlin's and Haikuo's with the 1 million obs ... both were about the same amount of time as above, 0.5 seconds

last, I tried KSharp's really elegant looking SQL code (that really is clever), stopping at 10,000 obs since even that takes about 11 seconds of CPU time

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MikeZdeb

08-30-2012 04:48 PM

Mike,

I like your new structure ... much easier to read than mine.

A quick question about testing Linlin's code ... it looks like you were able to use this syntax on a temporary array:

mean (of v(*))

I got an error trying that, but then I'm using 9.1. I guess a later release must have added functions across temporary arrays.

Thanks for testing and posting.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Astounding

08-30-2012 05:18 PM

hi ... correct, I just tried that in V9.1 and got ...

3 data _null_;

4 array v(5) _temporary_ (1 2 3 4 5);

5 mean = mean(of v(*));

ERROR: The ARRAYNAME

6 put mean=;

7 run;

one reason to use that temporary array in Linlin's code is that such arrays are automatically retained

in V9.1, you can change that temporary array to a variable based array and add a RETAIN statement

then Linlin's code will work

**data temp;**

**array v(0:4);**

**retain v;**

**set have;**

**v(mod(_n_,5))=var;**

**if _n_>=6 then do; mean_5=mean( of v(*)); output;end;**

**keep mean_5;**

**run;**