Re: What is the opposite statement of "Lag"?

Ken_oy · Posted 07-29-2009 10:59 AM

I know we have a "Lag" statement, and we can use it to do as the following:
-----------------------
v1 lag_v1

1 .
2 1
3 2
4 3
5 4
----------------------

But, do we have any statement to do:

-----------------------
v1 X_v1

1 2
2 3
3 4
4 5
5 .
----------------------

Thank you!!

Doc_Duke · Posted 07-29-2009 11:42 AM

No. SAS Doesn't do a 'read ahead' that provides data to the DATA Step. Your best bet is to sort DESCENDING and use the LAG and then resort.

LAP · Posted 07-29-2009 11:45 AM

I don't know of any "reverse lag" but you could reverse the sort order on the data set. Assuming column one can be used to as the key, you could do something like this.

proc sort data=test;
by descending v1;

data test2;
set test;
lag_v1 = lag(v1);
run;

proc sort data=test2;
by v1;

If v1 can not be used as your key variable, you could use _n_ to sort by;

Data test;
set example;
key_var = _n_;
run;

proc sort data=test; by descending key_var;

etc..

deleted_user · Posted 07-29-2009 11:59 AM

The following works OK if you only need to do this on one variable (as opposed to the entire list of variables). Note that you can "look forward" more than one by changing the constant of addition.

data test2;
set test nobs=_N;

_pointer = _N_+1;

if _pointer le _N then
set test(keep=v1 rename=(v1=X_v1)) point=_pointer;
else
X_v1 = .;
run;

sbb · Posted 07-29-2009 12:29 PM

Create a suitable file for processing, as required, and assign a SAS temporary variable to track the observation number ( OBSNUM = _N_; ). Then re-sort the file with BY DESCENDING OBSNUM; and you can then use LAG.

Scott Barry
SBBWorks, Inc.

DanielSantos · Posted 07-29-2009 01:07 PM

There's actually a simple way for simulating an "ahead" read, using multiple SET statements.
It is a very efficient technique that is not often seen because of its obscure readability, as it breaks up a little the logic of the datastep iteration. I'll try to explain it at the end.

Say this is your sample:

data SAMPLE;
do A=1 to 6;
output;
end;
run;

And this is the code:

data RESULT;
retain AA; * retain AHEAD variable;
* AHEAD SET statement starting at OBS=2;
if not _EOF then set SAMPLE (firstobs=2) end=_EOF;
else A=.; * last OBS;
AA=A; * place AHEAD A value to AHEAD variable;
set SAMPLE; * normal SET statement starting at OBS=1;
run;

OK, then...
The first SET statement is the ahead statement and begins at OBS Nº2.
The second SET statement is the actual statement and begins normally at OBS Nº1

For every datastep iterations, here what's happening.

Read 1 OBS from Datastep (First SET, 1 OBS ahead).
Place value of A into AA
Read 1 OBS from Datastep (Second SET)
Output OBS of Second SET with AA values of First SET.
Loop.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.

sbb · Posted 07-29-2009 01:37 PM

Honestly, the multiple SET stmt approach seems quite convoluted, mostly because each SAS variable of interest must be protected as a "shadow-named" variable through RETAIN and assignments, also the file must be read twice.

Scott Barry
SBBWorks, Inc.

deleted_user · Posted 07-29-2009 01:45 PM

I disagree. Some sort of variable creation is needed in any event. In order to do a SORT, the data will have to be read through at least once. In terms of efficiency, my bet would be on Daniel's solution (although I would expect the POINT= to be similar).

DanielSantos · Posted 07-29-2009 04:06 PM

> Honestly, the multiple SET stmt approach seems quite
> convoluted, mostly because each SAS variable of
> interest must be protected as a "shadow-named"
> variable through RETAIN and assignments, also the
> file must be read twice.
>
> Scott Barry
> SBBWorks, Inc.

Hello Scott.

True about retaining/assignments, but perfomance wise, it can easily be avoided, with the following modification:

data RESULT;
* AHEAD SET statement starting at OBS=2;
if not _EOF then set SAMPLE (firstobs=2 rename = (A=AA)) end=_EOF;
else AA=.; * last OBS;
set SAMPLE; * normal SET statement starting at OBS=1;
run;

But if I may disagree, not completely true about the efficiency of two pass read.
The two SET are indeed treated by the SAS supervisor as two different tables (double buffer, double pointer), but let us remember that when processing files most of the effort is done at disk I/O. And because of that, at the lowest level every I/O operation is done by block and cached into memory. So there is a very good chance, that for the same file, the second read will not produce another read operation and the data will be retrieved from memory.
Then of course, as already pointed by kmg, PROC SORT falls in the group of the most resource consuming procedures.

By the way, it just occurred to me another approach, which would be, to perform a merge with the same table (WITHOUT the BY statement) being one of them one observation "ahead" of the other.

data RESULT;
merge SAMPLE SAMPLE (firstobs=2 keep=A rename=(A=AA));
run;

Indeed, so many ways.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.

Ken_oy · Posted 07-29-2009 01:28 PM

Thank you so much!!

Problem fixed!!

Ken_oy · Posted 07-29-2009 01:48 PM

I am using PROC SQL, and then this problem fixed.

Using the following code:
-------------------------------------------
-------------------------------------------
data have;
do group = 'a','b';
do value = 1,3,3,2,5,1,8;
output; end; end; run;

proc sql;
create view numbered as select *,
monotonic() as obsnum from have;
create view want as select numbered.group, numbered.value, cross.value as lookahead from numbered left join numbered as cross
on numbered.group=cross.group and numbered.obsnum+1=cross.obsnum
order by numbered.obsnum;
quit;
-------------------------------------------
-------------------------------------------

We can have the Result below:
-------------------------------------------
-------------------------------------------
Obs group value lookahead

1 a 1 3
2 a 3 3
3 a 3 2
4 a 2 5
5 a 5 1
6 a 1 8
7 a 8 .
8 b 1 3
9 b 3 3
10 b 3 2
11 b 2 5
12 b 5 1
13 b 1 8
14 b 8 .
-------------------------------------------
-------------------------------------------

ChrisNZ · Posted 07-29-2009 05:51 PM

You should still use lag() in your example, except the first column is now lagged instead of the second.
Also, reading ahead is the same as writing late, look for thread 5990 "How to add Previous observation values"

High-Performance SAS Coding - Third Edition

SAS Innovate 2025: Call for Content

Classroom Training Available!