BookmarkSubscribeRSS Feed
Ken_oy
Fluorite | Level 6
I know we have a "Lag" statement, and we can use it to do as the following:
-----------------------
v1 lag_v1

1 .
2 1
3 2
4 3
5 4
----------------------

But, do we have any statement to do:

-----------------------
v1 X_v1

1 2
2 3
3 4
4 5
5 .
----------------------

Thank you!!
11 REPLIES 11
Doc_Duke
Rhodochrosite | Level 12
No. SAS Doesn't do a 'read ahead' that provides data to the DATA Step. Your best bet is to sort DESCENDING and use the LAG and then resort.
LAP
Quartz | Level 8 LAP
Quartz | Level 8
I don't know of any "reverse lag" but you could reverse the sort order on the data set. Assuming column one can be used to as the key, you could do something like this.

proc sort data=test;
by descending v1;

data test2;
set test;
lag_v1 = lag(v1);
run;

proc sort data=test2;
by v1;

If v1 can not be used as your key variable, you could use _n_ to sort by;

Data test;
set example;
key_var = _n_;
run;

proc sort data=test; by descending key_var;

etc..
deleted_user
Not applicable
The following works OK if you only need to do this on one variable (as opposed to the entire list of variables). Note that you can "look forward" more than one by changing the constant of addition.

data test2;
set test nobs=_N;

_pointer = _N_+1;

if _pointer le _N then
set test(keep=v1 rename=(v1=X_v1)) point=_pointer;
else
X_v1 = .;
run;
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Create a suitable file for processing, as required, and assign a SAS temporary variable to track the observation number ( OBSNUM = _N_; ). Then re-sort the file with BY DESCENDING OBSNUM; and you can then use LAG.

Scott Barry
SBBWorks, Inc.
DanielSantos
Barite | Level 11
There's actually a simple way for simulating an "ahead" read, using multiple SET statements.
It is a very efficient technique that is not often seen because of its obscure readability, as it breaks up a little the logic of the datastep iteration. I'll try to explain it at the end.

Say this is your sample:

data SAMPLE;
do A=1 to 6;
output;
end;
run;

And this is the code:

data RESULT;
retain AA; * retain AHEAD variable;
* AHEAD SET statement starting at OBS=2;
if not _EOF then set SAMPLE (firstobs=2) end=_EOF;
else A=.; * last OBS;
AA=A; * place AHEAD A value to AHEAD variable;
set SAMPLE; * normal SET statement starting at OBS=1;
run;

OK, then...
The first SET statement is the ahead statement and begins at OBS Nº2.
The second SET statement is the actual statement and begins normally at OBS Nº1

For every datastep iterations, here what's happening.

Read 1 OBS from Datastep (First SET, 1 OBS ahead).
Place value of A into AA
Read 1 OBS from Datastep (Second SET)
Output OBS of Second SET with AA values of First SET.
Loop.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Honestly, the multiple SET stmt approach seems quite convoluted, mostly because each SAS variable of interest must be protected as a "shadow-named" variable through RETAIN and assignments, also the file must be read twice.

Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
I disagree. Some sort of variable creation is needed in any event. In order to do a SORT, the data will have to be read through at least once. In terms of efficiency, my bet would be on Daniel's solution (although I would expect the POINT= to be similar).
DanielSantos
Barite | Level 11
> Honestly, the multiple SET stmt approach seems quite
> convoluted, mostly because each SAS variable of
> interest must be protected as a "shadow-named"
> variable through RETAIN and assignments, also the
> file must be read twice.
>
> Scott Barry
> SBBWorks, Inc.

Hello Scott.

True about retaining/assignments, but perfomance wise, it can easily be avoided, with the following modification:

data RESULT;
* AHEAD SET statement starting at OBS=2;
if not _EOF then set SAMPLE (firstobs=2 rename = (A=AA)) end=_EOF;
else AA=.; * last OBS;
set SAMPLE; * normal SET statement starting at OBS=1;
run;

But if I may disagree, not completely true about the efficiency of two pass read.
The two SET are indeed treated by the SAS supervisor as two different tables (double buffer, double pointer), but let us remember that when processing files most of the effort is done at disk I/O. And because of that, at the lowest level every I/O operation is done by block and cached into memory. So there is a very good chance, that for the same file, the second read will not produce another read operation and the data will be retrieved from memory.
Then of course, as already pointed by kmg, PROC SORT falls in the group of the most resource consuming procedures.

By the way, it just occurred to me another approach, which would be, to perform a merge with the same table (WITHOUT the BY statement) being one of them one observation "ahead" of the other.

data RESULT;
merge SAMPLE SAMPLE (firstobs=2 keep=A rename=(A=AA));
run;

Indeed, so many ways.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.
Ken_oy
Fluorite | Level 6
Thank you so much!!

Problem fixed!!
Ken_oy
Fluorite | Level 6
I am using PROC SQL, and then this problem fixed.

Using the following code:
-------------------------------------------
-------------------------------------------
data have;
do group = 'a','b';
do value = 1,3,3,2,5,1,8;
output; end; end; run;

proc sql;
create view numbered as select *,
monotonic() as obsnum from have;
create view want as select numbered.group, numbered.value, cross.value as lookahead from numbered left join numbered as cross
on numbered.group=cross.group and numbered.obsnum+1=cross.obsnum
order by numbered.obsnum;
quit;
-------------------------------------------
-------------------------------------------



We can have the Result below:
-------------------------------------------
-------------------------------------------
Obs group value lookahead

1 a 1 3
2 a 3 3
3 a 3 2
4 a 2 5
5 a 5 1
6 a 1 8
7 a 8 .
8 b 1 3
9 b 3 3
10 b 3 2
11 b 2 5
12 b 5 1
13 b 1 8
14 b 8 .
-------------------------------------------
-------------------------------------------
ChrisNZ
Tourmaline | Level 20
You should still use lag() in your example, except the first column is now lagged instead of the second.
Also, reading ahead is the same as writing late, look for thread 5990 "How to add Previous observation values"

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 12355 views
  • 0 likes
  • 7 in conversation