> Honestly, the multiple SET stmt approach seems quite
> convoluted, mostly because each SAS variable of
> interest must be protected as a "shadow-named"
> variable through RETAIN and assignments, also the
> file must be read twice.
>
> Scott Barry
> SBBWorks, Inc.
Hello Scott.
True about the RETAIN and the assignments, but that overhead can easily be avoided by using RENAME= instead, with the following modification:
data RESULT;
  * look-ahead SET statement, starting at OBS=2;
  if not _EOF then set SAMPLE (firstobs=2 rename=(A=AA)) end=_EOF;
  else AA=.;   * last OBS, nothing left to look ahead to;
  set SAMPLE;  * normal SET statement, starting at OBS=1;
run;
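To make the behaviour of the step above concrete, here is a minimal sketch on made-up data. The contents of SAMPLE (variables ID and A, three rows) are an assumption purely for illustration; the DATA step itself is the one shown above.

```sas
* Hypothetical input with three observations, made up for illustration;
data SAMPLE;
  input ID A;
  datalines;
1 10
2 20
3 30
;
run;

data RESULT;
  * look-ahead read, one observation ahead of the main read;
  if not _EOF then set SAMPLE (firstobs=2 rename=(A=AA)) end=_EOF;
  else AA=.;   * no next observation once the look-ahead is exhausted;
  set SAMPLE;  * main read, starting at the first observation;
run;

proc print data=RESULT;
run;
```

With this input, AA should hold the next row's A: 20, 30 and then missing on the last row. The look-ahead SET also reads ID from the next row, but the main SET overwrites it with the current row's value right after.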
But, if I may disagree, the point about the inefficiency of reading the file twice is not entirely true.
The two SET statements are indeed treated by the SAS supervisor as two different tables (double buffer, double pointer), but let us remember that when processing files most of the effort goes into disk I/O. At the lowest level, every I/O operation is done in blocks and cached in memory. So there is a very good chance that, for the same file, the second read will not trigger another physical read and the data will be served from memory.
Then of course, as kmg already pointed out, PROC SORT is among the most resource-consuming procedures.
By the way, another approach just occurred to me: merge the table with itself (WITHOUT a BY statement), with one copy running one observation "ahead" of the other.
data RESULT;
  merge SAMPLE SAMPLE (firstobs=2 keep=A rename=(A=AA));
run;
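For anyone who wants to check that the two approaches agree, here is a rough sketch that runs both on the same made-up data and compares the outputs. The SAMPLE contents and the output names RESULT_SET and RESULT_MERGE are hypothetical, chosen only for this illustration.

```sas
* Hypothetical input with three observations, made up for illustration;
data SAMPLE;
  input ID A;
  datalines;
1 10
2 20
3 30
;
run;

* Approach 1 is the conditional look-ahead SET;
data RESULT_SET;
  if not _EOF then set SAMPLE (firstobs=2 rename=(A=AA)) end=_EOF;
  else AA=.;
  set SAMPLE;
run;

* Approach 2 is the self-merge without a BY statement;
data RESULT_MERGE;
  merge SAMPLE SAMPLE (firstobs=2 keep=A rename=(A=AA));
run;

* Variables are matched by name and observations by position;
proc compare base=RESULT_SET compare=RESULT_MERGE;
run;
```

In the merge, once the shifted copy runs out of observations its variable AA is set to missing, which is what the explicit AA=. does in the first approach. PROC COMPARE should therefore report no unequal values, even though the variables end up in a different order in the two data sets.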
Indeed, so many ways.
Cheers from Portugal.
Daniel Santos @
www.cgd.pt.