Hi @promo_at_work,
I've just seen that @Tom already suggested the "_infile_ trick" while I was implementing and testing it:
/* Create test data */
data test(keep=row);
call streaminit(27182818);
length row $7699;
array c[700] $10 _temporary_;
do i=1 to 50000;
do j=1 to 700;
l=floor(rand('uniform',0,11)); /* 'integer' not available in 9.4M2 */
c[j]=ifc(l=0,'#',substrn('0123456789',1,l));
end;
row=compress(catx('|', of c[*]),'#');
output;
end;
run;
filename ft15f001 temp;
data split(keep=v:);
array v[700] $10;
infile ft15f001 dsd dlm='|';
input @@;
set test;
_infile_ = row;
input @1 v[*] @@;
parmcards;
blah
;
With the above 50,000 observations the split step went 30 times faster than yours: 2.68 vs. 81.80 seconds.
(I learned these tricks from @data_null__ and hope I used that FT15F001 thing correctly.)
Edit: Replaced unnecessary DOW loop by simple SET statement, following @Tom's example.
There might be some gain by using an INPUT statement instead of the SCAN function (SCAN reads the string from the beginning every time, INPUT advances a pointer on a line), something like this:
data split;
set test;
infile sasautos(verify.sas) dsd delimiter='|' lrecl=8000;
if _N_= then input @;
_infile_=row;
length var1-var700 $10;
input @1 var1-var700 @;
run;
I used sasautos(verify.sas) as the (dummy) infile, as that exists on all SAS installations, and the infile has to exist.
@s_lassen Ha! What I've been trying to make work but couldn't. Well done.
@Tom infile cards is limited in length sadly. Something SAS should have corrected long ago.
This works:
data HAVE;
length ROW $8000;
ROW='|||';
do I=1 to 700;
ROW=catx('|',ROW,put(i,z3.));
end;
do J = 1 to 1e4;
output;
end;
run;
data SPLIT; *15 seconds;
set HAVE;
array VAR(700) $10;
do I = 1 to dim(VAR);
VAR[I]=scan(ROW,I,'|','m');
end;
run;
data SPLIT2; * 0.9 seconds;
set HAVE;
infile sasautos(verify.sas) dsd dlm='|' lrecl=8000;
input @1 @;
_infile_=ROW;
input @1 (VAR1-VAR700) (:$10.) @@;
run;
proc compare data=SPLIT compare=SPLIT2; run;
I'd change in your code from:
input @1 @;
to:
if _n_=1 then input @;
Because the only thing you want to do here is creating the input buffer which you then can address and populate using automatic variable _infile_
@Patrick: Interesting suggestion. I was also unsure if my decision to replace the DOW loop in my solution was really good, performance-wise, because of the repeated execution of the empty INPUT statement.
Now I've tested @ChrisNZ's solution (with a few DROP= options added) with and without the if _n_=1 condition, using an input dataset HAVE with 5 million observations (37.5 GB).
Result (run times of data SPLIT2 step in separate SAS sessions):
with if _n_=1: 2:54.24 min without if ...: 2:49.45 min
Of course, for a serious comparison multiple runs would have to be performed, but the result suggests that SAS executes the empty INPUT statement possibly about as fast as it checks the IF condition.
It's a shame the TEMP and DUMMY engine can't be used for INFILE. That's where I wasted my time trying to make them work. I can't see how these engines can work for INFILE.
I opened an incident with tech support. The documentation is plain wrong it seems, and its text has been pasted from the FILE page.
Here a way how to use the TEMP destination. Doesn't make it faster though on my laptop (SPLIT2 is still a bit faster).
filename myfile temp;
data _null_;
file myfile;
put;
stop;
run;
data SPLIT3;
set HAVE;
infile myfile dsd dlm='|' lrecl=8000;
input @1 @;
_infile_=ROW;
input @1 (VAR1-VAR700) (:$10.) @@;
run;
Then I suggest that you raise a ticket with SAS TechSupport so that either the documentation or the INFILE statement get enhanced.
@Patrick I am closing the tech support incident on this:
"We have received confirmation that the INFILE documentation will be updated to show more details in the DUMMY device, while the TEMP device could be removed from the doco, if found irrelevant.
In addition, a feature request to get the input buffer assignment done using the INFILE, so that it can read in any data that gets passed to it.
So, down the line, there could be changes coming in the INFILE processing."
So now wait and see... @JasonS_SAS's recent contribution makes me optimistic this on now on R&D's radar. This was much more effective than a ballot entry!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.