BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
FreelanceReinh
Jade | Level 19

Hi @promo_at_work,

 

I've just seen that @Tom already suggested the "_infile_ trick" while I was implementing and testing it:

/* Create test data */

data test(keep=row);
call streaminit(27182818);
length row $7699;
array c[700] $10 _temporary_;
do i=1 to 50000;
  do j=1 to 700;
    l=floor(rand('uniform',0,11)); /* 'integer' not available in 9.4M2 */
    c[j]=ifc(l=0,'#',substrn('0123456789',1,l));
  end;
  row=compress(catx('|', of c[*]),'#');
  output;
end;
run;

filename ft15f001 temp;

data split(keep=v:);
array v[700] $10;
infile ft15f001 dsd dlm='|';
input @@;
set test;
_infile_ = row;
input @1 v[*] @@; 
parmcards;
blah
;

With the above 50,000 observations the split step went 30 times faster than yours: 2.68 vs. 81.80 seconds.

(I learned these tricks from @data_null__ and hope I used that FT15F001 thing correctly.)

 

Edit: Replaced unnecessary DOW loop by simple SET statement, following @Tom's example.

s_lassen
Meteorite | Level 14

There might be some gain by using an INPUT statement instead of the SCAN function (SCAN reads the string from the beginning every time, INPUT advances a pointer on a line), something like this:

data split;
  set test;
  infile sasautos(verify.sas) dsd delimiter='|' lrecl=8000;
  if _N_= then input @;
  _infile_=row;
  length var1-var700 $10;
  input @1 var1-var700 @;
run;

I used sasautos(verify.sas) as the (dummy) infile, as that exists on all SAS installations, and the infile has to exist.

 

ChrisNZ
Tourmaline | Level 20

@s_lassen Ha! What I've been trying to make work but couldn't. Well done.

@Tom  infile cards is limited in length sadly. Something SAS should have corrected long ago.

This works:

data HAVE;
  length ROW $8000;
  ROW='|||';
  do I=1 to 700;
    ROW=catx('|',ROW,put(i,z3.));
  end;
  do J = 1 to 1e4;
    output;
  end;
run; 

data SPLIT;  *15 seconds;
  set HAVE;
  array VAR(700) $10;
  do I = 1 to dim(VAR);
    VAR[I]=scan(ROW,I,'|','m');
  end;
run;
    
data SPLIT2;  * 0.9 seconds;
  set HAVE;
  infile sasautos(verify.sas) dsd dlm='|' lrecl=8000;
  input @1 @;
  _infile_=ROW;
  input @1 (VAR1-VAR700) (:$10.) @@;
run;

proc compare data=SPLIT compare=SPLIT2; run;

 

 

 

Patrick
Opal | Level 21

@ChrisNZ

I'd change in your code from:

input @1 @;

to:

if _n_=1 then input @;

 

Because the only thing you want to do here is creating the input buffer which you then can address and populate using automatic variable _infile_

FreelanceReinh
Jade | Level 19

@Patrick: Interesting suggestion. I was also unsure if my decision to replace the DOW loop in my solution was really good, performance-wise, because of the repeated execution of the empty INPUT statement.

 

Now I've tested @ChrisNZ's solution (with a few DROP= options added) with and without the if _n_=1 condition, using an input dataset HAVE with 5 million observations (37.5 GB).

 

Result (run times of data SPLIT2 step in separate SAS sessions):

with if _n_=1:  2:54.24 min
without if ...: 2:49.45 min

Of course, for a serious comparison multiple runs would have to be performed, but the result suggests that SAS executes the empty INPUT statement possibly about as fast as it checks the IF condition.

ChrisNZ
Tourmaline | Level 20

It's a shame the TEMP and DUMMY engine can't be used for INFILE. That's where I wasted my time trying to make them work. I can't see how these engines can work for INFILE.

 

I opened an incident with tech support. The documentation is plain wrong it seems, and its text has been pasted from the FILE page.

 

 

 

 

 

Patrick
Opal | Level 21

@ChrisNZ

Here a way how to use the TEMP destination. Doesn't make it faster though on my laptop (SPLIT2 is still a bit faster).

filename myfile temp;
data _null_;
  file myfile;
  put;
  stop;
run;
data SPLIT3; 
  set HAVE;
  infile myfile dsd dlm='|' lrecl=8000;
  input @1 @;
  _infile_=ROW;
  input @1 (VAR1-VAR700) (:$10.) @@;
run;
ChrisNZ
Tourmaline | Level 20
You should not have to create the file beforehand. The documentation says you don't have to, and when using FILE you don't have to. INFILE fails to deliver what's on the tin.
Patrick
Opal | Level 21

@ChrisNZ

Then I suggest that you raise a ticket with SAS TechSupport so that either the documentation or the INFILE statement get enhanced.

ChrisNZ
Tourmaline | Level 20

@Patrick I am closing the tech support incident on this:

 

 

"We have received confirmation that the INFILE documentation will be updated to show more details in the DUMMY device, while the TEMP device could be removed from the doco, if found irrelevant. 

In addition, a feature request to get the input buffer assignment done using the INFILE, so that it can read in any data that gets passed to it.

So, down the line, there could be changes coming in the INFILE processing."

 

So now wait and see...  @JasonS_SAS's recent contribution makes me optimistic this on now on R&D's radar. This was much more effective than a ballot entry!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 25 replies
  • 3391 views
  • 30 likes
  • 10 in conversation