I’ve noticed that when enabling the MFILE option, which is very handy in many situations ranging from debugging to producing production-ready programs, program execution time is severely impacted. Much more than the simple I/O operation of writing the MPRINT statements already being written to the program log would suggest. Depending on the complexity and amount of macro code to be resolved, this may cause the execution to take 5-7 times longer. And to be precise, I am only referring to adding the MFILE option on top of the MPRINT. I’ve coded a simple example in which the performance within SODA is ~2.5 times slower. I’ve noticed this behavior in the past, in local DM, SAS server, SAS EG, etc. applications., in Windows environments etc. It seems like a global issue with the MFILE implementation.
To investigate, I’ve also played around with the FILENAME statement (in which the MPRINT fileref is set) but it seems to have little impact.
Here is a snip of the proc print and means output
@Sas technical support? Anyone has any ideas?
The code used to test:
%macro test;
    %do i=1 %to 250;
        data t&i.;
            %do j=1 %to 100;
                do k&j.=1 to 5;
                    output;
                end;
            %end;
        run;
    %end;
    %do i=1 %to 250;
        data t&i.;
            %do j=1 %to 100;
                do k&j.=1 to 5;
                    output;
                end;
            %end;
        run;
    %end;
%mend;
%macro wmfile;
    filename mprint temp;
    options mprint mfile;
    %test;
    options nomfile;
%mend;
%macro womfile;
    options mprint nomfile;
    %test;
%mend;
data bench;
    length onbench $200;
    do run=1 to 3;
        do onbench='%wmfile', '%womfile';
            start=time();
            rc=dosubl(onbench);
            end=time();
            mark=end-start;
            output;
        end;
    end;
run;
proc print data=bench;
run;
proc means data=bench;
    class onbench;
    var mark;
run;20 year SAS user, but first post here. Thanks for your time.
Searching around internally I have found only a few reports of slower performance when using MFILE, and all had to do with the location where the file was being written. Slower/latent file systems, such as a mounted NFS that isn't optimized for network access from SAS, can cause this.
I see you've opened a Tech Support case for the same issue. If the main use case you see is on SODA (the free on-demand environment), I don't expect Tech Support to spend a lot of time to diagnose it. If you can replicate on a different environment in your workplace then it might get more attention. One follow-up would be to reference I/O testing and performance tuning guidelines as collected in this SAS Note.
Thank you @ChrisHemedinger for taking your time to reply. Really appreciate it. I see your points and making sure you write to a fast location makes absolute sense. Supporting that is the observation that shifting the MFILE target, via the FILENAME statement, to DUMMY from TEMP removes any performance difference.
However, as a layman not knowing too much about the behind-the-scenes-stuff… I'm not buying it. 😊 I think there is a problem with the MFILE implementation. Reason being, that if I run this script in my work environment in batch, having set the location of the MFILE file to the exact same folder location as the .log file, you still see huge differences. And we must bear in mind that SAS, because of the MPRINT, writes character for character the exact same content to the log and to the MFILE target. But enabling MFILE, as the only difference, slows the process down enormously. And to me, that behavior make no sense, and I doubt that is the intention of the MFILE implementation.
I have also placed the MFILE target in WORK, but the performance is exactly the same as with using TEMP on FILENAME. That is expected, I guess, as the TEMP location points basically to the same location as WORK.
I tested in SODA to see whether what I observed in my work environment was linked to the setup at work or not. The results points to that it wasn’t but rather it points to the MFILE implementation being the culprit. Note: As a whole the SODA setup is so much faster than SAS at work, niiice, and less impacted by MFILE. I had hoped raising a ticket in the SODA environment would have been taken a bit better care of, but I’ll now make sure to push it through the work channels as well.
Nice test script. I'd be curious of the results if you change RUN to RUN CANCEL in each of your two DATA steps in the macro. I think that would let you look at the time to generate the code and write it to MFILE, without time to execute the code.
Also curious if you changed from DOSUBL to CALL EXECUTE how that would speed things up. DOSUBL is much slower then CALL EXECUTE. But of course here you're using DOSUBL for both MFILE on and off, so that shouldn't be the root cause.
I've never really used MFILE MFILE my self, but certainly Chris's idea of writing MFILE to somewhere slow seems like a good starting point. What happens if you send MFILE to your somewhere in your WORK library? Presumably that should be somewhere fast.
Thanks for the nice comment! I tried using RUN CANCEL in the script, however, it had no effect on execution time. I also used CALL EXECUTE and it does require a fair bit of rewriting the script because of the way it works compared to dosubl(), but the results were the same.
Sending the MFILE target to work doesn't change performance compared to TEMP - the two locations are also very much "neighbors"...
I believe I managed to create a test script that at least in my environment clearly demonstrates what @peterprec raised (SAS9.4M8, local install on Windows laptop).
Based on my testing the time consumed appears to increase linear with the the lines of code the macro generates and the time consumed is much higher than writing the same information to the SAS log (mprint).
What the test script does.
Here the test script that I run:
%macro genCode(genlineOfCode);
  %if %nrbquote(&genlineOfCode) = %nrbquote() %then %let genlineOfCode=100;
  %do i=1 %to &genlineOfCode;
    comment something;
  %end;
%mend;
options fullstimer nomprint nomfile nosymbolgen nomlogic;
data work.timings;
  stop;
  attrib 
    test_case     length=$20
    lines_of_code format=best32.
    start_dttm    format=datetime27.6
    stop_dttm     format=datetime27.6
    elapsed_time  format=best32.
    ;
  call missing(of _all_);
run;
/* write log to temp file in WORK to avoid time delays by EG pulling the log back to client side */
%put WORK location: %sysfunc(pathname(work));
filename logout temp;
proc printto log=logout;
run;
/****************************/
filename codegen temp;
data _null_;
/*  file print;*/
  file codegen;
  do i=0 to 4000 by 200;
    put
      /* execute macro %genCode() with mprint NOmfile */
      '/* lines of code ' i '*/' /
      'data _null_;stop;file logout;run;' /
      'options mprint nomfile;' /
      '%let start=%sysfunc(datetime());' /
      '%genCode(' i ');' /
      '%let stop=%sysfunc(datetime());' /
      'options nomprint nomfile;' /
      'proc sql;' /
      '  insert into work.timings values("mprint NOmfile",' i ',&start,&stop,%sysevalf(&stop - &start));' /
      'quit;' /
      ;
    put
      /* execute macro %genCode() with mprint mfile */
      'data _null_;stop;file logout;run;' /
      'filename mprint temp;' /
      'options mprint mfile;' /
      '%let start=%sysfunc(datetime());' /
      '%genCode(' i ');' /
      '%let stop=%sysfunc(datetime());' /
      'proc sql;' /
      '  insert into work.timings values("mprint mfile",' i ',&start,&stop,%sysevalf(&stop - &start));' /
      'quit;' /
      'options nomprint nomfile;' /
      ;
  end;
run;
%include codegen /source2;
/****************************/
/* restore log destination back to default */
proc printto log=log;
run;
title 'Elapsed Times by Lines of Code';
footnote;
symbol1
	interpol=join
	height=10pt
	value=none
	line=1
	width=2
	cv = _style_
;
axis1
	style=1
	width=1
	minor=none
;
axis2
	style=1
	width=1
	minor=none
;
proc gplot data = work.timings nocache ;
  plot elapsed_time * lines_of_code=test_case  /
   	vaxis=axis1
  	haxis=axis2
  frame;
run; quit;
title;
Great post, @Patrick ! It completely confirms my observations. It aligns poorly with cause being the MFILE target written to a slow location, but supports that something in the MFILE implementations does not work as intended. Completely guessing here, it almost feels like MFILE writes one resolved programming statement/token at a time to disk (i.e., no buffering at all) while the rest of the execution sits and waits for the MFILE I/O to finish.
@peterprec wrote:
Great post, @Patrick ! It completely confirms my observations. It aligns poorly with cause being the MFILE target written to a slow location, but supports that something in the MFILE implementations does not work as intended. Completely guessing here, it almost feels like MFILE writes one resolved programming statement/token at a time to disk (i.e., no buffering at all) while the rest of the execution sits and waits for the MFILE I/O to finish.
I think that's a good guess. There are system options to control the amount of log buffering, but they probably don't apply to MFILE. Could definitely be that adding MFILE just changes the process from "send a token to the word scanner (or compiler or whatever is asking for tokens)" to "send a token to a file, then send the token to the word scanner".
You have to assume a LOT of work was put into general speed of writing to the log over many many years, since every time code is submitted it writes to the log, and log files get big.  My guess is not much work was put into developing logging for MFILE, because it's just one part of macro language, is less-used, etc.  And since the macro language has been 'functionally stable' for a while, I wouldn't think the chances are high they a developer will dive in to improve it.
When I replaced:
comment something;with simple:
data _null_; run;the plot goes to:
when I additionally changed loop range to:
do i=4000 to 8000 by 500;the plot goes:
all together:
Bart
@yabwon - good stuff! Your results baffled me...
So, however, I made the use case just a little different and only slightly more complicated:
%macro genCode(genlineOfCode);
    %if %nrbquote(&genlineOfCode)=%nrbquote() %then
        %let genlineOfCode=100;
    data _null_;
        %do i=1 %to &genlineOfCode;
            a=&i;
        %end;
    run;
%mend;and then I get this (SODA environment):
So, to me, @ChrisHemedinger, it still looks like the MFILE is not behaving according to intention? Disagree? 🙂
Maybe so! I updated the TS case that you opened with a reference to this discussion.
@yabwon Hmmm... When I run below in my environment with do i=0 to 4000 by 200;
%macro genCode(genlineOfCode);
  %if %nrbquote(&genlineOfCode) = %nrbquote() %then %let genlineOfCode=100;
  %do i=1 %to &genlineOfCode;
    data _null_; run;
  %end;
%mend;I got below result
I've now managed to run both test scripts, and their variations, on both SODA and in my work enviroment (SAS server, windows) and in all cases I can replicate the results shown in this thread.
One more test from a grid environment I have privilege to work on.
Results for "comment" and "data _null_" looks similar.
Bart
@yabwon - grid, sounds sick! 🙂 Perhaps you can run this a well:
%macro genCode(genlineOfCode);
    %if %nrbquote(&genlineOfCode)=%nrbquote() %then %let genlineOfCode=100;
    data _null_;
        %do i=1 %to &genlineOfCode;
            a=&i;
        %end;
    run;
%mend;Would be interesting to see - and I'm still a little baffled why you don't see much penalty with the data _null_ code when both myself and @Patrick does... Any ideas?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
