I have seen (and used) DOSUBL in code to create a list of variable names in a SAS data step. And it seems as if DOSUBL provides a lot of power that can be used effectively, but I am unable to think of places where it might be used, other than the case where you want that list of variable names. So, where else could you use it? Give me some examples! (I'm not really asking for code, I'm asking for opportunities or uses of DOSUBL. Most likely, I can write my own code)
You probably saw the paper by Rick Langston, published back when he added DOSUBL to the SAS language? A few patterns in there, including an improvement where DOSUBL is more elegant than the RUN_MACRO function for some purposes.
I've seen some impressive (and kind of crazy) examples from SAS-L members with ideas for how to employ DOSUBL. Sometimes these uses border on "abuses" and the results aren't what was expected. That said, I've found it useful as a means to an end for some tasks, like this one.
Thanks, @ChrisHemedinger for your example using PROC IOMOPERATE.
I think I grasp what you are doing there, you get a data set called SPAWNED, then for each row of the data set you run a PROC using DOSUBL, that creates an output data set for each row, then you combine all the output data sets.
Since I am not a SAS administrator, I'm still struggling to think of cases where a statistician such as myself (who has to do lots data manipulation) would want to run a PROC on each row of a data set. Since I have had a whole 10 minutes to think about it, I'm not surprised that I'm coming up blank so far, and it may be that on June 23, 2022, I will have a project that needs this capability and then I'll remember ... um ... what was it I was supposed to remember?
How about an example where you want to calculate the cardinality of a bunch of variables, perhaps from different datasets?
Below I write a macro function that uses PROC SQL to calculate cardinality, and then call the macro in a DATA step. It will run slowly (much slower than Chris's https://blogs.sas.com/content/sasdummy/2013/10/16/about-cardinality/), but still seems helpful that it works.
%macro cardinality(data=,var=);
%local rc emit;
%let rc=%sysfunc(dosubl(%nrstr(
proc sql noprint;
select
count(distinct &var) into :emit
from
&data
quit;
)));
&emit
%mend ;
data testing (keep = libname memname name distinct_count);
set sashelp.vcolumn;
where libname = 'SASHELP' and memname = 'CARS';
distinct_count=resolve('%cardinality('
|| ' data= ' || cats(libname,'.',memname)
|| ',var= ' || name
|| ')'
) ;
put (libname memname name distinct_count)(=) ;
run;
So any case where you want to run a model on multiple datasets (or multiple models) and collect a result (fit statistic or whatever), it might be handy.
But of course you don't need DOSUBL for that. You could use a macro, or maybe just a by-statement, or CALL EXECUTE, or %include.... Probably just about anything that can be solved with DOSUBL could also be solved without DOSUBL. SAS was Turing-complete before DOSUBL. : )
Or use PROC FREQ with NLEVELS option.
My thoughts so far ... suppose I had a data set where I wanted to run some PROC on each row. What would I do without DOSUBL? I would transpose all the rows, then run the PROC on the resulting transposed data set with a BY variable. So far, I haven't found an example like from Chris Hemedinger that DOSUBL adds functionality that I don't already have (or even where DOSUBL makes the programming easier). I get the feeling I'm still missing something, however. Maybe I need to be thinking of non-statistical PROCs, as was the case for Chris.
One place I think it's really handy is when writing function-style macros (like %Varlist, as you mention).
Before DOSUBL, writing function-style macros was often cumbersome because the macro couldn't have any SAS statements in them.
With DOSUBL you can write a function-style macro that executes a DATA step, or multiple DATA steps / PROC steps.
So if you take any function-style macro that in the past I would would have written using %sysfunc to call cumbersome SCL functions to process data or metadata, today I would consider writing that via DOSUBL. %Varlist, %CountObs, %AnyObs, etc.
Rick's original paper (https://support.sas.com/resources/papers/proceedings13/032-2013.pdf) also describes a use-case for PROC STREAM, which I haven't tried.
That said, I too would be interested in more use-case for DOSUBL. It's almost 10 years old now, and you don't see or hear much about it in user group papers etc. It seems quite revolutionary in what it allows.
@rogerjdeangelis you're a DOSUBL fan/advocate. Care to jump in with some of your favorite use-cases?
Don't know if @yabwon or @AllanBowe have found any 'production' uses for it in your work?
Thanks for the callout!
The major problem with `dosubl()` is performance - it gets expensive once you call it a few times. For particular use cases though, it's fantastic!
We used it to build an "SCL Transcoding Kit" - an accelerator for projects looking to modernise legacy AF/SCL applications. You can see it in action here: https://sasapps.io/modernising-legacy-sas-scl-af-applications
Specifically, dosubl() enabled us to handle SLISTs and call modules without breaking out of an SCL method.
Hi Team,
In my case, I'm using DoSubL() a lot in the macroarray package (https://github.com/yabwon/SAS_PACKAGES/blob/main/packages/macroarray.md).
For example the %array() macro uses it in two ways:
1) The syntax for creating an array is basically a data step syntax, which then I embed inside a data step in a %sysfunc(DoSubL(...)) wrapper, so the following:
%array(d x1-x5 (5 4 3 2 1))
%put _user_;%array(d[-2:2] $ ("a" "b" "c" "d" "e"))
creates 5 macro variables:
GLOBAL D1 5 GLOBAL D2 4 GLOBAL D3 3 GLOBAL D4 2 GLOBAL D5 1
And since it puts code inside a data step, when used like that, with a function:
%array(UL[26] $, function = byte(rank("A")+_I_-1) )
%put &=UL1 &=UL2 ... &=UL25 &=UL26;
we get:
UL1=A UL2=B ... UL25=Y UL26=Z
2) The second useful thing is that it helps me to generate macros with dynamic names, more or less like that:
resetline;
%macro GenerateMacro(name);
%local mtext;
%let mtext = _%sysfunc(datetime(),hex16.)_;
%let rc = %sysfunc(
dosubl(
/*===============================================*/
options nonotes nosource %str(;)
DATA _NULL_ %str(;)
CALL SYMPUTX("&mtext.",
'%MACRO ' !! "&name.(n,k);" !!
" data _&name._; " !!
' do i = 1 to &n.; ' !!
' x = ranuni(&k.); ' !!
' output; ' !!
' end; ' !!
' run; ' !!
'%MEND;', 'G') %str(;)
STOP %str(;)
RUN %str(;)
/*===============================================*/
)); &&&mtext. %symdel &mtext. / NOWARN ;
%mend GenerateMacro;
%GenerateMacro(ABCDEF)
%abcdef(3,4321)
%GenerateMacro(UVWXYZ)
%UVWXYZ(17,333)
the result in the log is:
25 %GenerateMacro(ABCDEF) 26 27 %abcdef(3,4321) NOTE: The data set WORK._ABCDEF_ has 3 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 28 29 %GenerateMacro(UVWXYZ) 30 31 %UVWXYZ(17,333) NOTE: The data set WORK._UVWXYZ_ has 17 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
So in the macroarray package when I write:
%array(UL[26] $, function = byte(rank("A")+_I_-1), macarray=Y)
I will not only get a list of 26 macrovariables: UL1 to UL26 but also a macro %UL() will be generated, so I can write:
%put %UL(1) %UL(2) %UL(3) ... %UL(25) %UL(26);
to "mimic" the data step array behaviour.
Also the %deleteMacArray(), %concatArrays(), %appendCell(), and %mcHashTable() macros use this approach.
One more thing, DoSubL() performs poorly in the "do-loop(doSubL())" setup, i.e. DoSubL() is called inside a loop. But it works quite well in reversed setup when you call a loop inside DoSubL() [the "doSubL(do-loop())" setup], so for example something like a "wholesale Metadata Acquisition" can be more convenient to do. Like in this example:
resetline;
option dlcreatedir;
libname test "%sysfunc(pathname(work))/test";
data test.A;
do i = 1 to 17;
output;
end;
run;
data test.B;
do i = 1 to 42;
output;
end;
run;
data test.C;
do i = 1 to 303;
output;
end;
run;
%macro wholesaleMetadataAcquisition(lib);
%local rc;
%let rc = %sysfunc(DoSubL(%str(
options nonotes nosource;
proc sql;
create table _temp_ as
select memname, nobs
from dictionary.tables
where libname = %upcase("&lib.")
;
quit;
data _null_;
set _temp_;
call symputx("nobs_" !! strip(memname), nobs, "G");
run;
proc delete data = _temp_;
run;
)));
%mend;
%wholesaleMetadataAcquisition(test)
%put &=nobs_A. &=nobs_B. &=nobs_C.;
I will scratch my head a bit more later and maybe will find something else.
All the best
Bart
Thanks @yabwon I really like your %array example. It's a nice case where we know that the DATA step knows how to do something really well (and easily), but it would be hard to do it in the macro language. So DOSUBL lets you do it in the macro language (via %sysfunc) but just use the familiar DATA step in the background.
It's like the %varlist example. 20 years ago I saw someone do a presentation of a %varlist macro. And they had done a TON of work to support/expand any sort of valid varlist specification. So you could do %varlist(foo) or %varlist(foo,keep=_numeric_), or %varlist(foo,keep=bar: ). It was a TON of pure macro code to cover all the different kinds of variable lists. But the data step language (and proc language) already knows how to interpret a variable list, so with DOSUBL you can just let it do the work:
%macro VarList(data) ;
%local rc varnames ;
%let rc = %sysfunc(dosubl(%nrstr(
proc contents data=&data out=__ExpandVarList(keep=name) noprint ;
run ;
proc sql noprint ;
select name into :varnames separated by ' '
from __ExpandVarList ;
drop table __ExpandVarList ;
quit ;
)));
&varnames /*return*/
%mend ;
%put %varlist(sashelp.shoes) ;
%put %varlist(sashelp.shoes(keep=R:)) ;
%put %varlist(sashelp.shoes(keep=_character_)) ;
In your last example, %wholsaleMetadataAcquisition, I'm not sure I see the benefit of wrapping the code in DOSUBL.
How is:
%macro wholesaleMetadataAcquisition(lib);
%local rc;
%let rc = %sysfunc(DoSubL(%str(
options nonotes nosource;
proc sql;
create table _temp_ as
select memname, nobs
from dictionary.tables
where libname = %upcase("&lib.")
;
quit;
data _null_;
set _temp_;
call symputx("nobs_" !! strip(memname), nobs, "G");
run;
proc delete data = _temp_;
run;
)));
%mend;
better than:
%macro wholesaleMetadataAcquisition(lib);
proc sql;
create table _temp_ as
select memname, nobs
from dictionary.tables
where libname = %upcase("&lib.")
;
quit;
data _null_;
set _temp_;
call symputx("nobs_" !! strip(memname), nobs, "G");
run;
proc delete data = _temp_;
run;
%mend;
?
%put %wholesaleMetadataAcquisition(test);
With the first above will work 😉
B.
One more I have (unfortunately I don't know from whom I learned this trick) is use DoSubL() to emulate "try/except" programming without affecting SYSCC macrovariable.
filename try TEMP;
data _null_;
file try;
infile cards4;
input;
put _infile_;
cards4;
data x;
x = 17 /* <- missing ; */
y = 42;
run;
;;;;
run;
data _null_;
rc = dosubl('
options nonotes nosource;
filename _D_ DUMMY;
proc printto log = _D_;
run;
%include try;
%let try = &SYSCC.;
');
try = symgetn('try');
put rc= try=;
if try then put 'Your previous code was bad!';
else call execute('%include try;');
run;
%put *&try.*;
%put *&SYSCC.*;
Bart
Great stuff, everyone. Thanks! I am going to think about all of this, and hopefully it will sink in, and maybe even generate some ideas of my own.
Hello Paige,
My "biggest" use case of DOSUBL so far was this: For a project I once implemented the Nelder-Mead method in a DATA step (because I don't have a SAS/OR license). This optimization algorithm requires the evaluation of the objective function in several places. This is no problem as long as the value of the function can be computed in terms of common SAS functions, user-defined functions (PROC FCMP) or other calculations in the same DATA step. However, the function evaluation in that use case involved several DATA and PROC steps (combined in a macro). Using rc=dosubl('%macroname(...)'); these steps were called from within the main DATA step. Macro variables were used to transfer function arguments and returned values between the main DATA step and the "side session" invoked by DOSUBL. Most of the total run time was spent in the side sessions because the amount of data processed in each of the function calls was substantial.
@FreelanceReinh , I'm curious in that setting what made you choose DOSUBL instead of CALL EXECUTE?
I guess the big difference is that the DOSUBL side session can return values to the step that invoked DOSUBL, which CALL EXECUTE cannot do because of the timing issues.
Hi @Quentin,
As far as I remember, I hadn't even considered CALL EXECUTE being an option. Wouldn't the CALL EXECUTE code run only after the main DATA step has finished? But the main DATA step had to "respond" to each function value delivered by a DOSUBL/macro call in the sense that various IF conditions used the returned values. For example, if moving in one direction (in the domain of the mathematical function) increased the function value, this would influence the decision where to evaluate the function in the next iteration in order to get closer to the optimum.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.