BookmarkSubscribeRSS Feed
PaigeMiller
Diamond | Level 26

Every morning when I wake up, I ask myself "How can I use DOSUBL today?" Okay ... not every morning ... But yesterday when I got to work, a colleague asked me if I could help him solve a problem. He has some code that joins several SAS data sets, and sometimes one variable doesn't exist in a data set, and then the join fails. I know the best solution is to figure out why this happens and then prevent it from happening further ... but it wasn't really my problem. My solution would be to add this missing numeric variable to a data set IF it is not already present, so that the next join would work. I know I could write a macro solution to this, but I decided to try a DOSUBL solution. This post isn't really a question ... but rather the result of this learning experience where I write some DOSUBL code that works in SQL, and maybe this solution is overcomplicated (as I said, I could have simply written a small macro without DOSUBL), but any suggestions are welcome.

 

Here is what I came up with.

 

%macro varexist(data=_LAST_,varname=,alias=);
    /* Use in SQL to add a column named &varname (which is numeric) if it doesn't already exist in the data set */
    /* If it does exist in the data set, just extract all existing variables */
    %if %upcase(%superq(data)) = _LAST_ %then %let data = &SYSLAST;
    %if &alias= %then %let word=_name_; 
    %else %let word=cats("&alias..",_name_); 
    %let rc = 
        %sysfunc(dosubl(%str(
            proc transpose data=&DATA(obs=0) out=ExpandVarList_temp;
                var _all_;
            run;
            proc sql noprint;
                select &word
                    into :temp_varnames separated by "," from ExpandVarList_temp ;
                drop table ExpandVarList_temp;
            quit; 
        )));
    %let varexist = %sysfunc(findw(%qupcase(&temp_varnames),%upcase(&varname),%str(,.)));
    %put &=varexist;
    %if not &varexist %then %let temp_varnames = %unquote(&temp_varnames%str(,. as )&varname);
    &temp_varnames
%mend varexist; 

Usage:

proc sql;
    create table want as select %varexist(data=sashelp.class,varname=height)
    from sashelp.class;
quit;

Usage 2:

proc sql;
    create table want as select %varexist(data=sashelp.class,varname=bloodpressure)
    from sashelp.class;
quit;

Usage 3 (in a join)

data iq;
    length name $ 8;
    name='Jane';
    iq=150;
    output;
    name='Janet';
    iq=144;
    output;
run;
proc sql;
    create table want as select 
        %varexist(data=sashelp.class,varname=bloodpressure,alias=a)
        ,b.iq
    from sashelp.class as a left join iq as b 
    on a.name=b.name;
quit; 

 

--
Paige Miller
14 REPLIES 14
Quentin
Super User

Hi @PaigeMiller,

 

I like it.  I think DOSUBL is probably severely under-used.  It brings new possibilities to function-style macros, like this.  I do have some thoughts to share for the sake of discussion.

 

1.  As a general rule, I try to declare all macro variables that are intended to be "internal" to a macro as %local so as to avoid naming collisions.  So I would add:

%local word rc temp_varnames varexist ;

Note while this is usually just protection against the possibility that a macro variable exists in an outer scope, this is particularly important when using DOSUBL.  If you run the macro as is, you will see that it creates temp_varnames as a global macro variable.  This is a result of an unfortunate (IMHO) DOSUBL design decision.  The macro variable temp_varnames is created as a global macro variable in the side session.  When it is returned to the main session if the macro variable does not already exist as a local variable, it is created as a global variable.  I think it would have been better for DOSUBL to create it as a local macro variable (assuming it did not already exist as a global macro variable).  Creating the variable as %local before calling DOSUBL allows DOSUBL to return the the macro variable to the local symbol table.

 

2.  I think as a general practice it's better to use 

%sysfunc(dosubl(%nrstr(

as a wrapper for the DOSUBL block rather than

%sysfunc(dosubl(%str(

Using %NRSTR() delays resolution of macro references until the code executes in the DOSUBL side-session.  It's important to use %NRSTR() when the side-session code will use SAS code to create new macro variables that are referenced inside the DOSUBL block.  The same way that it's recommended to use %NRSTR() inside CALL EXECUTE to delay resolution of macro variables.  It's also important to use %NRSTR() if the code in the DOSUBL block has any macro statements, because you want them to execute in the side session.  In this case, using %STR() works fine.  But since %NRSTR() always works, I think it's a better choice.

 

3. I feel like %VarExist is an odd name for this macro.  I'm not sure what I would name this macro.  It's almost %ListVariablesAndAddANewVariableIfItDoesNotExist.  If I see a macro named %VarExist, I think it will return 1/0 as to whether or not a variable exists.

 

4. Thinking about %VarExist made me wander down a different design path, still using DOSUBL.  Suppose you take the core of your macro and turn it into a macro %VarList, which returns a list of variables in a dataset:

%macro varlist(data,dlm=%str( ));
    %local rc varlist ;
    %let rc = 
        %sysfunc(dosubl(%nrstr(
            proc contents data=&DATA out=ExpandVarList_temp(keep=name) noprint;
            run;
            proc sql noprint;
                select name
                    into :varlist separated by "&dlm" from ExpandVarList_temp ;
                drop table ExpandVarList_temp;
            quit; 
        )));
    &varlist
%mend varlist; 

I think that is a really useful utility macro.  You can do stuff like:

%put %varlist(data=sashelp.class) ;
%put %varlist(data=sashelp.class(keep=_numeric_)) ;
%put %varlist(data=sashelp.class(keep=name--age)) ;

That macro makes it easy to write a %VarExist macro which will tell you if a variable exists or not, e.g.:

 

%macro varexist(data=,var=);
  %*function-style macro, return 1 if variable exists, else 0 ;
  %eval(%sysfunc(findw(%qupcase(%varlist(&data,dlm=|)),%upcase(&var),|))>0)
%mend varexist; 

 

With a %VarExist macro utility like that, I might stop there, and use an open %IF statement in my PROC SQL, e.g.:

 

proc sql;
    create table want as select *
      %if NOT %varexist(data=sashelp.class,var=bloodpressure) %then %do ;
        , . as bloodpressure
      %end ;
    from sashelp.class;
quit;

 

For the join:

data iq;
    length name $ 8;
    name='Jane';
    iq=150;
    output;
    name='Janet';
    iq=144;
    output;
run;

proc sql;
    create table want as select 
      a.*
      %if NOT %varexist(data=sashelp.class,var=bloodpressure) %then %do ;
        , . as bloodpressure
      %end ;
      ,b.iq
    from sashelp.class as a left join iq as b 
    on a.name=b.name;
quit;

Because %varlist returns a list of variables, you can use other list-processing macros on the list.  For example, suppose you want to do the join, but want to keep only the character variables from sashelp.class.  You can use  Richard DeVenezia's %seplist (https://www.devenezia.com/downloads/sas/macros/index.php?m=seplist) to add the a. alias to the list of character variables, e.g.:

proc sql;
    create table want as select 
      %seplist(%varlist(sashelp.class(keep=_character_)),prefix=a.,dlm=%str(,))
      %if NOT %varexist(data=sashelp.class,var=bloodpressure) %then %do ;
        , . as bloodpressure
      %end ;
      ,b.iq
    from sashelp.class as a left join iq as b 
    on a.name=b.name;
quit;
BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
PaigeMiller
Diamond | Level 26

All great points, @Quentin .Thanks. As this was my first attempt to use DOSUBL, I can see that there's a lot more to think about.

--
Paige Miller
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @PaigeMiller 

 

"Every morning when I wake up, I ask myself "How can I use DOSUBL today?" - One should ask oneself questions like that whenever a new release of SAS brings new possibilities, but in a busy worklife, one tends to stick with old habits, and I am still trying to get the Hash Object into my personal tool box. I made a few experiments only with dosubl, so I am not qualified to discuss the details in using dosubl, but you really got me interested in learning more!

 

I have some comments on the technique you use to solve your collegaue's problem. It looks like overkill, but what you do is necessary some way or other, so I gave it a try to see if I could come up with a solution that seems simpler, and to get it to work with dusubl in the same way as your code (part of my learning).

 

The idea is to use dosubl to create a view with the missing numeric variable included if necessary instead of generating a variable select list, and - in following steps - to move the macro call from the variable list to the set/from definition, so the data is read through the created view. The macro returns the view name, which in my example is a text constant, because the same view name is reused in every call to the macro.

 

Don't let my writing the whole dosubl code in one line bother you. It did bother me, but I could'nt get the text coloration in Display Manager right if the included code is split in a line per statement as in your example. As is, my code leaves the last created view in SAS WORK. Not nice either, but as it is a view only and don't use space, I think it's a minor problem.

 

* macro to create view with a missing numeric variable added if necessary;
%macro thisview(table=,checkvar=);
  %let rc = %sysfunc(dosubl(%nrstr('data thisview / view=thisview; set &table; if missing(vnamex("&checkvar")) then &checkvar = .; run;')));
  thisview
%mend;

*Usage 1 (variable present):;
data test1; set %thisview(table=sashelp.class, checkvar=height);
run; 

*Usage 2 (variable missing):;
data test2; set %thisview(table=sashelp.class, checkvar=bloodpressure);
run; 

*Usage 3 (in a join);
data iq;
    length name $ 8;
    name='Jane'; iq=150; output;
    name='Janet'; iq=144; output;
run;

proc sql;
    create table test3 as select 
        a.*
        ,b.iq
    from %thisview(table=sashelp.class, checkvar=bloodpressure) as a left join iq as b 
    on a.name=b.name;
quit; 

 

PaigeMiller
Diamond | Level 26

I too am slowly trying to get Hash into my toolbox.

 

Thank you for the suggestions, should I ever need to use DOSUBL again, I will keeep this in mind.

 

So thanks @Quentin and @ErikLund_Jensen for having a discussion with me on thus rather arcane but promising topic.

--
Paige Miller
Quentin
Super User

Very nice, @ErikLund_Jensen .  Your macro %thisview reminds me of Mike Rhoads's macro %GetSQL in the original macro-function-sandwich paper (precursor to DOSUBL).  That macro also returned a view which could be used inline, like:

data example;
     set %GetSQL
       (select name, age, weight from sashelp.class where sex = 'M' order by age)
         %GetSQL
       (select name, age, height from sashelp.class where sex = 'F' order by name)
     ;
run;

That paper was also the original source of %ExpandVarList (using the transpose method shared by @data_null__ on SAS-L).

 

Just for the sake of future readers of this thread, my recommended readings on DOSUBL would be:

 

1. Mike Rhoads's MFS paper, now (gasp) 10 years old: https://support.sas.com/resources/papers/proceedings12/004-2012.pdf .  It doesn't use DOSUBL, because it didn't exist at the time, but it introduces the idea of a function-style macro that can execute SAS code.  I still remember where I was when I read this paper the first time; it was like Mike had invented magic.  I couldn't believe it.

 

2. Rick Langston's paper where he introduced DOSUBL to the world: https://support.sas.com/resources/papers/proceedings13/032-2013.pdf. It ends withs a DOSUBL version of %ExpandVarList.

 

3. John King's paper presenting a DOSUBL version of ExpandVarList that has enhanced features for selecting the variables to be returned. https://www.mwsug.org/proceedings/2017/BB/MWSUG-2017-BB142.pdf

 

4. For those interested in how DOSUBL manages macro variable scopes, I tried to deduce the rules and present my understanding of them in this paper: https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4958-2020.pdf

 

I hope to see more papers exploring and documenting DOSUBL.  There is much to be explored.  For example, it's important to know which system options defined in the main session are inherited by the side-session, and which system options changed in the side-session will cause an update to the main session options.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Quentin
Super User

As evidence of the need for more DOSUBL papers / blog posts, I tried asking ChatGPT about DOSUBL, and it seems to have it confused with the %include statement.  : )

 

/* Main SAS program */

DOSUBL 'C:\Program Files\SAS\subroutine.sas';

/* Rest of main SAS program */

Asking it more questions led to humorous explanations, e.g.:

 

In this program, the SYMBOLGEN option is enabled, which creates a symbol table containing the values of the macro variables var1 and var2. The subroutine program can then access the values of these variables using the SYMGET function, as shown in the following example...

I guess there's a reason that stack overflow banned ChatGPT-generated answers : )

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
PeterClemmensen
Tourmaline | Level 20

@Quentin, I'm not sure where, but I seem to remember reading that the Dosubl Function is a Behind The Scenes %Include Statement. Have your read that too or is the ChatGPT far off? 🙂

yabwon
Onyx | Level 15

Rick is saying about this here: 

https://blogs.sas.com/content/sasdummy/2018/12/20/five-sas-programming-language-features/

DoSubL discussion starts around 5:35

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

Quentin,

 

and here is the `SQLinDS` package: https://github.com/SASPAC/sqlinds

which is an implementation of Mike's paper and code with some additional extensions. Documentation is here: https://github.com/SASPAC/sqlinds/blob/main/sqlinds.md

 

Basic example:

data class;
  set %SQL(select * from sashelp.class order by age);
run;

prints the following in the log:

1      data class;
2          set %SQL(select * from sashelp.class order by age);
NOTE:*** the query ***
     "select * from sashelp.class order by age"
     *****************
*** executed as ***
>        select CLASS.Name, CLASS.Sex, CLASS.Age, CLASS.Height,
>CLASS.Weight
>          from SASHELP.CLASS
>      order by CLASS.Age asc;
>
*****************
3        run;

All the best

Bart

 

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Tom
Super User Tom
Super User

This looks more reasonable.  But it is simpler to just use LENGTH instead of an assignment statement to force the existence of the variable.  That also means it can support multiple required variables in a single call.

%macro thisview(table=,required=);
%local rc;
%let rc = %sysfunc(dosubl(%nrstr('data thisview / view=thisview; set &table; length &required 8; run;')));
thisview
%mend;

As to the original problem of testing for the existence of a variable in a dataset there is no need to use DOSUBL as you can just use the VARNUM function.  For example see this 24 year old macro: %varexist() 

Quentin
Super User

As to the original problem of testing for the existence of a variable in a dataset there is no need to use DOSUBL as you can just use the VARNUM function.  For example see this 24 year old macro: %varexist() 

Indeed, but I've got a similar SCL-function based version of %Varlist, and it's a hideous monster, particularly to support keep/drop.  The DOSUBL version is actually readable (while of course executing MUCH more slowly, because of overhead of DOSUBL is substantial).

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Tom
Super User Tom
Super User

@Quentin wrote:

As to the original problem of testing for the existence of a variable in a dataset there is no need to use DOSUBL as you can just use the VARNUM function.  For example see this 24 year old macro: %varexist() 

Indeed, but I've got a similar SCL-function based version of %Varlist, and it's a hideous monster, particularly to support keep/drop.  The DOSUBL version is actually readable (while of course executing MUCH more slowly, because of overhead of DOSUBL is substantial).


In general yes. 

Not in this case however.  The pure SAS code you posted to test if the variable exists is 9 lines of code. And then you added all of the extra macro logic and DOSUBL() logic on top of that.  The macro code in the VAREXIST macro is 9 lines of macro code, and it adds the functionality of returning the result of VARTYPE() or  other VARxxx() functions if you want.

 

The SAS code you posted is using PROC CONTENT and PROC SQL.  So to determine if a variable exists you get the whole list of variables, then have to try to test if the name is in the list.

 

The SCL functions (as you called them) is a better match to the actual problem.

Essentially the macro logic is:

Does the dataset exist? open() function call.

Does the variable exist?  varnum() function call.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Tom 

 

I did think of that, but dropped it, because

 

  • I always try to avoid the "NOTE: Variable <varname> is uninitialized." if it is foreseeable. This note often originates from missing variables in an input table, and in a production setup, it can be caught in an automated log scan and used for data quality control, 
  • It has become a habit of mine never to reassign a length to a variable that exists in the PDV, because this in many situations throw the "WARNING: Multiple lengths were specified for the variable ...". It will not in this case, but old habits...

I got it to work with dosubl, because that is the topic of this thread, and it was a fun to play with. But for my own use, I wouldn't have written a function-like macro, I would have omitted the return value "thisview" from the macro and placed the macro call just before the step where it was needed, so the view was created and could be referred by name in a SQL from-clause or SET statement later.

ChrisNZ
Tourmaline | Level 20

I just stumbled upon this nice idea.
The dosubl call can be shortened to

    %let rc=%sysfunc(dosubl(%nrstr(
       proc contents data=&data. out=___VAREXISTS___(where=(NAME=upcase("&varname"))) noprint; run;
       %let varexist = &sysnobs;
       proc delete data=___VAREXISTS___; run;
    )));   

or even

%let rc=%sysfunc(dosubl(%nrstr(
  %let dsid     = %sysfunc(open(&data.));
  %let varexist = %sysfunc(varnum(&dsid.,&varname.));
  %let dsid     = %sysfunc(close(&dsid.));
)));

but then you don't need dosubl for this code.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 5709 views
  • 16 likes
  • 7 in conversation