Quentin
Super User

Over on SAS-L, @yabwon raised an interesting macro issue. https://listserv.uga.edu/scripts/wa-UGA.exe?A2=2205A&L=SAS-L&D=0&P=267343358 We couldn't find an explanation there, so wanted to bounce around the issue here. Perhaps a SAS insider will be tempted to dig into the source code and share an insight that is escaping us.

 

Problem Statement

When a function-style macro is called with text immediately following the macro call [e.g. %mymacro()foo], the word scanner is not able to build a single token combining text returned by the macro with the text immediately after the macro call.

 

Problem Example

Given a simple function-style macro, which returns the word hello:

%macro mymacro();hello%mend mymacro;

It can be called on a %PUT statement:

%put >>%mymacro()foo<< ;

and correctly returns:

3    %put >>%mymacro()foo<< ;
>>hellofoo<<

If called in a DATA statement, it does not work as expected.  This code:

data %mymacro()foo;
run;

should create one dataset, work.hellofoo. But in fact it creates two datasets, work.hello and work.foo:

5    data %mymacro()foo;
6    run;

NOTE: The data set WORK.HELLO has 1 observations and 0 variables.
NOTE: The data set WORK.FOO has 1 observations and 0 variables.

Problem Analysis

My suspicion is this is a problem in the word scanner / tokenizer not appropriately building a single token when the beginning of the token is returned by a function-style macro, and the remainder of the token is fixed text. @yabwon's suspicion is the macro processor is returning an unprintable character which is interrupting the word scanner / tokenizer.

 

The tokenizing problem is not specific to the DATA statement. It also occurs in assignment statements and PROC statements:

 

8    data _null_ ;
9      %mymacro()foo=1 ;
NOTE: Line generated by the invoked macro "MYMACRO".
1       hello
        -----
        180
ERROR 180-322: Statement is not valid or it is used out of proper order.

10   run ;

NOTE: The SAS System stopped processing this step because of errors.

11
12   proc print data=%mymacro()foo ;
12   proc print data=%mymacro()foo ;
                               ---
                               22
ERROR 22-322: Syntax error, expecting one of the following: ;, (, BLANKLINE, CONTENTS, DATA,
              DOUBLE, GRANDTOTAL_LABEL, GRANDTOT_LABEL, GRAND_LABEL, GTOTAL_LABEL, GTOT_LABEL,
              HEADING, LABEL, N, NOOBS, NOSUMLABEL, OBS, ROUND, ROWS, SPLIT, STYLE, SUMLABEL,
              UNIFORM, WIDTH.

12   proc print data=%mymacro()foo ;
                               ---
                               202
ERROR 202-322: The option or parameter is not recognized and will be ignored.

13   run ;

The problem is specific to function-style macros. It does not affect true macro functions. The log below shows that %LOWCASE, a function-style autocall macro provided by SAS, exhibits the problem, while %UPCASE, a true macro function, does not:

15   data %lowcase(abc)foo;
16   run;

NOTE: The data set WORK.ABC has 1 observations and 0 variables.
NOTE: The data set WORK.FOO has 1 observations and 0 variables.

17   data %upcase(abc)foo;
18   run;

NOTE: The data set WORK.ABCFOO has 1 observations and 0 variables.

 

The problem does not occur for pure macro code:

34   %let %mymacro()foo=1 ;
35   %put &=hellofoo ;
HELLOFOO=1

 

Workarounds

As a workaround, the macro call can be placed inside %UNQUOTE(), or any macro function which would unquote a value:

20   data %unquote(%mymacro())foo ;
21   run ;

NOTE: The data set WORK.HELLOFOO has 1 observations and 0 variables.

22
23   data %upcase(%mymacro())foo ;
24   run ;

NOTE: The data set WORK.HELLOFOO has 1 observations and 0 variables.

25
26   data %sysfunc(trim(%mymacro()))foo ;
27   run ;

NOTE: The data set WORK.HELLOFOO has 1 observations and 0 variables.

Using name-literals also avoids the problem:

29   data "%mymacro()foo"n ;
30     "%mymacro()foo"n=1 ;
31     put hellofoo= ;
32   run ;

hellofoo=1
NOTE: The data set WORK.HELLOFOO has 1 observations and 1 variables.
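
As a sketch only, the same %UNQUOTE() wrapping might be tried on the PROC PRINT case that failed in the analysis above. No log is shown for this step, so treat the outcome as an assumption to verify on your own system. The dataset work.hellofoo is created first only so that there is something to print:

```sas
%macro mymacro();hello%mend mymacro;

/* create a dataset for PROC PRINT to read */
data hellofoo;
  x = 1;
run;

/* wrap the macro call in %UNQUOTE() so that the returned text
   and the suffix foo have a chance to form a single token */
proc print data=%unquote(%mymacro())foo;
run;
```

If the step still reports ERROR 22-322, building the full name in a macro variable first is the more conservative option.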

Question

Clearly there are ways to avoid this problem, or work around it when it occurs.  But I'm interested in people's thoughts on the root cause of the issue. I'm curious whether this is a bug in the word scanner/tokenizer, or a bug in the macro processor.

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
12 REPLIES
yabwon
Onyx | Level 15

Hi @Quentin !

 

Thanks a million for bringing the topic up here to Communities. We have here a pack of very smart SAS people and I'm sure someone will be able to help.

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



ballardw
Super User

I suspect it has to do with the macro parameter list.

And maybe for such a trivial use the macro definition should be

%macro mymacro(a);hello&a %mend mymacro;
data %mymacro(foo);
run;

which, when called as %mymacro() alone, still returns "hello".

Quentin
Super User

Thanks @ballardw, can you say more about your suspicion of the macro parameter list?

 

Here's an interesting wrinkle contributed by @RichardDeVen in the SAS-L thread.

 

If you add %STR() around the value returned by the macro, the behavior changes.  This makes some sense, since we know %STR() adds unprintable masking characters (even when there are no values to quote), and those unprintable characters are known to cause problems for the tokenizer on occasion if they are not automatically unquoted.

 

The example below uses prefix%mymacro()suffix in an attempt to create a single token: prefixhellosuffix.  Without %STR(), I get two tokens: prefixhello and suffix.  If I add %STR(), I get two different tokens: prefix and hellosuffix:

 

1    %macro mymacro();hello%mend mymacro;
2    %macro mymacrostr();%str(hello)%mend mymacrostr;
3
4    data prefix%mymacro()suffix ;
5    run ;

NOTE: The data set WORK.PREFIXHELLO has 1 observations and 0 variables.
NOTE: The data set WORK.SUFFIX has 1 observations and 0 variables.

6
7    data prefix%mymacrostr()suffix ;
8    run ;

NOTE: The data set WORK.PREFIX has 1 observations and 0 variables.
NOTE: The data set WORK.HELLOSUFFIX has 1 observations and 0 variables.

So in both cases the tokenization is broken, but the introduction of masking characters by %STR() somehow changes *where* it breaks.

 

Yes, the macro in this case is intentionally trivial so as to focus on the illustration of the language feature. By the 'rules' of the macro language and the SAS language, I think the above should work. Understanding the root cause of why it doesn't work may shed some light on the inner workings of the macro language or tokenizer.

 

 

 

yabwon
Onyx | Level 15

Hi @ballardw ,

 

Thanks for your input! 

 

The example here is just a "boiled down" version of the original macro code I've been working on.

Here are the links to the descriptions of the macros:

1) https://github.com/yabwon/SAS_PACKAGES/blob/main/packages/baseplus.md#ldsn-macro

2) https://github.com/yabwon/SAS_PACKAGES/blob/main/packages/baseplus.md#ldsnm-macro

Unfortunately the workaround you proposed won't fit here.

 

The origin of finding that "bug'ish" behaviour was the following:

 

Let's assume we have a macro which takes some text as an input parameter, processes it, and returns another text string in such a way that the output string can be used as a dataset name, e.g.

%macro myMacro(text);
_%sysfunc(MD5(&text.), hex16.)
%mend myMacro;

The use case would be:

data %myMacro(a b c d e);
  set sashelp.class;
  xyz = height*weight;
run;

and as a result the dataset is created:

NOTE: The data set WORK._2438B89B82C22B3D has 19 observations and 6 variables.

 

The next idea, following this one, was: let us try to do the following:

data 
  %myMacro(a b c d e)_F 
  %myMacro(a b c d e)_M
;
  set sashelp.class;
  xyz = height*weight;
  if sex = "F" then output %myMacro(a b c d e)_F;
               else output %myMacro(a b c d e)_M;
run;

data combined;
  set %myMacro(a b c d e)_:;
run;

But unfortunately the problem popped up:

1    data
2      %myMacro(a b c d e)_F
3      %myMacro(a b c d e)_M
4    ;
5      set sashelp.class;
6      xyz = height*weight;
7      if sex = "F" then output %myMacro(a b c d e)_F;
8                   else output %myMacro(a b c d e)_M;
9    run;

ERROR: Data set WORK._2438B89B82C22B3D is already open for output.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK._2438B89B82C22B3D may be incomplete.  When this step was stopped
         there were 0 observations and 6 variables.
WARNING: Data set WORK._2438B89B82C22B3D was not replaced because this step was stopped.
WARNING: The data set WORK._F may be incomplete.  When this step was stopped there were 0
         observations and 6 variables.
WARNING: Data set WORK._F was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


10
11   data combined;
12     set %myMacro(a b c d e)_:;
13   run;

NOTE: There were 19 observations read from the data set WORK._2438B89B82C22B3D.
NOTE: There were 19 observations read from the data set WORK._2438B89B82C22B3D.
NOTE: There were 0 observations read from the data set WORK._F.
NOTE: The data set WORK.COMBINED has 38 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

So... yeah, that's it, we're cooked. 😉

 

 

All the best

Bart

 

 

 




Tom
Super User

This is not a new issue, although this is the first example I have seen of how to trigger it while running mainly in open code.  Normally you will see this happening inside a macro when trying to use a complex macro expression to generate a single SAS token.

 

The workarounds are the same in both cases.

1) Build the token first and then use it.

%let dsname=%mymacro()foo;
data &dsname;

2) Use a macro function like %UNQUOTE() to regroup the generated text into one token that is passed on to SAS to evaluate.

data %unquote(%mymacro()foo);
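
Applied to the two-output DATA step from earlier in the thread, the two workarounds might look like the sketch below. The macro variable name base is only an illustrative choice, and the steps are untested here, so treat the result as an assumption to check in your own session:

```sas
%macro myMacro(text);
_%sysfunc(MD5(&text.), hex16.)
%mend myMacro;

/* 1) build the common prefix once, in pure macro code,
      then append the suffix with the usual &name. syntax */
%let base = %myMacro(a b c d e);

data &base._F &base._M;
  set sashelp.class;
  xyz = height*weight;
  if sex = "F" then output &base._F;
               else output &base._M;
run;

/* 2) or wrap each macro call together with its suffix in %UNQUOTE() */
data combined;
  set %unquote(%myMacro(a b c d e)_F)
      %unquote(%myMacro(a b c d e)_M);
run;
```

The first form calls the macro only once, which also avoids recomputing the MD5 value for every reference.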
yabwon
Onyx | Level 15

Hi @Tom ,

 

Thanks for the input!

We played with those workarounds too, but (though they do the job and I am using them) frankly I do not consider them "elegant". I would expect macros to give the user the same "freedom" of use as macro variables do, i.e.

%let mv = _%sysfunc(MD5(a b c d e), hex16.);
data 
  &mv._F 
  &mv._M
;
  set sashelp.class;
  xyz = height*weight;
  if sex = "F" then output &mv._F;
               else output &mv._M;
run;

but extended by a "function form" of macro calls to generate the code:

resetline;
data 
  %myMacro(my first output for F) 
  %myMacro(my second output for M)
;
  set sashelp.class;
  xyz = height*weight;
  if sex = "F" then output %myMacro(my first output for F);
               else output %myMacro(my second output for M);
run;

 

All the best

Bart

 




Tom
Super User

Beauty is in the eye of the beholder.

 

I am not an expert on building compilers but I am sure the compiler for the SAS language, especially when it has to accommodate the macro preprocessor, is extremely complex.  That is a side effect of SAS being so flexible (you don't have to define your variables before referencing them, for example).

 

I doubt at this point SAS is going to try to nail down this bug. 

 

Perhaps there are even user jobs that are working as intended with code like: 

data %mymacro()foo;

because the programmer just forgot to insert the space between the two names.  Those programs would now break if this behavior was changed.

yabwon
Onyx | Level 15

"Beauty is in the eye of the beholder." - true, agree 100% 🙂

 

"I doubt at this point SAS is going to try to nail down this bug. " - that's also my doubt ☹️ 


About programs "running on this bug": I think that if it were documented (i.e. "it works like that, period") there would be no discussion.

 

I once heard a "story from the office" where some folks built an advanced solution (using a "feature" of a similar sort). It worked for quite some time, then SAS "fixed" the "feature" and they had to rewrite everything, so maybe there is hope here. 😁

 

B.

 




Quentin
Super User

Thanks for the response @Tom .

 

Indeed, I'm aware this is an old issue.  When Bart first raised it, my mind went to memories of Master Ian Whitlock talking about using %unquote to glue tokens back together, as mentioned in this paper https://support.sas.com/resources/papers/proceedings/proceedings/sugi28/011-28.pdf and this SAS-L post https://listserv.uga.edu/scripts/wa-UGA.exe?A2=ind0512A&L=SAS-L&P=R40041&X=OD4E9A946EB9C70510B which includes the lines:

 

 

The old proverb, "If it looks good but doesn't work apply %UNQUOTE" is still worth while 
because there are times when the macro facility leaves a token in pieces and %UNQUOTE can glue those pieces back together.

 

My main interest in the question was as to root cause (is this a bug in the tokenizer, or is the macro processor actually adding unprintable characters to the input stack when the macro function returns a value).  It sounds like you agree it's likely a tokenizer bug.  I would be much more concerned by the idea that the macro processor might be somehow adding hidden characters.

 

I have never written a compiler.  As you say, I imagine it must be a complex undertaking, especially when dealing with the macro language preprocessor and other code generation methods. @AllanBowe mentioned that WPS shows this same 'bug.' So perhaps that is evidence that this small problem is inherent to the logic necessary to make a SAS tokenizer work.

 

I can accept that %unquote() can help the tokenizer (even when there are no quoting characters involved), and also that using the name literal can help the tokenizer, and clearly using an 'extra' macro variable to store the full token would also make life much easier for the tokenizer.

 

I'm still perplexed as to how the introduction of quoting characters by %STR() can actually help the setting where the goal is to create a token from a macro call and a suffix.  I suppose it could be an unintentional quirk of the actual quoting char inserted by %STR() at the end of a string.  Interestingly %BQUOTE causes different splitting of the token than %STR():

 

 

2    %macro mymacro();hello%mend mymacro;
3    %macro mymacrostr();%str(hello)%mend mymacrostr;
4    %macro mymacrobquote();%bquote(hello)%mend mymacrobquote;
5
6    data prefix%mymacro()suffix ;
7    run ;

NOTE: The data set WORK.PREFIXHELLO has 1 observations and 0 variables.
NOTE: The data set WORK.SUFFIX has 1 observations and 0 variables.

8
9    data prefix%mymacrostr()suffix ;
10   run ;

NOTE: The data set WORK.PREFIX has 1 observations and 0 variables.
NOTE: The data set WORK.HELLOSUFFIX has 1 observations and 0 variables.

11
12   data prefix%mymacrobquote()suffix ;
13   run ;

NOTE: The data set WORK.PREFIX has 1 observations and 0 variables.
NOTE: The data set WORK.HELLO has 1 observations and 0 variables.
NOTE: The data set WORK.SUFFIX has 1 observations and 0 variables.

 

I also agree that this issue is unlikely to be fixed.  But I was hoping there is a small chance that some macro insider like @russt_sas might be inspired to take a peek into the tokenizer code and see if it looks like a bug, as he did in this thread about a macro resolution bug: https://communities.sas.com/t5/SAS-Programming/Forward-rescan-rule-of-macro-variable/m-p/373750#M274....

 

 

 

 

 

ChrisNZ
Tourmaline | Level 20

Good find!

 

 I'm curious whether this is a bug in the word scanner/tokenizer, or a bug in the macro processor.

By now, it's neither. There's over 40 years of macro code lying around, and the behaviour cannot be changed without the risk of breaking some of that code. So by virtue of age and of SAS's policy not to break old code*, it's now a feature.

 

* Exceptions can exist, such as proc export's behaviour changed in 9.4M6, but they are rare and concern recent  additions afaik.

yabwon
Onyx | Level 15

Hi @ChrisNZ 

 

Agreed that SAS may not want to "fix" this "bug". In that case I would expect an update to the documentation to describe and explain the behavior.

 

But since it can be considered a "feature", it could also be altered with options, I believe, the same way the MINOPERATOR or SECURE options were added to the macro language, right? Something like:

%macro myMacro(...) / FORCEUNQUOTE;
 ...
%mend;

😉

 

All the best

Bart




Quentin
Super User

 


@ChrisNZ wrote:

Good find!

 

 I'm curious whether this is a bug in the word scanner/tokenizer, or a bug in the macro processor.

By now, it's neither. There's over 40 years of macro code lying around, and the behaviour cannot be changed without the risk of breaking some of that code. So by virtue of age and of SAS's policy not to break old code*, it's now a feature.

 

* Exceptions can exist, such as proc export's behaviour changed in 9.4M6, but they are rare and concern recent  additions afaik.


I'm generally sympathetic to (and appreciative of) SAS's commitment to backwards compatibility.  That said, as you say, they do sometimes make breaking bug fixes.  I've been bit by changes in the structure of tables written by ODS OUTPUT statements, and once by a nasty change in the SQL interpretation of a poorly written ambiguous sub-query (after the change the query returned different results, there were no errors/warnings before or after the change).

 

In this case, if a correction were possible, I think it would be fair to make the change.  I think it's a safe bet that a program that relies on the current tokenization bug would throw an error about non-existing datasets, which would be easy to debug. 

 

But I doubt there will be interest at SAS in fixing the bug, regardless.  And I imagine the word scanner is a complex 'object,' so I'm sure making changes to it would come with the cost of executing lots and lots of regression testing.  So probably won't be worth it.

 

I'm shocked I haven't been bitten by this issue more often.  I sometimes use Peter Crawford's %now() macro function to provide a suffix for a dataset name: out.mydata_%now().  I guess it's just dumb luck that the bug affects using a macro function to create a prefix but does not affect using a macro function to create a suffix.
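
A quick way to check the suffix case is a sketch like the one below, reusing the trivial %mymacro from the original post. The earlier prefix%mymacro()suffix log, which produced work.prefixhello, suggests that leading text and the macro result are joined into one token, but no log is shown here for this exact step, so the outcome is an assumption:

```sas
%macro mymacro();hello%mend mymacro;

/* the macro call supplies the END of the dataset name;
   compare the NOTE in the log with the prefix%mymacro()suffix case */
data foo%mymacro() ;
run ;
```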


