11-09-2017 01:52 PM
It's starting to bother me that %SYSFUNC can't pass a null argument to a function. Does it bother anyone else?
I most often encounter this limitation where I have a macro variable list, and use %SYSFUNC(countw(&list)) to count the number of items in the list, and I want a null list to return 0.
Unfortunately, %sysfunc(countw()) errors:
%let list=;
%put %sysfunc(countw(&list));
ERROR: The function COUNTW referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments.
In the past I have thought of this as a COUNTW limitation, so my workaround was to always pass the delimiter as an argument, which works fine, e.g.:
%let list=;
%put %sysfunc(countw(&list,%str( )));
0
But I thought about it today, and realized it's a limitation of %SYSFUNC. Of course COUNTW knows how to handle nulls:
data _null_;
  count=countw("");
  put count=;
run;
count=0
%SYSFUNC just can't pass a null value to a function, even if the function accepts null arguments. So all of the following fail, even though the functions accept null arguments in a DATA step:
%put %sysfunc(countw());
%put %sysfunc(upcase());
%put %sysfunc(length());
%put %sysfunc(lengthn());
%put %sysfunc(propcase());
They work if I pass a "macro null", i.e.:
%put %sysfunc(countw(%str()));
%put %sysfunc(upcase(%str()));
%put %sysfunc(length(%str()));
%put %sysfunc(lengthn(%str()));
%put %sysfunc(propcase(%str()));
But I don't want to do that.
Does this seem like an important limitation of %SYSFUNC?
There is a sentence in the %sysfunc documentation that I think is trying to explain this, but I don't think it really clarifies things much: "an empty argument position will not generate a NULL argument, but a zero length argument." In my head, it's not obvious how they are differentiating between a null argument and a zero length argument, but this is the only thing in the docs I can find which I think means "null arguments don't work."
11-09-2017 03:04 PM
I don't see this as being a problem. You're passing double quotes in your data step version, which is basically doing the same thing as your %str( ) in the macro code. If you run the data step with no arguments, you get the same error. You can always assign your macro variables through null data steps as well.
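The null DATA step route mentioned above might look something like the sketch below. The macro variable name n_items is made up for illustration, and it assumes &list is a blank-delimited list that contains no double quotes.

```sas
/* Assumption: &list is blank-delimited and contains no double quotes. */
%let list=;

data _null_;
  /* "&list" resolves to "" when the list is null, which COUNTW accepts. */
  /* n_items is a hypothetical macro variable name.                      */
  call symputx('n_items', countw("&list", ' '));
run;

%put n_items=&n_items;
```

The cost is an extra step boundary, so this cannot be used inline the way %SYSFUNC can.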
11-09-2017 03:50 PM
In the SAS language I'm not passing double quotes to COUNTW, I'm passing a null string (a string of length 0).
In the macro language you can represent a null string without using %str().
%let null=;
%put Hi&null.mom;
Himom
There are lots of macro functions that work fine with a null argument, e.g.:
%put Lowcase macro function works with a null argument >%lowcase()<;
Lowcase macro function works with a null argument ><
%put The length of a null argument is %length();
The length of a null argument is 0
It's just %SYSFUNC() which has troubles with it, because it calls a DATA step function and won't pass it a null. In fact, it's not really that a null argument is the problem, because the below "works", even though both parameters passed to countw are null:
%put %sysfunc(countw(,));
0
So I think it's just that %sysfunc really wants to see *something* inside the parentheses, so that it has something to send the function, even if all it is sending is a comma that indicates two null parameters.
Why shouldn't it be smart enough to send a null to the function?
11-09-2017 04:04 PM
One workaround would be to use the %LENGTH function to test the parameter and only execute if the length is > 0.
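A minimal sketch of that %LENGTH test, wrapped in a function-style macro. The macro name count_items is hypothetical, and it assumes a blank-delimited list with no commas or unbalanced parentheses:

```sas
/* Hypothetical helper: returns 0 for a null list, else the word count. */
/* Assumption: &list is blank-delimited and contains no commas.         */
%macro count_items(list);
  %if %length(&list) = 0 %then 0;
  %else %sysfunc(countw(&list, %str( )));
%mend count_items;

%put %count_items();       /* null list */
%put %count_items(a b c);  /* three items */
```

With these assumptions the first call should write 0 and the second 3 to the log.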
I'm not sure why you expect a macro function that calls a data step function to have the function "work" when the same call in a data step is an error. Using some of your examples above I wrote this little data step. Guess how many errors and of what type it generates before running. I think %sysfunc is following the rule of the data step function (or at least semi-gracefully handling a suboptimal programmer's choice).
data junk;
  length x $27.;
  a=countw();
  b=upcase();
  c=length();
  d=lengthn();
  e=propcase();
run;
11-09-2017 04:17 PM
The macro language is different than the DATA step language. They do not work the same way.
For example, in the data step language you need double quote marks to indicate a string; they are not needed in the macro language.
In the data step language two quote marks, i.e. "" are used to indicate a null string. In the macro language, two quote marks are a string with length two.
%let twoquotes="";
%put %length(&twoquotes);
2
%put %sysfunc(length(&twoquotes));
2
Given that %SYSFUNC is a macro function, I think it should accept a macro language null, i.e. a string of length 0.
Note by your logic the following calls which succeed for %sysfunc should actually fail:
%put %sysfunc(countw(,));
0
%put %sysfunc(countw(,%str( )));
0
Because the equivalent in a DATA step fails:
data want;
  x=countw(,);
  y=countw(," ");
  put x= y=;
run;
x=0 y=0
NOTE: The data set WORK.WANT has 1 observations and 2 variables.
WHOA, I'm shocked that those calls in the data step didn't fail. Is that surprising to anyone else?
11-09-2017 10:01 PM
As a related aside, in the DATA step setting, when there is a function with one required argument and one or more optional arguments, if you pass no arguments, it will error:
data _null_;
  x=compress(); *error;
    --------
    71
ERROR 71-185: The COMPRESS function call does not have enough arguments.
run;
Fair enough. There was no argument passed to the function.
But if you put in a comma, it works (I think), at least no error:
data _null_;
  x=compress(,); *no error;
run;
How would that be explained? I would say this second example also passes no arguments to the function. The comma is not an argument. I can't come up with a reasonable explanation for why the above would not throw an error.
If the explanation were something like "well, there is a comma there, so second example is passing two null arguments to the function", then I would think the first example should be passing one null argument to the function, and should not throw an error.
11-09-2017 04:41 PM
You cannot call the COUNTW() function in a data step without an argument either.
%let null=;
%put %sysfunc(countw(&null));
ERROR: The function COUNTW referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments.
data _null_;
  x=countw(&null);
    ------
    71
ERROR 71-185: The COUNTW function call does not have enough arguments.
run;
The %LENGTH() macro function works because it is a macro function, not a data step function, but %SYSFUNC(LENGTH()) wouldn't work.
You could just add macro quoting.
%let null=;
%put %sysfunc(countw(%bquote(&null)));
%put %sysfunc(countw(%superq(null)));
Note that adding an extra comma will actually change the way it counts.
Instead of using the default delimiters, it uses NO delimiters, so any non-empty string counts as just one word.
%let x=1 2/3+4;
%put %sysfunc(countw(&x));
4
%put %sysfunc(countw(&x,));
1
You could add back the default delimiters.
%put %sysfunc(countw(&x,( !$%&*+,-./;<^|)));
4
11-09-2017 10:41 PM
Let me try to explain my thinking a little bit differently.
%SYSFUNC is a macro function. When they developed it, they had to decide how arguments should be passed to the function that is called by %SYSFUNC. Consider:
%put %sysfunc(length("Hello")) ;
The developers could have decided that because %SYSFUNC is calling a DATA step function length, and DATA step functions require string literals to be passed in quotes, that the argument to length should be passed in quotes. So above would return 5. I think that would be the wrong decision, because the point of %SYSFUNC is to give you a function in the macro language, and the macro language does not use quotes to indicate strings. Therefore it returns 7 because the quote marks are part of the value. They designed the argument to the length function to be consistent with the macro language rather than the DATA step language.
The macro language allows you to define macro variables or parameters with a null value, and the null value has a length of 0. Consider:
%let null=;
%put %sysfunc(length(&null));
%put %sysfunc(length());
Here I have passed a null value to length, but it is length called by the macro function %sysfunc. The developers of %sysfunc could have decided that when an argument to the called function is null, it will send that null value to the function. Internally, it could send the length function "" if that is the easiest way to send a null value to the function. Just because length() returns an error in the DATA step language does not mean that %sysfunc(length()) should necessarily return an error in the macro language.
But instead they decided that "an empty argument position will not generate a NULL argument...." I can't think why that is better than "an empty [null] argument will generate a NULL argument."
11-09-2017 11:21 PM
I just noticed that %SYSFUNC(find(,)) will not throw an error, even though in a DATA step find(,) will throw an error.
I take that as evidence that it is "allowed" for a function called by %SYSFUNC to handle macro language arguments differently than a function called in a DATA step handles data step language arguments.
%put %sysfunc(find(,));
0

data _null_;
  x=find(,);
    -
    159
ERROR 159-185: Null parameters for FIND are invalid.
run;
11-09-2017 11:54 PM - edited 11-09-2017 11:58 PM
@Quentin I reckon that's a lost battle.
Data step functions are not 100% homogeneous in behaviour, as you have shown with the COMPRESS function.
%sysfunc() allows using [most] data step functions by wrapping a macro language layer around them. This layer, depending on how it's implemented adds extra data handling between the SAS code and the function call, and that layer may not always be exactly the same.
In any case, and regardless of what is acceptable or not, and what should exist or not (and opinions will vary), changing the behaviours now would break working code, and that will never happen.
As much as I despise settling for a status quo when improvements can be made, and as much as I believe one should always aim for "as good as possible" rather than "good enough", in this case there is too much to lose and too little to gain for anything to happen. You just have to use what's there, and be thankful that a SAS program written 30 years ago can still run untouched in version 9.4 (after 2 total rewrites of the SAS engine if I am not mistaken, in C for V6, and in C++ in V7 or V8?).
Other software vendors' cavalier attitude to backward compatibility (is MS the worst?) is too expensive and too frustrating.
11-10-2017 10:07 AM
Thanks @ChrisNZ, I'm willing to accept that it's a lost battle due to backwards compatibility concerns. There may be some people who *like* that it throws an error, since it lets them know when they've accidentally forgotten to provide an argument to a function invoked via %sysfunc().
Had a good discussion on SAS-L with Joe Matise last night: https://listserv.uga.edu/cgi-bin/wa?A2=SAS-L;924508ab.1711b.
Based on some of his insights, here's my current thinking.
In the DATA step language, designers need to decide what to do if someone calls a function and does not provide an argument:
Note that there are functions, like COUNTW, where all of the arguments are optional. So when someone codes y=countw(); SAS needs to decide whether they have passed null values for every argument, or have forgotten to pass an argument. I think it was decided that an empty argument list means the user forgot to pass an argument, so should be an error. I don't know if this is coded as part of each function, or at a higher level. Are there any functions that you can call with an empty argument?
For an empty argument y=function(), it was a design decision that this should be interpreted as passing no arguments, rather than passing a null argument.
They made a different design decision for y=function(,). They decided that y=f(,) would be a legal way to pass two null arguments. There are situations where it is useful to be able to pass a null argument, and since SAS functions use positional parameters, it's reasonable that y=f(,) would pass two null arguments.
I do think it's inconsistent that in the DATA step language, there is a way to pass two null arguments y=f(,) but there is not a way to pass one null argument y=f(). But I can live with it, as having y=f() return an error just feels reasonable, as 99 times out of 100 this would be a programmer's mistake.
That said, I do wish that in the macro language setting, invoking a function via %sysfunc, they made a different decision. In the macro setting I think %sysfunc(function()) should be interpreted as passing a null argument, rather than passing no arguments. Because in the macro setting, it's common to have a macro variable which may be null, and other than %sysfunc, macro functions handle null arguments gracefully. But as @ChrisNZ said, since it's been many years of %sysfunc(function()) treating empty arguments as no argument (becoming an error), I don't think there's much hope of this being changed.
11-10-2017 09:15 AM
Yes @Ksharp, you could even put %str() before it. I think it just wants to see something between the parentheses of the function call.
%let list=;
%put %sysfunc(countw(%str()&list));
11-10-2017 10:17 AM
In the SAS-L thread, Joe posted a nice function-style macro for a %countw() that returns 0 for null arguments:
%macro countw_safe/parmbuff;
  %if %sysevalf(%superq(syspbuff) eq ,boolean) %then 0;
  %else %sysfunc(countw(&syspbuff));
%mend countw_safe;

%put %countw_safe();
%put %countw_safe(%str(a,b,c),%str(,));