%SYSFUNC( DATA-STEP-FUNC(_null_string_) ) behavior seems inconsistent to me.
Any explanation?
Useful References:
Demo code:
%macro test_null_string_handling(str);
%if %length(%sysfunc(TRIMN(&str))) > 0 %then
%put TRIMN() TEST - NON-ZERO STRING LENGTH. String is [&str];
%else %put TRIMN() TEST - ZERO STRING LENGTH. String is [&str];
%if %length(%sysfunc(STRIP(&str))) > 0 %then
%put STRIP() TEST - NON-ZERO STRING LENGTH. String is [&str];
%else %put STRIP() TEST - ZERO STRING LENGTH. String is [&str];
%if %length(%sysfunc(COMPRESS(&str,,s))) > 0 %then
%put COMPRESS() TEST - NON-ZERO STRING LENGTH. String is [&str];
%else %put COMPRESS() TEST - ZERO STRING LENGTH. String is [&str];
%if %sysfunc(PRXMATCH(m/\S/i,&str)) > 0 %then
%put PRXMATCH() TEST - NON-ZERO STRING LENGTH. String is [&str];
%else %put PRXMATCH() TEST - ZERO STRING LENGTH. String is [&str];
%mend test_null_string_handling;
%test_null_string_handling(%str())
%test_null_string_handling(%str( ))
%test_null_string_handling(one)
%test_null_string_handling( )
Log result:
TRIMN() TEST - ZERO STRING LENGTH. String is [] STRIP() TEST - ZERO STRING LENGTH. String is [] COMPRESS() TEST - ZERO STRING LENGTH. String is [] PRXMATCH() TEST - ZERO STRING LENGTH. String is [] 319 %test_null_string_handling(%str( )) TRIMN() TEST - ZERO STRING LENGTH. String is [ ] STRIP() TEST - ZERO STRING LENGTH. String is [ ] COMPRESS() TEST - ZERO STRING LENGTH. String is [ ] PRXMATCH() TEST - ZERO STRING LENGTH. String is [ ] 320 %test_null_string_handling(one) TRIMN() TEST - NON-ZERO STRING LENGTH. String is [one] STRIP() TEST - NON-ZERO STRING LENGTH. String is [one] COMPRESS() TEST - NON-ZERO STRING LENGTH. String is [one] PRXMATCH() TEST - NON-ZERO STRING LENGTH. String is [one] 321 %test_null_string_handling( ) ERROR: The function TRIMN referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments. TRIMN() TEST - ZERO STRING LENGTH. String is [] ERROR: The function STRIP referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments. STRIP() TEST - ZERO STRING LENGTH. String is [] COMPRESS() TEST - ZERO STRING LENGTH. String is [] PRXMATCH() TEST - ZERO STRING LENGTH. String is []
QUESTION:
I'm missing something fundamental. Thanks for any help figuring out what that is 🙂
A few further details from the doc pages of these 4 functions:
Thanks!
I have understood and do see your points, Quentin. And I very much appreciate your perspective and insights.
But in several details, we will simply disagree.
1 - DATA STEP and %SYSFUNC macro (text-generation) world are different, and have their own sets of rules.
2 - %SYSFUNC() apparently demands a comma to delimit arguments. It does not recognize the open/close parens as delimiting a single argument. This to me is a design failure, and I struggle to alter that assessment.
3 - By comparison, I do consider STRIP() flawed and inconsistent with the macro world.
%macro test_null_string_handling(str=);
%put >>%sysfunc(STRIP(&str))<< ;
%mend test_null_string_handling;
%test_null_string_handling
%*;
%test_null_string_handling(str=%str()) /* <== NULL STRING ASSIST, to overcome %SYSFUNC() design flaw */
%*;
With the assist, despite any macro-quoting tokens/symbols, the DATA STEP function STRIP() nonetheless receives a NULL STRING.
The results - the assist of %str() eases %SYSFUNC() over its design flaw:
323 %test_null_string_handling 324 %*; ERROR: The function STRIP referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments. >><< 325 %test_null_string_handling(str=%str()) /* <== NULL STRING ASSIST, to overcome %SYSFUNC() design flaw */ >><< 326 %*;
4 - %SYSFUNC() simply cannot recognize otherwise valid single-null-string-arguments.
This has been a very helpful, and at times very frustrating thread for me. I thank all who have contributed. Such frustration, for me, is a mark of insight and learning. This is what you have all given to me.
Thank you!
I am not sure what you think you are testing.
Remember that once the macro variables are resolved it just like you typed the code without the macro variables. So that first test is nonsensical.
%sysfunc(trimn())
Why would call TRIMN() with NO input?
Macro code is dynamic
%SYSFUNC() has a lot of trouble because it is doing a lot more work than the data step compiler. In a data step the compiler knows what type of value it is dealing with. In macro world everything thing is a string. And many data step functions (like the CAT series) can use both numeric an character inputs for the same argument.
As to your test with empty strings you should be able to use macro quoting to prevent at some of those errors. At least then %SYSFUNC() is getting something instead of nothing at all.
%let x=;
%put |%sysfunc(trimn(&x))|;
%put |%sysfunc(trimn(%superq(x)))|;
PS For a review of methods to test of empty strings see this classic paper by Chung and our own @data_null__
https://support.sas.com/resources/papers/proceedings09/022-2009.pdf
I think the explanation is that you do not call COMPRESS and PRXMATCH with the wrong number of arguments. You call COMPRESS(&str,,s), which becomes COMPRESS(,,s). %SYSFUNC sees 3 arguments delimited by commas, two empty ones and one non-empty. Similar for PRXMATCH, the call becomes PRXMATCH(m/\S/i,), the comma makes %SYSFUNC see two arguments, one of which is an empty string.
I agree with @s_lassen ; functions called by %SYSFUNC need at least one argument, even if the value of the argument is null. The argument cannot be empty.
In the DATA step language, all of the below error, because the function call has no arguments:
data _null_ ;
x=trimn() ;
x=upcase() ;
x=strip() ;
x=compress() ;
run ;
If you want to pass a null value to a function in the DATA step language, you can do it like:
data _null_ ;
x=trimn("") ;
x=upcase("") ;
x=strip("") ;
x=compress("") ;
run ;
In the macro language calling a function via %SYSFUNC, all of the below error, because the function call has no arguments:
%put %sysfunc(trimn()) ;
%put %sysfunc(upcase()) ;
%put %sysfunc(strip()) ;
%put %sysfunc(compress()) ;
If you want to pass a null value to a macro language function called via %SYSFUNC, you can do it like:
%put %sysfunc(trimn(%str())) ;
%put %sysfunc(upcase(%str())) ;
%put %sysfunc(strip(%str())) ;
%put %sysfunc(compress(%str())) ;
I see the above as consistent.
As for the %SYSFUNC docs, I'm not sure how to understand the bit you quote: "In particular, an empty argument position will not generate a NULL argument, but a zero length argument." I'm not sure what they are meaning for the difference between a null argument and a zero length argument.
Note that the error message caused by an empty function argument is coming from %SYSFUNC, not from the called function. And if you read the documentation for %SYSFUNC, it defines the arguments to the function as "one or more arguments used by function." So basically, %SYSFUNC must have something to pass to the function, it does not allow zero arguments.
For functions like COMPRESS, which accept an optional comma as a delimiter for multiple arguments, you can even pass just the comma:
47 %put >>%sysfunc(compress(,))<< ; >><<
In that case, I think %SYSFUNC is happy because it sees the comma so it has something it can pass to compress. Then compress is happy, because it gets the comma, and says "oh, I got passed two null arguments, and I'm good with that."
corrected, based on final, comprehensive insights. -GGO
Thanks to all for these insights. I still do not yet see a suitable explanation. My original test calls all have the right number of arguments. Some arguments are simply NULL STRINGS.
This makes it more explicit, but the results are the same:
%macro test_null_string_handling(str=);
%if %length(%sysfunc(TRIMN(&str))) > 0 %then
%put TRIMN() TEST - NON-ZERO STRING LENGTH. String is [&str];
%else %put TRIMN() TEST - ZERO STRING LENGTH. String is [&str];
%if %length(%sysfunc(STRIP(&str))) > 0 %then
%put STRIP() TEST - NON-ZERO STRING LENGTH. String is [&str];
%else %put STRIP() TEST - ZERO STRING LENGTH. String is [&str];
%if %length(%sysfunc(COMPRESS(&str,,s))) > 0 %then
%put COMPRESS() TEST - NON-ZERO STRING LENGTH. String is [&str];
%else %put COMPRESS() TEST - ZERO STRING LENGTH. String is [&str];
%if %sysfunc(PRXMATCH(m/\S/i,&str)) > 0 %then
%put PRXMATCH() TEST - NON-ZERO STRING LENGTH. String is [&str];
%else %put PRXMATCH() TEST - ZERO STRING LENGTH. String is [&str];
%mend test_null_string_handling;
%test_null_string_handling(str=%str())
%test_null_string_handling(str=%str( ))
%test_null_string_handling(str=one)
%test_null_string_handling
%*;
Same results
73 %test_null_string_handling(str=%str()) TRIMN() TEST - ZERO STRING LENGTH. String is [] STRIP() TEST - ZERO STRING LENGTH. String is [] COMPRESS() TEST - ZERO STRING LENGTH. String is [] PRXMATCH() TEST - ZERO STRING LENGTH. String is [] 74 %test_null_string_handling(str=%str( )) TRIMN() TEST - ZERO STRING LENGTH. String is [ ] STRIP() TEST - ZERO STRING LENGTH. String is [ ] COMPRESS() TEST - ZERO STRING LENGTH. String is [ ] PRXMATCH() TEST - ZERO STRING LENGTH. String is [ ] 75 %test_null_string_handling(str=one) TRIMN() TEST - NON-ZERO STRING LENGTH. String is [one] STRIP() TEST - NON-ZERO STRING LENGTH. String is [one] COMPRESS() TEST - NON-ZERO STRING LENGTH. String is [one] PRXMATCH() TEST - NON-ZERO STRING LENGTH. String is [one] 76 %test_null_string_handling 77 %*; ERROR: The function TRIMN referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments. TRIMN() TEST - ZERO STRING LENGTH. String is [] ERROR: The function STRIP referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments. STRIP() TEST - ZERO STRING LENGTH. String is [] COMPRESS() TEST - ZERO STRING LENGTH. String is [] PRXMATCH() TEST - ZERO STRING LENGTH. String is []
These are DIFFERENT errors, not the same error:
%put %sysfunc(trimn()) ; /* right number of arguments - NULL STRING ERROR */ %put %sysfunc(upcase()) ; /* right number of arguments - NULL STRING ERROR */ %put %sysfunc(strip()) ; /* right number of arguments - NULL STRING ERROR */ %put %sysfunc(compress()) ; /* ONE argument is acceptable - NULL STRING ERROR */
By comparison to TRIMN() and STRIP() - and UPCASE() is apparently another that can't handle NULL STRINGS, COMPRESS() and some others handle the right number of NULL STRING arguments just fine:
%put %sysfunc(compress(,)) ; /* COMMA-DELIM ARGUMENTS - no null string error */
A fine result from COMPRESS() - a null string WITHOUT ANY ERROR.
92 %put [%sysfunc(compress(,))] ; /* COMMA-DELIM ARGUMENTS - no null string error */ []
My conclusion remains that DATA STEP functions are not consistent in how they respond to NULL STRINGS when used in %SYSFUNC().
Please re-read @Quentin ' s answer as I think you did not fully grasp what it was saying.
That being said I wouldn't bother too much about inconsistencies in SAS syntax. It has a lot of them. But the on one thing that it does do well is maintain backwards compatibility. That is why 30 year old program written for SAS version 6 or earlier can mainly run without errors or even modifications.
Favoring backwards compatibility over fixing bugs has turned into a huge liability for SAS. Some bold decisions could have assured its future strong position in analytics. Even Excel knows that 1/3 + 1/3 + 1/3 = 1. SAS still does not. (I've read all of the detailed excuses about numeric precision and representation. None explains why all other tech can count to 1 by thirds.) Fixed thanks to FreelanceReinhard, below.
145 data _null_; 146 sum = 0; 147 one = 1; 148 do i = 1 to 4; 149 sum = sum + 1/3; 150 if sum = one then 151 put 'got to ' one '(' one hex16. '), ' 152 sum= '(hex16: ' sum hex16. ')'; 153 else put 'still not ' one '(hex16:' one hex16. '), ' 154 sum= '(hex16: ' sum hex16. ')' 155 ; 156 end; 157 run; still not 1 (hex16:3FF0000000000000), sum=0.3333333333 (hex16: 3FD5555555555555) still not 1 (hex16:3FF0000000000000), sum=0.6666666667 (hex16: 3FE5555555555555) got to 1 (3FF0000000000000), sum=1 (hex16: 3FF0000000000000) <== right ! still not 1 (hex16:3FF0000000000000), sum=1.3333333333 (hex16: 3FF5555555555555) FIXED - thanks to FreelanceReinhard, below. Thanks!
FreelanceReinhard, below, is correct!
But my point still stands, below, regarding SAS' inability to count to 10 by tenths. Which Excel and the rest can accomplish.
If Apple had favored bwc over improvement and innovation, we'd still have floppy drives and wired headphone jacks everywhere.
If Google had favored bwc over improved concepts and innovation, we still just have Excel and Word (and Libre immitations).
Tech shouldn't be shackled by past mistakes and past conventions. Python/R make some bold moves between versions. How much has loss of bwc held them back? ( And they can both count to 1 by thirds 😉
@GGO wrote:
(...) Even Excel knows that 1/3 + 1/3 + 1/3 = 1. SAS still does not. (...)
1 data _null_; 2 sum = 0; 3 do i = 1 to 3; 4 sum = sum + 1/3; 5 if sum = 3 then put 'got to 3 ' sum=; 6 else put 'not quite yet ' sum=; 7 end; 8 run; not quite yet sum=0.3333333333 not quite yet sum=0.6666666667 not quite yet sum=1 /* eeeeerrp */
It does. You seem to assume that the sum should be 3?
Admittedly, the cumulative sum, if continued far enough, would miss 2 (and most, but not all integers >=5; assuming a Windows platform):
1 data _null_; 2 do i = 1 to 9; 3 sum + 1/3; 4 round_sum=round(sum,1e-9); 5 if mod(i,3)=0 then put sum= (sum round_sum)(=hex16./)/; 6 end; 7 run; sum=1 sum=3FF0000000000000 round_sum=3FF0000000000000 sum=2 sum=3FFFFFFFFFFFFFFF round_sum=4000000000000000 sum=3 sum=4008000000000000 round_sum=4008000000000000
@GGO wrote:
(...) Why in the world should SAS double-down for decades on different internal representation for real numbers (including integers) and separately for integers? (...)
This is not the case. SAS uses the same internal floating-point representation for integers as for any other numbers.
Thank you FreelanceReinhard for pointing out my error!
Nonetheless, my point still stands with counting to 1 by tenths:
99 data _null_; 100 sum = 0; 101 one = 1; 102 do i = 1 to 12; 103 sum = sum + 1/10; 104 if sum = one then 105 put 'got to ' one '(' one hex16. '), ' 106 sum= '(hex16: ' sum hex16. ')'; 107 else put 'still not ' one '(hex16:' one hex16. '), ' 108 sum= '(hex16: ' sum hex16. ')' 109 ; 110 end; 111 run; still not 1 (hex16:3FF0000000000000), sum=0.1 (hex16: 3FB999999999999A) still not 1 (hex16:3FF0000000000000), sum=0.2 (hex16: 3FC999999999999A) still not 1 (hex16:3FF0000000000000), sum=0.3 (hex16: 3FD3333333333334) still not 1 (hex16:3FF0000000000000), sum=0.4 (hex16: 3FD999999999999A) still not 1 (hex16:3FF0000000000000), sum=0.5 (hex16: 3FE0000000000000) still not 1 (hex16:3FF0000000000000), sum=0.6 (hex16: 3FE3333333333333) still not 1 (hex16:3FF0000000000000), sum=0.7 (hex16: 3FE6666666666666) still not 1 (hex16:3FF0000000000000), sum=0.8 (hex16: 3FE9999999999999) still not 1 (hex16:3FF0000000000000), sum=0.9 (hex16: 3FECCCCCCCCCCCCC) still not 1 (hex16:3FF0000000000000), sum=1 (hex16: 3FEFFFFFFFFFFFFF) <=== eeeerrp !! still not 1 (hex16:3FF0000000000000), sum=1.1 (hex16: 3FF1999999999999) still not 1 (hex16:3FF0000000000000), sum=1.2 (hex16: 3FF3333333333333)
(Hopefully I'm better at counting to 1 by tenths, compared with thirds 🙂
You seem to think that Excel and Python are doing numeric precision "right" and SAS is doing it "wrong".
But they're all doing floating point arithmetic, with the associated limitations.
This Microsoft blog post explains why in Excel the equation =1*(.5-.4-.1) may not evaluate to 0:
And the python docs explain why evaluating .1 + .1 + .1 == .3 returns false.
https://docs.python.org/3/tutorial/floatingpoint.html
Noted, Quentin. Thanks! I needed to have another look.
SAS and Python seem the same with addition (rounding error) and multiplication (none):
SAS
Python - same
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.