Solved: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior n...

GGO · Posted 05-24-2020 11:00 AM

%SYSFUNC( DATA-STEP-FUNC(_null_string_) ) behavior seems inconsistent to me.

Any explanation?

Useful References:

%SYSFUNC() doc - Note the note that "an empty argument position will not generate a NULL argument, but a zero length argument."
This prior (but 2010!) detailed explanation of %SYSFUNC()
More below, after demo code

Demo code:

%macro test_null_string_handling(str);
  %if %length(%sysfunc(TRIMN(&str))) > 0 %then
      %put TRIMN() TEST - NON-ZERO STRING LENGTH. String is [&str];
  %else %put TRIMN() TEST - ZERO STRING LENGTH. String is [&str];

  %if %length(%sysfunc(STRIP(&str))) > 0 %then
      %put STRIP() TEST - NON-ZERO STRING LENGTH. String is [&str];
  %else %put STRIP() TEST - ZERO STRING LENGTH. String is [&str];

  %if %length(%sysfunc(COMPRESS(&str,,s))) > 0 %then
      %put COMPRESS() TEST - NON-ZERO STRING LENGTH. String is [&str];
  %else %put COMPRESS() TEST - ZERO STRING LENGTH. String is [&str];

  %if %sysfunc(PRXMATCH(m/\S/i,&str)) > 0 %then
      %put PRXMATCH() TEST - NON-ZERO STRING LENGTH. String is [&str];
  %else %put PRXMATCH() TEST - ZERO STRING LENGTH. String is [&str];
%mend test_null_string_handling;


%test_null_string_handling(%str())
%test_null_string_handling(%str( ))
%test_null_string_handling(one)
%test_null_string_handling( )

Log result:

TRIMN() TEST - ZERO STRING LENGTH. String is []
STRIP() TEST - ZERO STRING LENGTH. String is []
COMPRESS() TEST - ZERO STRING LENGTH. String is []
PRXMATCH() TEST - ZERO STRING LENGTH. String is []
319  %test_null_string_handling(%str( ))
TRIMN() TEST - ZERO STRING LENGTH. String is [ ]
STRIP() TEST - ZERO STRING LENGTH. String is [ ]
COMPRESS() TEST - ZERO STRING LENGTH. String is [ ]
PRXMATCH() TEST - ZERO STRING LENGTH. String is [ ]
320  %test_null_string_handling(one)
TRIMN() TEST - NON-ZERO STRING LENGTH. String is [one]
STRIP() TEST - NON-ZERO STRING LENGTH. String is [one]
COMPRESS() TEST - NON-ZERO STRING LENGTH. String is [one]
PRXMATCH() TEST - NON-ZERO STRING LENGTH. String is [one]
321  %test_null_string_handling( )
ERROR: The function TRIMN referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments.
TRIMN() TEST - ZERO STRING LENGTH. String is []
ERROR: The function STRIP referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments.
STRIP() TEST - ZERO STRING LENGTH. String is []
COMPRESS() TEST - ZERO STRING LENGTH. String is []
PRXMATCH() TEST - ZERO STRING LENGTH. String is []

QUESTION:

Why do TRIMN() and STRIP() choke on a null string and throw an ERROR, but COMPRESS() and PRXMATCH() do not?
How should we know whether a DATA STEP function in %SYSFUNC() handles null strings, and for which arguments?

I'm missing something fundamental. Thanks for any help figuring out what that is 🙂

A few further details from the doc pages of these 4 functions:

TRIMN: "TRIMN returns a string with a length of zero for a blank string."
STRIP: "For strings that are blank, the STRIP and TRIMN functions return a string with a length of zero" - Note that in the test, above, TRIM() also throws the ERROR.
COMPRESS: "COMPRESS function allows null arguments."
PRXMATCH: No mention of null or blank arguments. But it does not choke on them.

Thanks!

GGO · Posted 05-25-2020 12:24 PM

I have understood and do see your points, Quentin. And I very much appreciate your perspective and insights.

But in several details, we will simply disagree.

1 - DATA STEP and %SYSFUNC macro (text-generation) world are different, and have their own sets of rules.

In DATA STEP, a blank CHAR has length 1.
In macro world, a blank string has %length 0. This fundamental disconnect definitely contributes to SAS' inconsistencies.
%SYSFUNC() should bring DATA STEP functions into the macro (text-gen) world. To me, this means that %SYSFUNC() should translate the DATA STEP blank-char concept into a zero-length NULL STRING.
This has been and remains my main point: %SYSFUNC() fails to do this most fundamental translation into the macro (text-gen) world consistently - OK for multiple arguments; ERROR for single arguments. I see no explanation for this (details follow).

2 - %SYSFUNC() apparently demands a comma to delimit arguments. It does not recognize the open/close parens as delimiting a single argument. This to me is a design failure, and I struggle to alter that assessment.

See this documentation: "All arguments in SAS language functions within %SYSFUNC must be separated by commas."
That's the fundamental design flaw. How in the world are you supposed to separate a single argument with a comma?! Not possible - so design oversight.
The open/close parens of the function call should be sufficient to delimit a single argument, including the NULL argument, which otherwise works fine for 2+ NULL arguments.
%SYSFUNC()'s focus on the comma has blinded it to the open/close delimiters.
NULL arguments are fine in the macro (text-gen) world. %SYSFUNC() simply does not make the translation from DATA STEP to Macro world, and I think this stems from the open/close-parens-as-delimiters issue.
Yes, I expect that we will disagree on this point 🙂
( I have no issue with TRANWRD() and TRANSTRN() - function documentation clearly states when blank (null) arguments are valid - or not. )

3 - By comparison, I do consider STRIP() flawed and inconsistent with the macro world.

Taking your example, I see no difference between a NULL STRING (str=), and an EXPLICIT NULL STRING (str=%str()).
Both arguments are NULL STRINGS - strings with length zero.
The EXPLICIT NULL STRING simply helps %SYSFUNC() get over its fundamental failure to recognize a NULL STRING simply based on the open/close-paren delimiters:

%macro test_null_string_handling(str=);
  %put >>%sysfunc(STRIP(&str))<< ;
%mend test_null_string_handling;

%test_null_string_handling
%*;
%test_null_string_handling(str=%str())      /* <== NULL STRING ASSIST, to overcome %SYSFUNC() design flaw */
%*;

With the assist, despite any macro-quoting tokens/symbols, the DATA STEP function STRIP() nonetheless receives a NULL STRING.

The results - the assist of %str() eases %SYSFUNC() over its design flaw:

323  %test_null_string_handling
324  %*;
ERROR: The function STRIP referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments.
>><<
325  %test_null_string_handling(str=%str())     /* <== NULL STRING ASSIST, to overcome %SYSFUNC() design flaw */
>><<
326  %*;

4 - %SYSFUNC() simply cannot recognize otherwise valid single-null-string-arguments.

That's a %SYSFUNC() design flaw.
Yes, I expect that we will disagree on this point, as well 🙂

This has been a very helpful, and at times very frustrating thread for me. I thank all who have contributed. Such frustration, for me, is a mark of insight and learning. This is what you have all given to me.

Thank you!

View solution in original post

Tom · Posted 05-24-2020 02:14 PM

I am not sure what you think you are testing.

Remember that once the macro variables are resolved it just like you typed the code without the macro variables. So that first test is nonsensical.

%sysfunc(trimn())

Why would call TRIMN() with NO input?

GGO · Posted 05-24-2020 02:51 PM

Macro code is dynamic

Tom · Posted 05-24-2020 02:27 PM

%SYSFUNC() has a lot of trouble because it is doing a lot more work than the data step compiler. In a data step the compiler knows what type of value it is dealing with. In macro world everything thing is a string. And many data step functions (like the CAT series) can use both numeric an character inputs for the same argument.

As to your test with empty strings you should be able to use macro quoting to prevent at some of those errors. At least then %SYSFUNC() is getting something instead of nothing at all.

%let x=;
%put |%sysfunc(trimn(&x))|;
%put |%sysfunc(trimn(%superq(x)))|;

PS For a review of methods to test of empty strings see this classic paper by Chung and our own @data_null__

https://support.sas.com/resources/papers/proceedings09/022-2009.pdf

GGO · Posted 05-24-2020 02:54 PM

Understood, Tom, and thanks for digging in.

Macro code is intended to handle dynamic inputs. If a particular usage reduces to a trivial input (a null string), why should some DATA STEP functions breeze right through these, while others throw errors?

I'm trying to understand this fundamental difference in DATA STEP functions, so that I can know when extra defensive programming is needed in macro - e.g., to avoid passing NULL STRINGS to DATA STEP functions.

Thanks!

s_lassen · Posted 05-24-2020 03:34 PM

I think the explanation is that you do not call COMPRESS and PRXMATCH with the wrong number of arguments. You call COMPRESS(&str,,s), which becomes COMPRESS(,,s). %SYSFUNC sees 3 arguments delimited by commas, two empty ones and one non-empty. Similar for PRXMATCH, the call becomes PRXMATCH(m/\S/i,), the comma makes %SYSFUNC see two arguments, one of which is an empty string.

Quentin · Posted 05-24-2020 05:38 PM

I agree with @s_lassen ; functions called by %SYSFUNC need at least one argument, even if the value of the argument is null. The argument cannot be empty.

In the DATA step language, all of the below error, because the function call has no arguments:

data _null_ ;
  x=trimn() ;
  x=upcase() ;
  x=strip() ;
  x=compress() ;
run ;

If you want to pass a null value to a function in the DATA step language, you can do it like:

data _null_ ;
  x=trimn("") ;
  x=upcase("") ;
  x=strip("") ;
  x=compress("") ;
run ;

In the macro language calling a function via %SYSFUNC, all of the below error, because the function call has no arguments:

%put %sysfunc(trimn()) ;
%put %sysfunc(upcase()) ;
%put %sysfunc(strip()) ;
%put %sysfunc(compress()) ;

If you want to pass a null value to a macro language function called via %SYSFUNC, you can do it like:

%put %sysfunc(trimn(%str())) ;
%put %sysfunc(upcase(%str())) ;
%put %sysfunc(strip(%str())) ;
%put %sysfunc(compress(%str())) ;

I see the above as consistent.

As for the %SYSFUNC docs, I'm not sure how to understand the bit you quote: "In particular, an empty argument position will not generate a NULL argument, but a zero length argument." I'm not sure what they are meaning for the difference between a null argument and a zero length argument.

Note that the error message caused by an empty function argument is coming from %SYSFUNC, not from the called function. And if you read the documentation for %SYSFUNC, it defines the arguments to the function as "one or more arguments used by function." So basically, %SYSFUNC must have something to pass to the function, it does not allow zero arguments.

For functions like COMPRESS, which accept an optional comma as a delimiter for multiple arguments, you can even pass just the comma:

47   %put >>%sysfunc(compress(,))<< ;
>><<

In that case, I think %SYSFUNC is happy because it sees the comma so it has something it can pass to compress. Then compress is happy, because it gets the comma, and says "oh, I got passed two null arguments, and I'm good with that."

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

GGO · Posted 05-24-2020 07:23 PM

corrected, based on final, comprehensive insights. -GGO

Thanks to all for these insights. I still do not yet see a suitable explanation. My original test calls all have the right number of arguments. Some arguments are simply NULL STRINGS.

This makes it more explicit, but the results are the same:

%macro test_null_string_handling(str=);
  %if %length(%sysfunc(TRIMN(&str))) > 0 %then
      %put TRIMN() TEST - NON-ZERO STRING LENGTH. String is [&str];
  %else %put TRIMN() TEST - ZERO STRING LENGTH. String is [&str];

  %if %length(%sysfunc(STRIP(&str))) > 0 %then
      %put STRIP() TEST - NON-ZERO STRING LENGTH. String is [&str];
  %else %put STRIP() TEST - ZERO STRING LENGTH. String is [&str];

  %if %length(%sysfunc(COMPRESS(&str,,s))) > 0 %then
      %put COMPRESS() TEST - NON-ZERO STRING LENGTH. String is [&str];
  %else %put COMPRESS() TEST - ZERO STRING LENGTH. String is [&str];

  %if %sysfunc(PRXMATCH(m/\S/i,&str)) > 0 %then
      %put PRXMATCH() TEST - NON-ZERO STRING LENGTH. String is [&str];
  %else %put PRXMATCH() TEST - ZERO STRING LENGTH. String is [&str];
%mend test_null_string_handling;

%test_null_string_handling(str=%str())
%test_null_string_handling(str=%str( ))
%test_null_string_handling(str=one)
%test_null_string_handling
%*;

Same results

73   %test_null_string_handling(str=%str())
TRIMN() TEST - ZERO STRING LENGTH. String is []
STRIP() TEST - ZERO STRING LENGTH. String is []
COMPRESS() TEST - ZERO STRING LENGTH. String is []
PRXMATCH() TEST - ZERO STRING LENGTH. String is []
74   %test_null_string_handling(str=%str( ))
TRIMN() TEST - ZERO STRING LENGTH. String is [ ]
STRIP() TEST - ZERO STRING LENGTH. String is [ ]
COMPRESS() TEST - ZERO STRING LENGTH. String is [ ]
PRXMATCH() TEST - ZERO STRING LENGTH. String is [ ]
75   %test_null_string_handling(str=one)
TRIMN() TEST - NON-ZERO STRING LENGTH. String is [one]
STRIP() TEST - NON-ZERO STRING LENGTH. String is [one]
COMPRESS() TEST - NON-ZERO STRING LENGTH. String is [one]
PRXMATCH() TEST - NON-ZERO STRING LENGTH. String is [one]
76   %test_null_string_handling
77   %*;
ERROR: The function TRIMN referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments.
TRIMN() TEST - ZERO STRING LENGTH. String is []
ERROR: The function STRIP referenced by the %SYSFUNC or %QSYSFUNC macro function has too few arguments.
STRIP() TEST - ZERO STRING LENGTH. String is []
COMPRESS() TEST - ZERO STRING LENGTH. String is []
PRXMATCH() TEST - ZERO STRING LENGTH. String is []

These are DIFFERENT errors, not the same error:

%put %sysfunc(trimn()) ;     /* right number of arguments - NULL STRING ERROR */
%put %sysfunc(upcase()) ;    /* right number of arguments - NULL STRING ERROR */
%put %sysfunc(strip()) ;     /* right number of arguments - NULL STRING ERROR */
%put %sysfunc(compress()) ;  /* ONE argument is acceptable - NULL STRING ERROR */

By comparison to TRIMN() and STRIP() - and UPCASE() is apparently another that can't handle NULL STRINGS, COMPRESS() and some others handle the right number of NULL STRING arguments just fine:

%put %sysfunc(compress(,)) ; /* COMMA-DELIM ARGUMENTS - no null string error */

A fine result from COMPRESS() - a null string WITHOUT ANY ERROR.

92 %put [%sysfunc(compress(,))] ; /* COMMA-DELIM ARGUMENTS - no null string error */ []

My conclusion remains that DATA STEP functions are not consistent in how they respond to NULL STRINGS when used in %SYSFUNC().

Tom · Posted 05-24-2020 08:07 PM

Please re-read @Quentin ' s answer as I think you did not fully grasp what it was saying.

That being said I wouldn't bother too much about inconsistencies in SAS syntax. It has a lot of them. But the on one thing that it does do well is maintain backwards compatibility. That is why 30 year old program written for SAS version 6 or earlier can mainly run without errors or even modifications.

GGO · Posted 05-24-2020 09:04 PM

Favoring backwards compatibility over fixing bugs has turned into a huge liability for SAS. Some bold decisions could have assured its future strong position in analytics. Even Excel knows that 1/3 + 1/3 + 1/3 = 1. ~~SAS still does not. (I've read all of the detailed excuses about numeric precision and representation. None explains why all other tech can count to 1 by thirds.)~~ Fixed thanks to FreelanceReinhard, below.

145  data _null_;
146    sum = 0;
147    one = 1;
148    do i = 1 to 4;
149      sum = sum + 1/3;
150      if sum = one then
151         put 'got to ' one '(' one hex16. '), '
152              sum= '(hex16: ' sum hex16. ')';
153      else put 'still not ' one '(hex16:' one hex16. '), '
154               sum= '(hex16: ' sum hex16. ')'
155               ;
156    end;
157  run;

still not 1 (hex16:3FF0000000000000), sum=0.3333333333 (hex16: 3FD5555555555555)
still not 1 (hex16:3FF0000000000000), sum=0.6666666667 (hex16: 3FE5555555555555)
got to 1 (3FF0000000000000), sum=1 (hex16: 3FF0000000000000)      <== right !
still not 1 (hex16:3FF0000000000000), sum=1.3333333333 (hex16: 3FF5555555555555)

FIXED - thanks to FreelanceReinhard, below. Thanks!

FreelanceReinhard, below, is correct!

But my point still stands, below, regarding SAS' inability to count to 10 by tenths. Which Excel and the rest can accomplish.

If Apple had favored bwc over improvement and innovation, we'd still have floppy drives and wired headphone jacks everywhere.

If Google had favored bwc over improved concepts and innovation, we still just have Excel and Word (and Libre immitations).

Tech shouldn't be shackled by past mistakes and past conventions. Python/R make some bold moves between versions. How much has loss of bwc held them back? ( And they can both count to 1 by thirds 😉

FreelanceReinh · Posted 05-25-2020 06:51 AM

@GGO wrote:

(...) Even Excel knows that 1/3 + 1/3 + 1/3 = 1. SAS still does not. (...)

1    data _null_;
2      sum = 0;
3      do i = 1 to 3;
4        sum = sum + 1/3;
5        if sum = 3 then put 'got to 3 ' sum=;
6        else put 'not quite yet ' sum=;
7      end;
8    run;

not quite yet sum=0.3333333333
not quite yet sum=0.6666666667
not quite yet sum=1                /* eeeeerrp */

It does. You seem to assume that the sum should be 3?

Admittedly, the cumulative sum, if continued far enough, would miss 2 (and most, but not all integers >=5; assuming a Windows platform):

1    data _null_;
2    do i = 1 to 9;
3      sum + 1/3;
4      round_sum=round(sum,1e-9);
5      if mod(i,3)=0 then put sum= (sum round_sum)(=hex16./)/;
6    end;
7    run;

sum=1 sum=3FF0000000000000
round_sum=3FF0000000000000

sum=2 sum=3FFFFFFFFFFFFFFF
round_sum=4000000000000000

sum=3 sum=4008000000000000
round_sum=4008000000000000

@GGO wrote:

(...) Why in the world should SAS double-down for decades on different internal representation for real numbers (including integers) and separately for integers? (...)

This is not the case. SAS uses the same internal floating-point representation for integers as for any other numbers.

GGO · Posted 05-25-2020 11:04 AM

Thank you FreelanceReinhard for pointing out my error!

Nonetheless, my point still stands with counting to 1 by tenths:

99   data _null_;
100    sum = 0;
101    one = 1;
102    do i = 1 to 12;
103      sum = sum + 1/10;
104      if sum = one then
105         put 'got to ' one '(' one hex16. '), '
106              sum= '(hex16: ' sum hex16. ')';
107      else put 'still not ' one '(hex16:' one hex16. '), '
108               sum= '(hex16: ' sum hex16. ')'
109               ;
110    end;
111  run;

still not 1 (hex16:3FF0000000000000), sum=0.1 (hex16: 3FB999999999999A)
still not 1 (hex16:3FF0000000000000), sum=0.2 (hex16: 3FC999999999999A)
still not 1 (hex16:3FF0000000000000), sum=0.3 (hex16: 3FD3333333333334)
still not 1 (hex16:3FF0000000000000), sum=0.4 (hex16: 3FD999999999999A)
still not 1 (hex16:3FF0000000000000), sum=0.5 (hex16: 3FE0000000000000)
still not 1 (hex16:3FF0000000000000), sum=0.6 (hex16: 3FE3333333333333)
still not 1 (hex16:3FF0000000000000), sum=0.7 (hex16: 3FE6666666666666)
still not 1 (hex16:3FF0000000000000), sum=0.8 (hex16: 3FE9999999999999)
still not 1 (hex16:3FF0000000000000), sum=0.9 (hex16: 3FECCCCCCCCCCCCC)
still not 1 (hex16:3FF0000000000000), sum=1 (hex16: 3FEFFFFFFFFFFFFF)      <=== eeeerrp !!
still not 1 (hex16:3FF0000000000000), sum=1.1 (hex16: 3FF1999999999999)
still not 1 (hex16:3FF0000000000000), sum=1.2 (hex16: 3FF3333333333333)

(Hopefully I'm better at counting to 1 by tenths, compared with thirds 🙂

Quentin · Posted 05-25-2020 11:34 AM

You seem to think that Excel and Python are doing numeric precision "right" and SAS is doing it "wrong".

But they're all doing floating point arithmetic, with the associated limitations.

This Microsoft blog post explains why in Excel the equation =1*(.5-.4-.1) may not evaluate to 0:

https://www.microsoft.com/en-us/microsoft-365/blog/2008/04/10/understanding-floating-point-precision...

And the python docs explain why evaluating .1 + .1 + .1 == .3 returns false.

https://docs.python.org/3/tutorial/floatingpoint.html

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

GGO · Posted 05-25-2020 12:43 PM

Noted, Quentin. Thanks! I needed to have another look.

SAS and Python seem the same with addition (rounding error) and multiplication (none):

SAS

addition: posted below - rounding error
multiplication: floating-point arithmetic OK
%put %sysevalf(10*0.1 = 1) prints out 1

Python - same

GGO · Posted 05-25-2020 01:01 PM

Thanks, Reinhard! Fixed below.

NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

Re: NULL STRINGS and DATA STEP functions in %SYSFUNC() - Why is behavior not consistent?

SAS Innovate 2025: Register Today!

SAS Training: Just a Click Away