BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DanieleTiles
Calcite | Level 5
Hi to all,
in T-SQL there is a simple function (called isnumeric()) which checks if the value inside a column is numeric or not.
I couldn't find anything similar in SAS functions (to be used in datasteps/proc sql)...the first (and at the moment, working) solution I've found is to check with the value returned from an input:

proc sql;
select *
from work.test a
where input(a.charValue,best.) = .;
quit;

I wanted to know...is there a better way? Is the method I've adopted correct?
Thanks

Daniele Message was edited by: Daniele Tiles
1 ACCEPTED SOLUTION

Accepted Solutions
ScottBass
Rhodochrosite | Level 12

/** Moderator's Edit **/

This is a robust solution and includes links to macro code.  Other posters mention a variety of useful SAS functions and are worth a look.

 

OPs solution is also good -- the input function will try to convert the string to numeric and will return missing if it cannot.

 

Another simple solution is to add 0 to the character string -- when using the '+' operator, SAS will automatically attempt to convert character
strings to numeric where possible. If SAS cannot do this, then the result returns a missing number because '+' cannot operate on character strings. This will not, however, recognize numbers such as '1,000'


DanielSantos gives several useful functions:
anyalpha - returns index of first alphabetic character
anydigit - returns index of first numeric character
notalpha - returns index of first NON alphabetic character (includes punctuation, etc)
notdigit - returns index of first NON numeric character (includes punctuation, etc)

 

These functions don't give desired results when the string is something like '1.1.1', '1,000', '-500', etc.


PRXMATCH() (used as part of ScottBass's solution) is another way to accomplish this. Use the regular expression:
^ *[+-]?\d{1,3}(?:,\d{3})*\.?\d*$|^ *[+-]?\d*\.?\d*$|^ *[+-]?(?:\d+\.?\d*|\d*\.?\d+)[eE]-?\d+$

This should capture positive/negative numbers, numbers with commas, and numbers in scientific notation using e or E.

This site is a useful place to test regular expressions if you're curious: https://regex101.com/r/xV4eL5/3

 

/** end edit **/

 

I needed to do this same thing today, and found this post.

 

After mulling over this a bit, I've approached this problem a bit differently than the other replies. I'm replying to this old post in case someone searches for this in the future and prefers my approach.

 

My approach is to use the INPUT function, best32. format, and ?? INPUT function modifiers, then check if the result is missing.

 

We already know the input data is character (otherwise we wouldn't need this check!), so the input data is already in the data type (character) expected by the input function.

 

With this approach, there are some additional checks I can do.

 

Here is the code:

 

data test;
   length char $32;
   input char $char32.;
   num      = input(char,?? best32.);
   IsNum    = (not missing(num));
   IsNum2   = (not missing(num) or strip(char) in ('','.') or (.A le num le .Z) or (num eq ._));
   IsInt    = (not missing(num) and int(num) = num);
   IsFloat  = (abs(num - int(num)) gt 0);
   IsNotNeg = (num ge 0);
   IsPos    = (num gt 0);
   test=num-int(num);
   datalines;

.
._
.A
.Z
-1
0
1
1.1
 123456789012.123456789012
-123456789012.123456789012
 1234567890123.1234567890123
-1234567890123.1234567890123
 12345678901234.12345678901234
-12345678901234.12345678901234
 123456789012345.123456789012345
-123456789012345.123456789012345
 1234567890123456.1234567890123456
-1234567890123456.1234567890123456
 1.1
 -123
 -123.45
 --123.45
A
 B
123 456
~!@#$%^&*()_+=
;
run;

However, this will generate "Missing values were generated ..." messages in the SAS log. I like as clean a SAS log as possible. It's unfortunate that SAS doesn't support shortcircuit logic checking as done in many modern programming languages.

 

So, here is Version 2; less terse code but with the same results:

 

data test;
   length char $32;
   input char $char32.;
   num = input(char,?? best32.);
   if (not missing(num)) then do;
      IsNum    = 1;
      IsNum2   = 1;
      IsInt    = (int(num) = num);
      IsFloat  = (abs(num - int(num)) gt 0);
      IsNotNeg = (num ge 0);
      IsPos    = (num gt 0);
      t1=num-int(num);
   end;
   else do;
      IsNum    = 0;
      IsNum2   = (strip(char) in ('','.') or (.A le num le .Z) or (num = ._));
      IsInt    = 0;
      IsFloat  = 0;
      IsNotNeg = 0;
      IsPos    = 0;
   end;
   datalines;

.
._
.A
.Z
-1
0
1
1.1
 123456789012.123456789012
-123456789012.123456789012
 1234567890123.1234567890123
-1234567890123.1234567890123
 12345678901234.12345678901234
-12345678901234.12345678901234
 123456789012345.123456789012345
-123456789012345.123456789012345
 1234567890123456.1234567890123456
-1234567890123456.1234567890123456
 1.1
 -123
 -123.45
 --123.45
A
 B
123 456
~!@#$%^&*()_+=
;
run;

Comments:

 

IsNum gives expected results, except for missing character input. Technically, "valid" character input which generates "expected" missing numeric data is in fact numeric input - a missing value is valid data in a SAS numeric column.

 

IsNum2 compensates for this by checking the value of the character input for "valid" missing or special missing input, as opposed to invalid numeric input which converts to a missing value.

 

The check for IsInt and IsFloat collapses for large numeric data, as the precision is lost and the difference between int(num) - num becomes 0. If there is a better way to approach this precision issue please let me know (I investigated INTZ, CEIL/Z, and FLOOR/Z functions).

 

I also had to do this same thing with macro data, and wanted a "pure macro" solution, or function style macro (instead of running a data step). This is complicated by the fact that %sysfunc does not support the INPUT function, but only the INPUTN (and INPUTC) function instead, and the INPUTN function does not support the ?? modifier.

 

(If a little birdie at SAS can explain the technical reasons why %sysfunc doesn't support the INPUT function, or why INPUTN doesn't support the ? and ?? modifiers, please email me privately).

 

Here is my macro approach:

 

%macro IsNumM(char,type=NUM,missing=N);
   %let type=%upcase(&type);
   %let missing=%upcase(&missing);

   %* remove leading spaces ;
   %let char=&char;
   %* put CHAR=*&char*;  %* for debugging ;

   %* check if empty string or missing data ;
   %* if YES: ;
   %* if MISSING=N treat as not valid numeric data ;
   %* if MISSING=Y treat as valid numeric data ;
   %let rx = %sysfunc(prxparse(/^( *|\.[A-Z_]*)$/o));
   %let match = %sysfunc(prxmatch(&rx,%superq(char)));
   %syscall prxfree(rx);
   %if (&match eq 1) %then %do;
      %if (&missing) %then %let rtn=1; %else %let rtn=0;
&rtn
      %return;
   %end;

   %* use a Perl regex to test for numeric input ;
   %* the regex is "/^(-{0,1})(\d*)(\.{0,1})(\d*)$/o", which means: ;
   %* beginning of string (^): ;
   %* 0 or 1 minus signs: ;
   %* 0 or more digits: ;
   %* 0 or 1 periods: ;
   %* 0 or more digits: ;
   %* end of string ($) ;
   %* compile the regex once (o) ;
   %let rx = %sysfunc(prxparse(/^(-{0,1})(\d*)(\.{0,1})(\d*)$/o));
   %let match = %sysfunc(prxmatch(&rx,%superq(char)));
   %syscall prxfree(rx);

   %* if no match then not num ;
   %if (&match eq 0) %then %do;
0
      %return;
   %end;

   %* the Perl regex should be sufficient to cleanse the input to the inputn function ;   
   %* convert the value to num. if it is missing then not num ;
   %* note: this only supports options missing='.' and options missing=' ' ;
   %let num=%sysfunc(inputn(%superq(char),best32.),best32.);
   %let num=#
   %* put NUM =#&num#; %* for debugging ;
   %if (%superq(num) eq .) or (%superq(num) eq ) %then %do;
0
      %return;
   %end;

   %* it is probably a num :-) ;
   %if (&type eq NUM) %then %do;
1 
   %end;
   %else
   %if (&type eq INT) %then %do;
      %let int=%sysfunc(int(&num),32.);
      %* put INT =#&int#; %* for debugging ;
      %let rtn=%eval(&int eq &num);
&rtn
   %end;
   %else
   %if (&type eq NONNEG) %then %do;
      %let rtn=%sysevalf(&num ge 0);
&rtn
   %end;
   %else
   %if (&type eq POS) %then %do;
      %let rtn=%sysevalf(&num gt 0);
&rtn
   %end;
%mend;

Finally, I've macro-ized both approaches and have uploaded them to my GitHub repository:

 

https://github.com/scottbass/SAS/blob/master/Macro/IsNum.sas
https://github.com/scottbass/SAS/blob/master/Macro/IsNumM.sas

 

Hope this helps...


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

View solution in original post

6 REPLIES 6
DanielSantos
Barite | Level 11
Hello Daniele

There is a "neater" way of doing that.
There is actually a couple of functions for checking the contents of a char variable for alpha or num chars.

Check:
ANYALPHA: http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a002194060.htm
ANYDIGIT: http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a002194193.htm
NOTALPHA: http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a002194058.htm
NOTDIGIT: http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a002194195.htm

Greetings from Portugal.

Daniel Santos at www.cgd.pt
DanieleTiles
Calcite | Level 5
Hi Daniel,
thanks for the suggestion! However, I've tested the method that seemed to fit best my issues (NOTDIGIT)...but I've found two remarks (one solved, the other one...):
-- NOTDIGIT has to consider also blanks, so it has to be called always as NOTDIGIT(TRIM(LEFT(<<STRING>>)))...but that's ok
-- The NOTDIGIT might be useful for recognizing integers, but not float/double (for example, if I have to check that a SALES field has got not numeric values....the dot isn't (rightly) recognized as digit).
DanielSantos
Barite | Level 11
Hello Danielle.

OK, if your dealing with blanks and floats try the ANYALPHA function instead the NOTDIGIT.

data _null_;
if anyalpha(' 2 ') then put 'ALPHA'; else put 'DIGIT';
if anyalpha('2.2') then put 'ALPHA'; else put 'DIGIT';
if anyalpha(' 2.2 ') then put 'ALPHA'; else put 'DIGIT';

/* caution, this will fail for digits with more than one . */
if anyalpha('2.2.2') then put 'ALPHA'; else put 'DIGIT';
run;

Greetings from Portugal.

Daniel Santos at www.cgd.pt
MC1985
Obsidian | Level 7

if you want to check that all char of your string are numeric in a simple and fast way: you can use Verify Function(variable, <list of character to find>), that returns the position of the first character in a string that is not in any of several other strings. If you list all number(see example) function give to you 0 that means the values in string are all numeric..

 

data _null_;
length string_to_verify $12.;

    string_to_verify='1234567';
    check               = notdigit(string_to_verify);
    check2             = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='12345678';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='1234567 9';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='123';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='A123';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='A1B23';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='1B23';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='AAA3';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='123C';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;
run;

* LOG: ------------------------------------------------------------------;

string_to_verify=1234567 check=8 check2=0
string_to_verify=12345678 check=9 check2=0
string_to_verify=1234567 9 check=8 check2=8 /*blank char is not numeric!*/
string_to_verify=123 check=4 check2=0
string_to_verify=A123 check=1 check2=1
string_to_verify=A1B23 check=1 check2=1
string_to_verify=1B23 check=2 check2=2
string_to_verify=AAA3 check=1 check2=1
string_to_verify=123C check=4 check2=4

here there's a beautyfull pdf and there's another on 'Charachter function' with a lot of example (ex. anydigit, etc..).

 

bye

Martino Crippa
ScottBass
Rhodochrosite | Level 12

Note this approved solution does not cater for valid negative numeric data.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
ScottBass
Rhodochrosite | Level 12

/** Moderator's Edit **/

This is a robust solution and includes links to macro code.  Other posters mention a variety of useful SAS functions and are worth a look.

 

OPs solution is also good -- the input function will try to convert the string to numeric and will return missing if it cannot.

 

Another simple solution is to add 0 to the character string -- when using the '+' operator, SAS will automatically attempt to convert character
strings to numeric where possible. If SAS cannot do this, then the result returns a missing number because '+' cannot operate on character strings. This will not, however, recognize numbers such as '1,000'


DanielSantos gives several useful functions:
anyalpha - returns index of first alphabetic character
anydigit - returns index of first numeric character
notalpha - returns index of first NON alphabetic character (includes punctuation, etc)
notdigit - returns index of first NON numeric character (includes punctuation, etc)

 

These functions don't give desired results when the string is something like '1.1.1', '1,000', '-500', etc.


PRXMATCH() (used as part of ScottBass's solution) is another way to accomplish this. Use the regular expression:
^ *[+-]?\d{1,3}(?:,\d{3})*\.?\d*$|^ *[+-]?\d*\.?\d*$|^ *[+-]?(?:\d+\.?\d*|\d*\.?\d+)[eE]-?\d+$

This should capture positive/negative numbers, numbers with commas, and numbers in scientific notation using e or E.

This site is a useful place to test regular expressions if you're curious: https://regex101.com/r/xV4eL5/3

 

/** end edit **/

 

I needed to do this same thing today, and found this post.

 

After mulling over this a bit, I've approached this problem a bit differently than the other replies. I'm replying to this old post in case someone searches for this in the future and prefers my approach.

 

My approach is to use the INPUT function, best32. format, and ?? INPUT function modifiers, then check if the result is missing.

 

We already know the input data is character (otherwise we wouldn't need this check!), so the input data is already in the data type (character) expected by the input function.

 

With this approach, there are some additional checks I can do.

 

Here is the code:

 

data test;
   length char $32;
   input char $char32.;
   num      = input(char,?? best32.);
   IsNum    = (not missing(num));
   IsNum2   = (not missing(num) or strip(char) in ('','.') or (.A le num le .Z) or (num eq ._));
   IsInt    = (not missing(num) and int(num) = num);
   IsFloat  = (abs(num - int(num)) gt 0);
   IsNotNeg = (num ge 0);
   IsPos    = (num gt 0);
   test=num-int(num);
   datalines;

.
._
.A
.Z
-1
0
1
1.1
 123456789012.123456789012
-123456789012.123456789012
 1234567890123.1234567890123
-1234567890123.1234567890123
 12345678901234.12345678901234
-12345678901234.12345678901234
 123456789012345.123456789012345
-123456789012345.123456789012345
 1234567890123456.1234567890123456
-1234567890123456.1234567890123456
 1.1
 -123
 -123.45
 --123.45
A
 B
123 456
~!@#$%^&*()_+=
;
run;

However, this will generate "Missing values were generated ..." messages in the SAS log. I like as clean a SAS log as possible. It's unfortunate that SAS doesn't support shortcircuit logic checking as done in many modern programming languages.

 

So, here is Version 2; less terse code but with the same results:

 

data test;
   length char $32;
   input char $char32.;
   num = input(char,?? best32.);
   if (not missing(num)) then do;
      IsNum    = 1;
      IsNum2   = 1;
      IsInt    = (int(num) = num);
      IsFloat  = (abs(num - int(num)) gt 0);
      IsNotNeg = (num ge 0);
      IsPos    = (num gt 0);
      t1=num-int(num);
   end;
   else do;
      IsNum    = 0;
      IsNum2   = (strip(char) in ('','.') or (.A le num le .Z) or (num = ._));
      IsInt    = 0;
      IsFloat  = 0;
      IsNotNeg = 0;
      IsPos    = 0;
   end;
   datalines;

.
._
.A
.Z
-1
0
1
1.1
 123456789012.123456789012
-123456789012.123456789012
 1234567890123.1234567890123
-1234567890123.1234567890123
 12345678901234.12345678901234
-12345678901234.12345678901234
 123456789012345.123456789012345
-123456789012345.123456789012345
 1234567890123456.1234567890123456
-1234567890123456.1234567890123456
 1.1
 -123
 -123.45
 --123.45
A
 B
123 456
~!@#$%^&*()_+=
;
run;

Comments:

 

IsNum gives expected results, except for missing character input. Technically, "valid" character input which generates "expected" missing numeric data is in fact numeric input - a missing value is valid data in a SAS numeric column.

 

IsNum2 compensates for this by checking the value of the character input for "valid" missing or special missing input, as opposed to invalid numeric input which converts to a missing value.

 

The check for IsInt and IsFloat collapses for large numeric data, as the precision is lost and the difference between int(num) - num becomes 0. If there is a better way to approach this precision issue please let me know (I investigated INTZ, CEIL/Z, and FLOOR/Z functions).

 

I also had to do this same thing with macro data, and wanted a "pure macro" solution, or function style macro (instead of running a data step). This is complicated by the fact that %sysfunc does not support the INPUT function, but only the INPUTN (and INPUTC) function instead, and the INPUTN function does not support the ?? modifier.

 

(If a little birdie at SAS can explain the technical reasons why %sysfunc doesn't support the INPUT function, or why INPUTN doesn't support the ? and ?? modifiers, please email me privately).

 

Here is my macro approach:

 

%macro IsNumM(char,type=NUM,missing=N);
   %let type=%upcase(&type);
   %let missing=%upcase(&missing);

   %* remove leading spaces ;
   %let char=&char;
   %* put CHAR=*&char*;  %* for debugging ;

   %* check if empty string or missing data ;
   %* if YES: ;
   %* if MISSING=N treat as not valid numeric data ;
   %* if MISSING=Y treat as valid numeric data ;
   %let rx = %sysfunc(prxparse(/^( *|\.[A-Z_]*)$/o));
   %let match = %sysfunc(prxmatch(&rx,%superq(char)));
   %syscall prxfree(rx);
   %if (&match eq 1) %then %do;
      %if (&missing) %then %let rtn=1; %else %let rtn=0;
&rtn
      %return;
   %end;

   %* use a Perl regex to test for numeric input ;
   %* the regex is "/^(-{0,1})(\d*)(\.{0,1})(\d*)$/o", which means: ;
   %* beginning of string (^): ;
   %* 0 or 1 minus signs: ;
   %* 0 or more digits: ;
   %* 0 or 1 periods: ;
   %* 0 or more digits: ;
   %* end of string ($) ;
   %* compile the regex once (o) ;
   %let rx = %sysfunc(prxparse(/^(-{0,1})(\d*)(\.{0,1})(\d*)$/o));
   %let match = %sysfunc(prxmatch(&rx,%superq(char)));
   %syscall prxfree(rx);

   %* if no match then not num ;
   %if (&match eq 0) %then %do;
0
      %return;
   %end;

   %* the Perl regex should be sufficient to cleanse the input to the inputn function ;   
   %* convert the value to num. if it is missing then not num ;
   %* note: this only supports options missing='.' and options missing=' ' ;
   %let num=%sysfunc(inputn(%superq(char),best32.),best32.);
   %let num=&num;
   %* put NUM =#&num#; %* for debugging ;
   %if (%superq(num) eq .) or (%superq(num) eq ) %then %do;
0
      %return;
   %end;

   %* it is probably a num :-) ;
   %if (&type eq NUM) %then %do;
1 
   %end;
   %else
   %if (&type eq INT) %then %do;
      %let int=%sysfunc(int(&num),32.);
      %* put INT =#&int#; %* for debugging ;
      %let rtn=%eval(&int eq &num);
&rtn
   %end;
   %else
   %if (&type eq NONNEG) %then %do;
      %let rtn=%sysevalf(&num ge 0);
&rtn
   %end;
   %else
   %if (&type eq POS) %then %do;
      %let rtn=%sysevalf(&num gt 0);
&rtn
   %end;
%mend;

Finally, I've macro-ized both approaches and have uploaded them to my GitHub repository:

 

https://github.com/scottbass/SAS/blob/master/Macro/IsNum.sas
https://github.com/scottbass/SAS/blob/master/Macro/IsNumM.sas

 

Hope this helps...


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 111326 views
  • 11 likes
  • 4 in conversation