/** Moderator's Edit **/
This is a robust solution and includes links to macro code. Other posters mention a variety of useful SAS functions and are worth a look.
OPs solution is also good -- the input function will try to convert the string to numeric and will return missing if it cannot.
Another simple solution is to add 0 to the character string -- when using the '+' operator, SAS will automatically attempt to convert character
strings to numeric where possible. If SAS cannot do this, then the result returns a missing number because '+' cannot operate on character strings. This will not, however, recognize numbers such as '1,000'
DanielSantos gives several useful functions:
anyalpha - returns index of first alphabetic character
anydigit - returns index of first numeric character
notalpha - returns index of first NON alphabetic character (includes punctuation, etc)
notdigit - returns index of first NON numeric character (includes punctuation, etc)
These functions don't give desired results when the string is something like '1.1.1', '1,000', '-500', etc.
PRXMATCH() (used as part of ScottBass's solution) is another way to accomplish this. Use the regular expression:
^ *[+-]?\d{1,3}(?:,\d{3})*\.?\d*$|^ *[+-]?\d*\.?\d*$|^ *[+-]?(?:\d+\.?\d*|\d*\.?\d+)[eE]-?\d+$
This should capture positive/negative numbers, numbers with commas, and numbers in scientific notation using e or E.
This site is a useful place to test regular expressions if you're curious: https://regex101.com/r/xV4eL5/3
/** end edit **/
I needed to do this same thing today, and found this post.
After mulling over this a bit, I've approached this problem a bit differently than the other replies. I'm replying to this old post in case someone searches for this in the future and prefers my approach.
My approach is to use the INPUT function, best32. format, and ?? INPUT function modifiers, then check if the result is missing.
We already know the input data is character (otherwise we wouldn't need this check!), so the input data is already in the data type (character) expected by the input function.
With this approach, there are some additional checks I can do.
Here is the code:
data test;
length char $32;
input char $char32.;
num = input(char,?? best32.);
IsNum = (not missing(num));
IsNum2 = (not missing(num) or strip(char) in ('','.') or (.A le num le .Z) or (num eq ._));
IsInt = (not missing(num) and int(num) = num);
IsFloat = (abs(num - int(num)) gt 0);
IsNotNeg = (num ge 0);
IsPos = (num gt 0);
test=num-int(num);
datalines;
.
._
.A
.Z
-1
0
1
1.1
123456789012.123456789012
-123456789012.123456789012
1234567890123.1234567890123
-1234567890123.1234567890123
12345678901234.12345678901234
-12345678901234.12345678901234
123456789012345.123456789012345
-123456789012345.123456789012345
1234567890123456.1234567890123456
-1234567890123456.1234567890123456
1.1
-123
-123.45
--123.45
A
B
123 456
~!@#$%^&*()_+=
;
run;
However, this will generate "Missing values were generated ..." messages in the SAS log. I like as clean a SAS log as possible. It's unfortunate that SAS doesn't support shortcircuit logic checking as done in many modern programming languages.
So, here is Version 2; less terse code but with the same results:
data test;
length char $32;
input char $char32.;
num = input(char,?? best32.);
if (not missing(num)) then do;
IsNum = 1;
IsNum2 = 1;
IsInt = (int(num) = num);
IsFloat = (abs(num - int(num)) gt 0);
IsNotNeg = (num ge 0);
IsPos = (num gt 0);
t1=num-int(num);
end;
else do;
IsNum = 0;
IsNum2 = (strip(char) in ('','.') or (.A le num le .Z) or (num = ._));
IsInt = 0;
IsFloat = 0;
IsNotNeg = 0;
IsPos = 0;
end;
datalines;
.
._
.A
.Z
-1
0
1
1.1
123456789012.123456789012
-123456789012.123456789012
1234567890123.1234567890123
-1234567890123.1234567890123
12345678901234.12345678901234
-12345678901234.12345678901234
123456789012345.123456789012345
-123456789012345.123456789012345
1234567890123456.1234567890123456
-1234567890123456.1234567890123456
1.1
-123
-123.45
--123.45
A
B
123 456
~!@#$%^&*()_+=
;
run;
Comments:
IsNum gives expected results, except for missing character input. Technically, "valid" character input which generates "expected" missing numeric data is in fact numeric input - a missing value is valid data in a SAS numeric column.
IsNum2 compensates for this by checking the value of the character input for "valid" missing or special missing input, as opposed to invalid numeric input which converts to a missing value.
The check for IsInt and IsFloat collapses for large numeric data, as the precision is lost and the difference between int(num) - num becomes 0. If there is a better way to approach this precision issue please let me know (I investigated INTZ, CEIL/Z, and FLOOR/Z functions).
I also had to do this same thing with macro data, and wanted a "pure macro" solution, or function style macro (instead of running a data step). This is complicated by the fact that %sysfunc does not support the INPUT function, but only the INPUTN (and INPUTC) function instead, and the INPUTN function does not support the ?? modifier.
(If a little birdie at SAS can explain the technical reasons why %sysfunc doesn't support the INPUT function, or why INPUTN doesn't support the ? and ?? modifiers, please email me privately).
Here is my macro approach:
%macro IsNumM(char,type=NUM,missing=N);
%let type=%upcase(&type);
%let missing=%upcase(&missing);
%* remove leading spaces ;
%let char=&char;
%* put CHAR=*&char*; %* for debugging ;
%* check if empty string or missing data ;
%* if YES: ;
%* if MISSING=N treat as not valid numeric data ;
%* if MISSING=Y treat as valid numeric data ;
%let rx = %sysfunc(prxparse(/^( *|\.[A-Z_]*)$/o));
%let match = %sysfunc(prxmatch(&rx,%superq(char)));
%syscall prxfree(rx);
%if (&match eq 1) %then %do;
%if (&missing) %then %let rtn=1; %else %let rtn=0;
&rtn
%return;
%end;
%* use a Perl regex to test for numeric input ;
%* the regex is "/^(-{0,1})(\d*)(\.{0,1})(\d*)$/o", which means: ;
%* beginning of string (^): ;
%* 0 or 1 minus signs: ;
%* 0 or more digits: ;
%* 0 or 1 periods: ;
%* 0 or more digits: ;
%* end of string ($) ;
%* compile the regex once (o) ;
%let rx = %sysfunc(prxparse(/^(-{0,1})(\d*)(\.{0,1})(\d*)$/o));
%let match = %sysfunc(prxmatch(&rx,%superq(char)));
%syscall prxfree(rx);
%* if no match then not num ;
%if (&match eq 0) %then %do;
0
%return;
%end;
%* the Perl regex should be sufficient to cleanse the input to the inputn function ;
%* convert the value to num. if it is missing then not num ;
%* note: this only supports options missing='.' and options missing=' ' ;
%let num=%sysfunc(inputn(%superq(char),best32.),best32.);
%let num=#
%* put NUM =#&num#; %* for debugging ;
%if (%superq(num) eq .) or (%superq(num) eq ) %then %do;
0
%return;
%end;
%* it is probably a num :-) ;
%if (&type eq NUM) %then %do;
1
%end;
%else
%if (&type eq INT) %then %do;
%let int=%sysfunc(int(&num),32.);
%* put INT =#&int#; %* for debugging ;
%let rtn=%eval(&int eq &num);
&rtn
%end;
%else
%if (&type eq NONNEG) %then %do;
%let rtn=%sysevalf(&num ge 0);
&rtn
%end;
%else
%if (&type eq POS) %then %do;
%let rtn=%sysevalf(&num gt 0);
&rtn
%end;
%mend;
Finally, I've macro-ized both approaches and have uploaded them to my GitHub repository:
https://github.com/scottbass/SAS/blob/master/Macro/IsNum.sas
https://github.com/scottbass/SAS/blob/master/Macro/IsNumM.sas
Hope this helps...
if you want to check that all char of your string are numeric in a simple and fast way: you can use Verify Function(variable, <list of character to find>), that returns the position of the first character in a string that is not in any of several other strings. If you list all number(see example) function give to you 0 that means the values in string are all numeric..
data _null_;
length string_to_verify $12.;
string_to_verify='1234567';
check = notdigit(string_to_verify);
check2 = verify(trim(left(string_to_verify)),'0123456789');
put string_to_verify= check= check2=;
string_to_verify ='12345678';
check = notdigit(string_to_verify);
check2 = verify(trim(left(string_to_verify)),'0123456789');
put string_to_verify= check= check2=;
string_to_verify ='1234567 9';
check = notdigit(string_to_verify);
check2 = verify(trim(left(string_to_verify)),'0123456789');
put string_to_verify= check= check2=;
string_to_verify ='123';
check = notdigit(string_to_verify);
check2 = verify(trim(left(string_to_verify)),'0123456789');
put string_to_verify= check= check2=;
string_to_verify ='A123';
check = notdigit(string_to_verify);
check2 = verify(trim(left(string_to_verify)),'0123456789');
put string_to_verify= check= check2=;
string_to_verify ='A1B23';
check = notdigit(string_to_verify);
check2 = verify(trim(left(string_to_verify)),'0123456789');
put string_to_verify= check= check2=;
string_to_verify ='1B23';
check = notdigit(string_to_verify);
check2 = verify(trim(left(string_to_verify)),'0123456789');
put string_to_verify= check= check2=;
string_to_verify ='AAA3';
check = notdigit(string_to_verify);
check2 = verify(trim(left(string_to_verify)),'0123456789');
put string_to_verify= check= check2=;
string_to_verify ='123C';
check = notdigit(string_to_verify);
check2 = verify(trim(left(string_to_verify)),'0123456789');
put string_to_verify= check= check2=;
run;
* LOG: ------------------------------------------------------------------;
string_to_verify=1234567 check=8 check2=0
string_to_verify=12345678 check=9 check2=0
string_to_verify=1234567 9 check=8 check2=8 /*blank char is not numeric!*/
string_to_verify=123 check=4 check2=0
string_to_verify=A123 check=1 check2=1
string_to_verify=A1B23 check=1 check2=1
string_to_verify=1B23 check=2 check2=2
string_to_verify=AAA3 check=1 check2=1
string_to_verify=123C check=4 check2=4
here there's a beautyfull pdf and there's another on 'Charachter function' with a lot of example (ex. anydigit, etc..).
bye
Note this approved solution does not cater for valid negative numeric data.
/** Moderator's Edit **/
This is a robust solution and includes links to macro code. Other posters mention a variety of useful SAS functions and are worth a look.
OPs solution is also good -- the input function will try to convert the string to numeric and will return missing if it cannot.
Another simple solution is to add 0 to the character string -- when using the '+' operator, SAS will automatically attempt to convert character
strings to numeric where possible. If SAS cannot do this, then the result returns a missing number because '+' cannot operate on character strings. This will not, however, recognize numbers such as '1,000'
DanielSantos gives several useful functions:
anyalpha - returns index of first alphabetic character
anydigit - returns index of first numeric character
notalpha - returns index of first NON alphabetic character (includes punctuation, etc)
notdigit - returns index of first NON numeric character (includes punctuation, etc)
These functions don't give desired results when the string is something like '1.1.1', '1,000', '-500', etc.
PRXMATCH() (used as part of ScottBass's solution) is another way to accomplish this. Use the regular expression:
^ *[+-]?\d{1,3}(?:,\d{3})*\.?\d*$|^ *[+-]?\d*\.?\d*$|^ *[+-]?(?:\d+\.?\d*|\d*\.?\d+)[eE]-?\d+$
This should capture positive/negative numbers, numbers with commas, and numbers in scientific notation using e or E.
This site is a useful place to test regular expressions if you're curious: https://regex101.com/r/xV4eL5/3
/** end edit **/
I needed to do this same thing today, and found this post.
After mulling over this a bit, I've approached this problem a bit differently than the other replies. I'm replying to this old post in case someone searches for this in the future and prefers my approach.
My approach is to use the INPUT function, best32. format, and ?? INPUT function modifiers, then check if the result is missing.
We already know the input data is character (otherwise we wouldn't need this check!), so the input data is already in the data type (character) expected by the input function.
With this approach, there are some additional checks I can do.
Here is the code:
data test;
length char $32;
input char $char32.;
num = input(char,?? best32.);
IsNum = (not missing(num));
IsNum2 = (not missing(num) or strip(char) in ('','.') or (.A le num le .Z) or (num eq ._));
IsInt = (not missing(num) and int(num) = num);
IsFloat = (abs(num - int(num)) gt 0);
IsNotNeg = (num ge 0);
IsPos = (num gt 0);
test=num-int(num);
datalines;
.
._
.A
.Z
-1
0
1
1.1
123456789012.123456789012
-123456789012.123456789012
1234567890123.1234567890123
-1234567890123.1234567890123
12345678901234.12345678901234
-12345678901234.12345678901234
123456789012345.123456789012345
-123456789012345.123456789012345
1234567890123456.1234567890123456
-1234567890123456.1234567890123456
1.1
-123
-123.45
--123.45
A
B
123 456
~!@#$%^&*()_+=
;
run;
However, this will generate "Missing values were generated ..." messages in the SAS log. I like as clean a SAS log as possible. It's unfortunate that SAS doesn't support shortcircuit logic checking as done in many modern programming languages.
So, here is Version 2; less terse code but with the same results:
data test;
length char $32;
input char $char32.;
num = input(char,?? best32.);
if (not missing(num)) then do;
IsNum = 1;
IsNum2 = 1;
IsInt = (int(num) = num);
IsFloat = (abs(num - int(num)) gt 0);
IsNotNeg = (num ge 0);
IsPos = (num gt 0);
t1=num-int(num);
end;
else do;
IsNum = 0;
IsNum2 = (strip(char) in ('','.') or (.A le num le .Z) or (num = ._));
IsInt = 0;
IsFloat = 0;
IsNotNeg = 0;
IsPos = 0;
end;
datalines;
.
._
.A
.Z
-1
0
1
1.1
123456789012.123456789012
-123456789012.123456789012
1234567890123.1234567890123
-1234567890123.1234567890123
12345678901234.12345678901234
-12345678901234.12345678901234
123456789012345.123456789012345
-123456789012345.123456789012345
1234567890123456.1234567890123456
-1234567890123456.1234567890123456
1.1
-123
-123.45
--123.45
A
B
123 456
~!@#$%^&*()_+=
;
run;
Comments:
IsNum gives expected results, except for missing character input. Technically, "valid" character input which generates "expected" missing numeric data is in fact numeric input - a missing value is valid data in a SAS numeric column.
IsNum2 compensates for this by checking the value of the character input for "valid" missing or special missing input, as opposed to invalid numeric input which converts to a missing value.
The check for IsInt and IsFloat collapses for large numeric data, as the precision is lost and the difference between int(num) - num becomes 0. If there is a better way to approach this precision issue please let me know (I investigated INTZ, CEIL/Z, and FLOOR/Z functions).
I also had to do this same thing with macro data, and wanted a "pure macro" solution, or function style macro (instead of running a data step). This is complicated by the fact that %sysfunc does not support the INPUT function, but only the INPUTN (and INPUTC) function instead, and the INPUTN function does not support the ?? modifier.
(If a little birdie at SAS can explain the technical reasons why %sysfunc doesn't support the INPUT function, or why INPUTN doesn't support the ? and ?? modifiers, please email me privately).
Here is my macro approach:
%macro IsNumM(char,type=NUM,missing=N);
%let type=%upcase(&type);
%let missing=%upcase(&missing);
%* remove leading spaces ;
%let char=&char;
%* put CHAR=*&char*; %* for debugging ;
%* check if empty string or missing data ;
%* if YES: ;
%* if MISSING=N treat as not valid numeric data ;
%* if MISSING=Y treat as valid numeric data ;
%let rx = %sysfunc(prxparse(/^( *|\.[A-Z_]*)$/o));
%let match = %sysfunc(prxmatch(&rx,%superq(char)));
%syscall prxfree(rx);
%if (&match eq 1) %then %do;
%if (&missing) %then %let rtn=1; %else %let rtn=0;
&rtn
%return;
%end;
%* use a Perl regex to test for numeric input ;
%* the regex is "/^(-{0,1})(\d*)(\.{0,1})(\d*)$/o", which means: ;
%* beginning of string (^): ;
%* 0 or 1 minus signs: ;
%* 0 or more digits: ;
%* 0 or 1 periods: ;
%* 0 or more digits: ;
%* end of string ($) ;
%* compile the regex once (o) ;
%let rx = %sysfunc(prxparse(/^(-{0,1})(\d*)(\.{0,1})(\d*)$/o));
%let match = %sysfunc(prxmatch(&rx,%superq(char)));
%syscall prxfree(rx);
%* if no match then not num ;
%if (&match eq 0) %then %do;
0
%return;
%end;
%* the Perl regex should be sufficient to cleanse the input to the inputn function ;
%* convert the value to num. if it is missing then not num ;
%* note: this only supports options missing='.' and options missing=' ' ;
%let num=%sysfunc(inputn(%superq(char),best32.),best32.);
%let num=#
%* put NUM =#&num#; %* for debugging ;
%if (%superq(num) eq .) or (%superq(num) eq ) %then %do;
0
%return;
%end;
%* it is probably a num :-) ;
%if (&type eq NUM) %then %do;
1
%end;
%else
%if (&type eq INT) %then %do;
%let int=%sysfunc(int(&num),32.);
%* put INT =#&int#; %* for debugging ;
%let rtn=%eval(&int eq &num);
&rtn
%end;
%else
%if (&type eq NONNEG) %then %do;
%let rtn=%sysevalf(&num ge 0);
&rtn
%end;
%else
%if (&type eq POS) %then %do;
%let rtn=%sysevalf(&num gt 0);
&rtn
%end;
%mend;
Finally, I've macro-ized both approaches and have uploaded them to my GitHub repository:
https://github.com/scottbass/SAS/blob/master/Macro/IsNum.sas
https://github.com/scottbass/SAS/blob/master/Macro/IsNumM.sas
Hope this helps...
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.