Help using Base SAS procedures

Check numeric values in alphanumeric variables

Accepted Solution Solved
Reply
Contributor
Posts: 58
Accepted Solution

Check numeric values in alphanumeric variables

Hi to all,
in T-SQL there is a simple function (called isnumeric()) which checks if the value inside a column is numeric or not.
I couldn't find anything similar in SAS functions (to be used in datasteps/proc sql)...the first (and at the moment, working) solution I've found is to check with the value returned from an input:

proc sql;
select *
from work.test a
where input(a.charValue,best.) = .;
quit;

I wanted to know...is there a better way? Is the method I've adopted correct?
Thanks

Daniele Message was edited by: Daniele Tiles

Accepted Solutions
Solution
‎09-29-2016 01:42 PM
SAS Employee
Posts: 1

Re: Check numeric values in alphanumeric variables

An easy way to catch all numbers including floating point values is to use a combination of the LENGTHN and COMPRESS functions.   The variable FLAG will be set to 1 if the value is numeric, otherwise FLAG will be set to 0. 

 

Below is sample code that illustrates the sytnax:

 

data one;
    input Val $;
    len=lengthn(val);
    newval=compress(val,".",'d');
    len2=lengthn(newval);
    if len2 ne 0  then flag=0;
       else flag=1;
 datalines;
 abc.400
 300.bef
 3456321
 abcdefg
 123.234
 ;

 proc print;
 run;

View solution in original post


All Replies
Super Contributor
Posts: 474

Re: Check numeric values in alphanumeric variables

Hello Daniele

There is a "neater" way of doing that.
There is actually a couple of functions for checking the contents of a char variable for alpha or num chars.

Check:
ANYALPHA: http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a002194060.htm
ANYDIGIT: http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a002194193.htm
NOTALPHA: http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a002194058.htm
NOTDIGIT: http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a002194195.htm

Greetings from Portugal.

Daniel Santos at www.cgd.pt
Contributor
Posts: 58

Re: Check numeric values in alphanumeric variables

Hi Daniel,
thanks for the suggestion! However, I've tested the method that seemed to fit best my issues (NOTDIGIT)...but I've found two remarks (one solved, the other one...):
-- NOTDIGIT has to consider also blanks, so it has to be called always as NOTDIGIT(TRIM(LEFT(<<STRING>>)))...but that's ok
-- The NOTDIGIT might be useful for recognizing integers, but not float/double (for example, if I have to check that a SALES field has got not numeric values....the dot isn't (rightly) recognized as digit).
Super Contributor
Posts: 474

Re: Check numeric values in alphanumeric variables

Hello Danielle.

OK, if your dealing with blanks and floats try the ANYALPHA function instead the NOTDIGIT.

data _null_;
if anyalpha(' 2 ') then put 'ALPHA'; else put 'DIGIT';
if anyalpha('2.2') then put 'ALPHA'; else put 'DIGIT';
if anyalpha(' 2.2 ') then put 'ALPHA'; else put 'DIGIT';

/* caution, this will fail for digits with more than one . */
if anyalpha('2.2.2') then put 'ALPHA'; else put 'DIGIT';
run;

Greetings from Portugal.

Daniel Santos at www.cgd.pt
Contributor
Posts: 34

Re: Check numeric values in alphanumeric variables

if you want to check that all char of your string are numeric in a simple and fast way: you can use Verify Function(variable, <list of character to find>), that returns the position of the first character in a string that is not in any of several other strings. If you list all number(see example) function give to you 0 that means the values in string are all numeric..

 

data _null_;
length string_to_verify $12.;

    string_to_verify='1234567';
    check               = notdigit(string_to_verify);
    check2             = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='12345678';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='1234567 9';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='123';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='A123';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='A1B23';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='1B23';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='AAA3';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;

    string_to_verify ='123C';
    check                = notdigit(string_to_verify);
    check2              = verify(trim(left(string_to_verify)),'0123456789');
        put string_to_verify= check= check2=;
run;

* LOG: ------------------------------------------------------------------;

string_to_verify=1234567 check=8 check2=0
string_to_verify=12345678 check=9 check2=0
string_to_verify=1234567 9 check=8 check2=8 /*blank char is not numeric!*/
string_to_verify=123 check=4 check2=0
string_to_verify=A123 check=1 check2=1
string_to_verify=A1B23 check=1 check2=1
string_to_verify=1B23 check=2 check2=2
string_to_verify=AAA3 check=1 check2=1
string_to_verify=123C check=4 check2=4

here there's a beautyfull pdf and there's another on 'Charachter function' with a lot of example (ex. anydigit, etc..).

 

bye

Martino Crippa
Solution
‎09-29-2016 01:42 PM
SAS Employee
Posts: 1

Re: Check numeric values in alphanumeric variables

An easy way to catch all numbers including floating point values is to use a combination of the LENGTHN and COMPRESS functions.   The variable FLAG will be set to 1 if the value is numeric, otherwise FLAG will be set to 0. 

 

Below is sample code that illustrates the sytnax:

 

data one;
    input Val $;
    len=lengthn(val);
    newval=compress(val,".",'d');
    len2=lengthn(newval);
    if len2 ne 0  then flag=0;
       else flag=1;
 datalines;
 abc.400
 300.bef
 3456321
 abcdefg
 123.234
 ;

 proc print;
 run;
Super Contributor
Posts: 377

Re: Check numeric values in alphanumeric variables

Note this approved solution does not cater for valid negative numeric data.

Super Contributor
Posts: 377

Re: Check numeric values in alphanumeric variables

I needed to do this same thing today, and found this post.

 

After mulling over this a bit, I've approached this problem a bit differently than the other replies. I'm replying to this old post in case someone searches for this in the future and prefers my approach.

 

My approach is to use the INPUT function, best32. format, and ?? INPUT function modifiers, then check if the result is missing.

 

We already know the input data is character (otherwise we wouldn't need this check!), so the input data is already in the data type (character) expected by the input function.

 

With this approach, there are some additional checks I can do.

 

Here is the code:

 

data test;
   length char $32;
   input char $char32.;
   num      = input(char,?? best32.);
   IsNum    = (not missing(num));
   IsNum2   = (not missing(num) or strip(char) in ('','.') or (.A le num le .Z) or (num eq ._));
   IsInt    = (not missing(num) and int(num) = num);
   IsFloat  = (abs(num - int(num)) gt 0);
   IsNotNeg = (num ge 0);
   IsPos    = (num gt 0);
   test=num-int(num);
   datalines;

.
._
.A
.Z
-1
0
1
1.1
 123456789012.123456789012
-123456789012.123456789012
 1234567890123.1234567890123
-1234567890123.1234567890123
 12345678901234.12345678901234
-12345678901234.12345678901234
 123456789012345.123456789012345
-123456789012345.123456789012345
 1234567890123456.1234567890123456
-1234567890123456.1234567890123456
 1.1
 -123
 -123.45
 --123.45
A
 B
123 456
~!@#$%^&*()_+=
;
run;

However, this will generate "Missing values were generated ..." messages in the SAS log. I like as clean a SAS log as possible. It's unfortunate that SAS doesn't support shortcircuit logic checking as done in many modern programming languages.

 

So, here is Version 2; less terse code but with the same results:

 

data test;
   length char $32;
   input char $char32.;
   num = input(char,?? best32.);
   if (not missing(num)) then do;
      IsNum    = 1;
      IsNum2   = 1;
      IsInt    = (int(num) = num);
      IsFloat  = (abs(num - int(num)) gt 0);
      IsNotNeg = (num ge 0);
      IsPos    = (num gt 0);
      t1=num-int(num);
   end;
   else do;
      IsNum    = 0;
      IsNum2   = (strip(char) in ('','.') or (.A le num le .Z) or (num = ._));
      IsInt    = 0;
      IsFloat  = 0;
      IsNotNeg = 0;
      IsPos    = 0;
   end;
   datalines;

.
._
.A
.Z
-1
0
1
1.1
 123456789012.123456789012
-123456789012.123456789012
 1234567890123.1234567890123
-1234567890123.1234567890123
 12345678901234.12345678901234
-12345678901234.12345678901234
 123456789012345.123456789012345
-123456789012345.123456789012345
 1234567890123456.1234567890123456
-1234567890123456.1234567890123456
 1.1
 -123
 -123.45
 --123.45
A
 B
123 456
~!@#$%^&*()_+=
;
run;

Comments:

 

IsNum gives expected results, except for missing character input. Technically, "valid" character input which generates "expected" missing numeric data is in fact numeric input - a missing value is valid data in a SAS numeric column.

 

IsNum2 compensates for this by checking the value of the character input for "valid" missing or special missing input, as opposed to invalid numeric input which converts to a missing value.

 

The check for IsInt and IsFloat collapses for large numeric data, as the precision is lost and the difference between int(num) - num becomes 0. If there is a better way to approach this precision issue please let me know (I investigated INTZ, CEIL/Z, and FLOOR/Z functions).

 

I also had to do this same thing with macro data, and wanted a "pure macro" solution, or function style macro (instead of running a data step). This is complicated by the fact that %sysfunc does not support the INPUT function, but only the INPUTN (and INPUTC) function instead, and the INPUTN function does not support the ?? modifier.

 

(If a little birdie at SAS can explain the technical reasons why %sysfunc doesn't support the INPUT function, or why INPUTN doesn't support the ? and ?? modifiers, please email me privately).

 

Here is my macro approach:

 

%macro IsNumM(char,type=NUM,missing=N);
   %let type=%upcase(&type);
   %let missing=%upcase(&missing);

   %* remove leading spaces ;
   %let char=&char;
   %* put CHAR=*&char*;  %* for debugging ;

   %* check if empty string or missing data ;
   %* if YES: ;
   %* if MISSING=N treat as not valid numeric data ;
   %* if MISSING=Y treat as valid numeric data ;
   %let rx = %sysfunc(prxparse(/^( *|\.[A-Z_]*)$/o));
   %let match = %sysfunc(prxmatch(&rx,%superq(char)));
   %syscall prxfree(rx);
   %if (&match eq 1) %then %do;
      %if (&missing) %then %let rtn=1; %else %let rtn=0;
&rtn
      %return;
   %end;

   %* use a Perl regex to test for numeric input ;
   %* the regex is "/^(-{0,1})(\d*)(\.{0,1})(\d*)$/o", which means: ;
   %* beginning of string (^): ;
   %* 0 or 1 minus signs: ;
   %* 0 or more digits: ;
   %* 0 or 1 periods: ;
   %* 0 or more digits: ;
   %* end of string ($) ;
   %* compile the regex once (o) ;
   %let rx = %sysfunc(prxparse(/^(-{0,1})(\d*)(\.{0,1})(\d*)$/o));
   %let match = %sysfunc(prxmatch(&rx,%superq(char)));
   %syscall prxfree(rx);

   %* if no match then not num ;
   %if (&match eq 0) %then %do;
0
      %return;
   %end;

   %* the Perl regex should be sufficient to cleanse the input to the inputn function ;   
   %* convert the value to num. if it is missing then not num ;
   %* note: this only supports options missing='.' and options missing=' ' ;
   %let num=%sysfunc(inputn(%superq(char),best32.),best32.);
   %let num=&num;
   %* put NUM =#&num#; %* for debugging ;
   %if (%superq(num) eq .) or (%superq(num) eq ) %then %do;
0
      %return;
   %end;

   %* it is probably a num :-) ;
   %if (&type eq NUM) %then %do;
1 
   %end;
   %else
   %if (&type eq INT) %then %do;
      %let int=%sysfunc(int(&num),32.);
      %* put INT =#&int#; %* for debugging ;
      %let rtn=%eval(&int eq &num);
&rtn
   %end;
   %else
   %if (&type eq NONNEG) %then %do;
      %let rtn=%sysevalf(&num ge 0);
&rtn
   %end;
   %else
   %if (&type eq POS) %then %do;
      %let rtn=%sysevalf(&num gt 0);
&rtn
   %end;
%mend;

Finally, I've macro-ized both approaches and have uploaded them to my GitHub repository:

 

https://github.com/scottbass/SAS/blob/master/Macro/IsNum.sas
https://github.com/scottbass/SAS/blob/master/Macro/IsNumM.sas

 

Hope this helps...

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 36245 views
  • 3 likes
  • 5 in conversation