BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
FriendJen
Fluorite | Level 6

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@FriendJen wrote:

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 


I would say no.  But the list of valid characters for strings that look like numerical expressions is pretty limited.

Basic operators

+ - / *

Comparison operators

< = >

Punctuation 

. , $ %

Example:

data have ;
  input result $80. ;
cards;
Negative
124.3
<50
Other
<99.9
50%
;


data want;
  set have;
  if verify(result,'0123456789.,+-/* <=>$%') then char=result;
  else do;
     num_string=result;
     num=input(compress(num_string,'<=>'),??comma32.);
  end;
run;

Result

                                num_
Obs    result      char        string     num

 1     Negative    Negative                 .
 2     124.3                   124.3     124.3
 3     <50                     <50        50.0
 4     Other       Other                    .
 5     <99.9                   <99.9      99.9
 6     50%                     50%        50.0

View solution in original post

2 REPLIES 2
Tom
Super User Tom
Super User

@FriendJen wrote:

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 


I would say no.  But the list of valid characters for strings that look like numerical expressions is pretty limited.

Basic operators

+ - / *

Comparison operators

< = >

Punctuation 

. , $ %

Example:

data have ;
  input result $80. ;
cards;
Negative
124.3
<50
Other
<99.9
50%
;


data want;
  set have;
  if verify(result,'0123456789.,+-/* <=>$%') then char=result;
  else do;
     num_string=result;
     num=input(compress(num_string,'<=>'),??comma32.);
  end;
run;

Result

                                num_
Obs    result      char        string     num

 1     Negative    Negative                 .
 2     124.3                   124.3     124.3
 3     <50                     <50        50.0
 4     Other       Other                    .
 5     <99.9                   <99.9      99.9
 6     50%                     50%        50.0
ballardw
Super User

@FriendJen wrote:

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 


Question about possible content: Do you have any values that might be expressed in scientific notation such as 1.34E6 instead of  1340000? (or even larger values or small values like 1.34E-15? )

 

The Verify code

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 956 views
  • 0 likes
  • 3 in conversation