BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
FriendJen
Fluorite | Level 6

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@FriendJen wrote:

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 


I would say no.  But the list of valid characters for strings that look like numerical expressions is pretty limited.

Basic operators

+ - / *

Comparison operators

< = >

Punctuation 

. , $ %

Example:

data have ;
  input result $80. ;
cards;
Negative
124.3
<50
Other
<99.9
50%
;


data want;
  set have;
  if verify(result,'0123456789.,+-/* <=>$%') then char=result;
  else do;
     num_string=result;
     num=input(compress(num_string,'<=>'),??comma32.);
  end;
run;

Result

                                num_
Obs    result      char        string     num

 1     Negative    Negative                 .
 2     124.3                   124.3     124.3
 3     <50                     <50        50.0
 4     Other       Other                    .
 5     <99.9                   <99.9      99.9
 6     50%                     50%        50.0

View solution in original post

2 REPLIES 2
Tom
Super User Tom
Super User

@FriendJen wrote:

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 


I would say no.  But the list of valid characters for strings that look like numerical expressions is pretty limited.

Basic operators

+ - / *

Comparison operators

< = >

Punctuation 

. , $ %

Example:

data have ;
  input result $80. ;
cards;
Negative
124.3
<50
Other
<99.9
50%
;


data want;
  set have;
  if verify(result,'0123456789.,+-/* <=>$%') then char=result;
  else do;
     num_string=result;
     num=input(compress(num_string,'<=>'),??comma32.);
  end;
run;

Result

                                num_
Obs    result      char        string     num

 1     Negative    Negative                 .
 2     124.3                   124.3     124.3
 3     <50                     <50        50.0
 4     Other       Other                    .
 5     <99.9                   <99.9      99.9
 6     50%                     50%        50.0
ballardw
Super User

@FriendJen wrote:

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 


Question about possible content: Do you have any values that might be expressed in scientific notation such as 1.34E6 instead of  1340000? (or even larger values or small values like 1.34E-15? )

 

The Verify code

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 740 views
  • 0 likes
  • 3 in conversation