SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
FriendJen
Fluorite | Level 6

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@FriendJen wrote:

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 


I would say no.  But the list of valid characters for strings that look like numerical expressions is pretty limited.

Basic operators

+ - / *

Comparison operators

< = >

Punctuation 

. , $ %

Example:

data have ;
  input result $80. ;
cards;
Negative
124.3
<50
Other
<99.9
50%
;


data want;
  set have;
  if verify(result,'0123456789.,+-/* <=>$%') then char=result;
  else do;
     num_string=result;
     num=input(compress(num_string,'<=>'),??comma32.);
  end;
run;

Result

                                num_
Obs    result      char        string     num

 1     Negative    Negative                 .
 2     124.3                   124.3     124.3
 3     <50                     <50        50.0
 4     Other       Other                    .
 5     <99.9                   <99.9      99.9
 6     50%                     50%        50.0

View solution in original post

2 REPLIES 2
Tom
Super User Tom
Super User

@FriendJen wrote:

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 


I would say no.  But the list of valid characters for strings that look like numerical expressions is pretty limited.

Basic operators

+ - / *

Comparison operators

< = >

Punctuation 

. , $ %

Example:

data have ;
  input result $80. ;
cards;
Negative
124.3
<50
Other
<99.9
50%
;


data want;
  set have;
  if verify(result,'0123456789.,+-/* <=>$%') then char=result;
  else do;
     num_string=result;
     num=input(compress(num_string,'<=>'),??comma32.);
  end;
run;

Result

                                num_
Obs    result      char        string     num

 1     Negative    Negative                 .
 2     124.3                   124.3     124.3
 3     <50                     <50        50.0
 4     Other       Other                    .
 5     <99.9                   <99.9      99.9
 6     50%                     50%        50.0
ballardw
Super User

@FriendJen wrote:

Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like

if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);

The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.

RESULT           NUM            CHAR

'Negative'                             'Negative'

'124.3'               124.3

'<50'                  <50

'Other'                                   'Other'

'<99.9'               <99.9

 

I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.

Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?

 


Question about possible content: Do you have any values that might be expressed in scientific notation such as 1.34E6 instead of  1340000? (or even larger values or small values like 1.34E-15? )

 

The Verify code

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1250 views
  • 0 likes
  • 3 in conversation