Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like
if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);
The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.
RESULT NUM CHAR
'Negative' 'Negative'
'124.3' 124.3
'<50' <50
'Other' 'Other'
'<99.9' <99.9
I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.
Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?
@FriendJen wrote:
Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like
if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.
RESULT NUM CHAR
'Negative' 'Negative'
'124.3' 124.3
'<50' <50
'Other' 'Other'
'<99.9' <99.9
I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.
Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?
I would say no. But the list of valid characters for strings that look like numerical expressions is pretty limited.
Basic operators
+ - / *
Comparison operators
< = >
Punctuation
. , $ %
Example:
data have ;
input result $80. ;
cards;
Negative
124.3
<50
Other
<99.9
50%
;
data want;
set have;
if verify(result,'0123456789.,+-/* <=>$%') then char=result;
else do;
num_string=result;
num=input(compress(num_string,'<=>'),??comma32.);
end;
run;
Result
num_ Obs result char string num 1 Negative Negative . 2 124.3 124.3 124.3 3 <50 <50 50.0 4 Other Other . 5 <99.9 <99.9 99.9 6 50% 50% 50.0
@FriendJen wrote:
Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like
if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.
RESULT NUM CHAR
'Negative' 'Negative'
'124.3' 124.3
'<50' <50
'Other' 'Other'
'<99.9' <99.9
I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.
Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?
I would say no. But the list of valid characters for strings that look like numerical expressions is pretty limited.
Basic operators
+ - / *
Comparison operators
< = >
Punctuation
. , $ %
Example:
data have ;
input result $80. ;
cards;
Negative
124.3
<50
Other
<99.9
50%
;
data want;
set have;
if verify(result,'0123456789.,+-/* <=>$%') then char=result;
else do;
num_string=result;
num=input(compress(num_string,'<=>'),??comma32.);
end;
run;
Result
num_ Obs result char string num 1 Negative Negative . 2 124.3 124.3 124.3 3 <50 <50 50.0 4 Other Other . 5 <99.9 <99.9 99.9 6 50% 50% 50.0
@FriendJen wrote:
Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like
if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.
RESULT NUM CHAR
'Negative' 'Negative'
'124.3' 124.3
'<50' <50
'Other' 'Other'
'<99.9' <99.9
I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.
Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?
Question about possible content: Do you have any values that might be expressed in scientific notation such as 1.34E6 instead of 1340000? (or even larger values or small values like 1.34E-15? )
The Verify code
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.