Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like
if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);
The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.
RESULT NUM CHAR
'Negative' 'Negative'
'124.3' 124.3
'<50' <50
'Other' 'Other'
'<99.9' <99.9
I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.
Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?
@FriendJen wrote:
Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like
if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.
RESULT NUM CHAR
'Negative' 'Negative'
'124.3' 124.3
'<50' <50
'Other' 'Other'
'<99.9' <99.9
I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.
Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?
I would say no. But the list of valid characters for strings that look like numerical expressions is pretty limited.
Basic operators
+ - / *
Comparison operators
< = >
Punctuation
. , $ %
Example:
data have ;
input result $80. ;
cards;
Negative
124.3
<50
Other
<99.9
50%
;
data want;
set have;
if verify(result,'0123456789.,+-/* <=>$%') then char=result;
else do;
num_string=result;
num=input(compress(num_string,'<=>'),??comma32.);
end;
run;
Result
num_ Obs result char string num 1 Negative Negative . 2 124.3 124.3 124.3 3 <50 <50 50.0 4 Other Other . 5 <99.9 <99.9 99.9 6 50% 50% 50.0
@FriendJen wrote:
Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like
if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.
RESULT NUM CHAR
'Negative' 'Negative'
'124.3' 124.3
'<50' <50
'Other' 'Other'
'<99.9' <99.9
I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.
Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?
I would say no. But the list of valid characters for strings that look like numerical expressions is pretty limited.
Basic operators
+ - / *
Comparison operators
< = >
Punctuation
. , $ %
Example:
data have ;
input result $80. ;
cards;
Negative
124.3
<50
Other
<99.9
50%
;
data want;
set have;
if verify(result,'0123456789.,+-/* <=>$%') then char=result;
else do;
num_string=result;
num=input(compress(num_string,'<=>'),??comma32.);
end;
run;
Result
num_ Obs result char string num 1 Negative Negative . 2 124.3 124.3 124.3 3 <50 <50 50.0 4 Other Other . 5 <99.9 <99.9 99.9 6 50% 50% 50.0
@FriendJen wrote:
Hello! I have a result variable that is a string and I'm splitting it into two variables, one for numeric data and one for character data. For example, if my original variable is RESULT, I want to split it into NUM and CHAR. Right now I'm using something like
if ANYALPHA(RESULT)>0 then CHAR=RESULT; ELSE NUM=INPUT(RESULT, BEST.);The problem happens when my RESULT variable has a non-alphanumeric character because then I can't use INPUT to change the string to numeric.
RESULT NUM CHAR
'Negative' 'Negative'
'124.3' 124.3
'<50' <50
'Other' 'Other'
'<99.9' <99.9
I know I can use COMPRESS(RESULT, '<'), but is there another way that doesn't rely on me knowing the specific character? The dataset I'm using is huge, so there could be variety I don't know about.
Is there a generalized "remove non-math" type of function I can use so I can keep decimals and negatives but remove other symbols?
Question about possible content: Do you have any values that might be expressed in scientific notation such as 1.34E6 instead of 1340000? (or even larger values or small values like 1.34E-15? )
The Verify code
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.