Hi, I'm having a problem removing special characters from my dataset.
For values like ">10,000(3/29)", how do I extract just "10,000"?
I have tried:
initial_dimer=prxchange('s/\(([^\)]+)\)//i', -1, initial_dimer),
but it just removes the whole thing and leaves it blank.
Any help would be appreciated! Thanks!
If you just have two standard patterns, as in the sample values ">10,000(3/29)" and "256" below, then for the first pattern you can scan for the first "word" starting at position 2, where a "word" is a string that ends at the separator "(". For the second pattern it's just a straight copy.
data have;
input string $20.;
datalines;
>10,000(3/29)
256
;
run;
data want;
set have;
if string=: '>' then x=scan(substr(string,2),1,'(');
else x=string;
put (_all_) (=);
run;
Note that the =: comparison operator compares two strings after truncating the longer string to the length of the shorter one. So the comparison tests whether the string starts with a ">".
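To see the truncated comparison in isolation, here is a minimal sketch. The variable names s and starts_with_gt and the third test value '>5' are made up for illustration; they are not from the original post.

```sas
data _null_;
  length s $20;
  /* Loop over a few literal test values */
  do s = '>10,000(3/29)', '256', '>5';
    /* =: truncates the longer operand to the length of the shorter one, */
    /* so only the first character of s is compared with '>'             */
    starts_with_gt = (s =: '>');
    put s= starts_with_gt=;  /* expect 1, 0, 1 for the three values */
  end;
run;
```

The result is 1 only when the value begins with ">", which is why the IF test above picks out the first pattern.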
And here is a regex that should work:
data have;
input string $20.;
datalines;
>10,000(3/29)
256
;
run;
data want;
set have;
string=prxchange('s/^[^\d]*(\d[\d,]*).*$/$1/i', -1, strip(string));
put string=;
run;
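If you would rather keep the original value and put the number into a separate variable, the same capture group can be read with PRXPARSE, PRXMATCH and PRXPOSN instead of PRXCHANGE. This is only a sketch of an alternative, not the approach posted above; the variable names re and num are assumptions for illustration.

```sas
data _null_;
  length string $20 num $20;
  /* Compile the pattern: a digit followed by any run of digits or commas */
  re = prxparse('/(\d[\d,]*)/');
  do string = '>10,000(3/29)', '256';
    /* PRXMATCH returns the match position, or 0 if there is no match */
    if prxmatch(re, string) then num = prxposn(re, 1, string);
    else num = '';
    put string= num=;
  end;
run;
```

Because the pattern is not anchored, it picks up the first run of digits and commas wherever it starts, so the leading ">" and the trailing "(3/29)" are simply never captured.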
data have;
input string $20.;
want=scan(string,1,',.','kd');
datalines;
>10,000(3/29)
256
;
run;
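The SCAN call above depends on two modifiers: D adds the digits to the character list, and K treats every character that is not in the list as a delimiter. With the list ',.' that means only digits, commas and periods are kept, so the first "word" is the number itself. A small sketch under those assumptions (the third test value '<1.5(2/3)' is made up for illustration):

```sas
data _null_;
  length string $20 want $20;
  do string = '>10,000(3/29)', '256', '<1.5(2/3)';
    /* Keep digits, commas and periods; everything else, */
    /* including blanks, acts as a delimiter             */
    want = scan(string, 1, ',.', 'kd');
    put string= want=;
  end;
run;
```

This form needs no test for a leading ">", because any leading character outside the kept list is skipped as a delimiter.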