Hi, I'm having a problem removing special characters in my dataset.
For values like ">10,000(3/29)", how do I extract just "10,000"?
I have tried:
initial_dimer=prxchange('s/\(([^\)]+)\)//i', -1, initial_dimer),
but it just removes the whole thing and leaves it blank.
Any help would be appreciated! Thanks!
If you have just two standard patterns, as in your example (a value like >10,000(3/29) and a plain number like 256), then for the first pattern you can scan for the first "word" starting at position 2, where a "word" is a string terminated by the separator "(". For the second pattern it's just a straight copy.
data have;
input string $20.;
datalines;
>10,000(3/29)
256
run;
data want;
set have;
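/* Values starting with ">": skip the ">" and keep the text before "(" */
if string=: '>' then x=scan(substr(string,2),1,'(');
/* Plain values: copy unchanged */
else x=string;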
put (_all_) (=);
run;
Note that the =: comparison operator compares two strings, truncating the longer string to the length of the shorter one. So the comparison tests whether the string starts with a ">".
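To see the =: operator on its own, here is a minimal sketch with made-up test values. The names s and starts_gt are hypothetical and only serve the illustration.

```sas
data _null_;
  length s $20;
  do s = '>10,000(3/29)', '256', '>5';
    /* =: truncates the longer operand to the length of the shorter one, */
    /* so this is true (1) only when s begins with ">"                   */
    starts_gt = (s =: '>');
    put s= starts_gt=;
  end;
run;
```

The log should show starts_gt=1 for the first and third values and starts_gt=0 for 256.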
And here is a regex that should work:
data have;
input string $20.;
datalines;
>10,000(3/29)
256
run;
data want;
set have;
/* Skip leading non-digits, keep the first run of digits and commas, drop the rest */
string=prxchange('s/^[^\d]*(\d[\d,]*).*$/$1/i', -1, strip(string));
put string=;
run;
data have;
input string $20.;
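/* k: every character NOT in the list is a delimiter; d: adds digits to the list. */
/* So the first "word" is the first run of digits, commas and periods.            */
want=scan(string,1,',.','kd');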
datalines;
>10,000(3/29)
256
;
run;