Hi, I'm having a problem removing special characters from my dataset.
For values like ">10,000(3/29)", how do I extract just "10,000"?
I have tried:
initial_dimer=prxchange('s/\(([^\)]+)\)//i', -1, initial_dimer),
but it just removes the whole thing and leaves it blank.
Any help would be appreciated! Thanks!
If you have just two standard patterns, as in the sample data below (a value such as ">10,000(3/29)" or a plain number such as "256"), then for the first pattern you can scan for the first "word" starting at position 2, where a "word" is a string terminated by the separator "(". For the second pattern it's a straight copy.
data have;
input string $20.;
datalines;
>10,000(3/29)
256
;
run;
data want;
set have;
if string=: '>' then x=scan(substr(string,2),1,'(');
else x=string;
put (_all_) (=);
run;
Note that the =: comparison operator compares two strings after truncating the longer string to the length of the shorter one, so the comparison tests whether the string starts with ">".
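As a small illustration of that begins-with comparison on its own, here is a minimal sketch using the two sample values from this thread. The variable name starts_with_gt is just an illustrative choice, not something from the original code.

```sas
/* Minimal sketch of the =: (begins-with) comparison.          */
/* starts_with_gt is a hypothetical name used for illustration. */
data _null_;
  length string $20;
  do string = '>10,000(3/29)', '256';
    /* The longer operand is truncated to the length of '>' (1 character), */
    /* so the result is 1 when string begins with ">" and 0 otherwise.     */
    starts_with_gt = (string =: '>');
    put string= starts_with_gt=;
  end;
run;
```

The log should show starts_with_gt=1 for the first value and starts_with_gt=0 for the second.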
And here is a regex that should work:
data have;
input string $20.;
datalines;
>10,000(3/29)
256
;
run;
data want;
set have;
string=prxchange('s/^[^\d]*(\d[\d,]*).*$/$1/i', -1, strip(string));
put string=;
run;
data have;
input string $20.;
want=scan(string,1,',.','kd');
datalines;
>10,000(3/29)
256
;
run;