Re: Removing special character from a string

stevenyan0127 · Posted 09-07-2022 04:24 PM

Hi, I'm having trouble removing special characters in my dataset.

For values like ">10,000(3/29)", how do I extract just "10,000"?

I have tried:

initial_dimer=prxchange('s/\(([^\)]+)\)//i', -1, initial_dimer),

but it just removes the whole thing and leaves it blank.

Any help would be appreciated! Thanks!

mkeintz · Posted 09-07-2022 04:36 PM

If you have just two standard patterns (where 9 represents any sequence of digits), namely

>99,999(xxxx)
and
999

then in the case of the first pattern you can scan for the 1st "word" starting at position 2, where "word" is a string that terminates at the separator "(". For the second pattern it's just a straight copy.

data have;
  input string $20.;
datalines;
>10,000(3/29)
256
run;

data want;
  set have;
  if string=: '>' then x=scan(substr(string,2),1,'(');
  else x=string;
  put (_all_) (=);
run;

Note that the =: comparison operator compares two strings, truncating the longer string to the length of the shorter string. So the comparison tests whether the string starts with a ">".
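
As a minimal sketch of that prefix comparison in isolation (the variable name starts_with_gt is made up for illustration and is not part of the solution above), the step below writes the result of the =: test to the log for a few sample values:

data _null_;
  length string $20;
  do string = '>10,000(3/29)', '256', '>';
    /* =: truncates the longer operand to the length of the shorter one, */
    /* so only the first character of STRING is compared with '>'        */
    starts_with_gt = (string =: '>');   /* 1 if true, 0 if false */
    put string= starts_with_gt=;
  end;
run;

Because the comparison looks only at the leading character, trailing blanks in the $20 variable do not affect the result.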


Patrick · Posted 09-08-2022 01:52 AM

And here is a regex that should work:

data have;
  input string $20.;
datalines;
>10,000(3/29)
256
run;

data want;
  set have;
  string=prxchange('s/^[^\d]*(\d[\d,]*).*$/$1/i', -1, strip(string));
  put string=;
run;

Ksharp · Posted 09-08-2022 07:10 AM

data have;
  input string $20.;
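  /* 'k' keeps the listed characters instead of treating them as delimiters; */
  /* 'd' adds digits to that list, so anything other than a digit, comma,    */
  /* or period acts as a delimiter                                           */
  want=scan(string,1,',.','kd');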
datalines;
>10,000(3/29)
256
;
run;
