Hi,
I have a dataset with the following data
id comment
123 ~~%abc~asd~
234 abc~def~
456 ~~ .vbr~123~567
678 ~*bvs~
The Required result I need is to remove leading and trailing ~ but need the ~ in between.
id comment
123 %abc~asd
234 abc~def
456 .vbr~123~567
678 *bvs
I am guessing I should use regex here, but not sure how, can some one help thanks.
Try next code:
do i=1 to length(var);
if substr(var,i,1) ne '~' then do;
var = substr(var,i);
leave;
end;
end;
do j=length(var) to 1 by -1;
if substr(var,i,1) ne '~' then do;
var = substr(var,1,j);
leave;
end;
end;
check the code what happens, and what do you want to do, in case var = '~~~~~~~' (only delimiters: put i= j=).
If there are no spaces in the text then:
var = tran(compbl(tran(var,' ','~')),'~',' ');
that will replace multi '~' into one only.
It's not working , there are spaces in the comments
Try next code:
do i=1 to length(var);
if substr(var,i,1) ne '~' then do;
var = substr(var,i);
leave;
end;
end;
do j=length(var) to 1 by -1;
if substr(var,i,1) ne '~' then do;
var = substr(var,1,j);
leave;
end;
end;
check the code what happens, and what do you want to do, in case var = '~~~~~~~' (only delimiters: put i= j=).
You could probably figure out how to write a regular expression pretty easily.
But you could also just figure out how to find the part of the string you want by calculating some pointers into the string.
----+---10----+---20 ~~ .vbr~123~567 ^------- first_nondlm = 3 ^------- last_dlm = 13 ^----- str_length = 16 ^---------^ substr_length = 10
Here is some code. You might need a little more logic to handle cases without trailing delimiters.
data have;
input id comment $40.;
cards;
123 ~~%abc~asd~
234 abc~def~
456 ~~ .vbr~123~567
678 ~*bvs~
999
;
data want;
set have;
first_nondlm=verify(comment,'~');
last_dlm=findc(comment,'~','b');
str_length=lengthn(comment);
substr_length=min(str_length,last_dlm) - first_nondlm;
want=substrn(comment,first_nondlm,substr_length);
run;
proc print;
run;
Result
first_ str_ substr_ Obs id comment nondlm last_dlm length length want 1 123 ~~%abc~asd~ 3 11 11 8 %abc~asd 2 234 abc~def~ 1 8 8 7 abc~def 3 456 ~~ .vbr~123~567 3 13 16 10 .vbr~123 4 678 ~*bvs~ 2 6 6 4 *bvs 5 999 1 0 0 -1
data want;
IF _N_ = 1 THEN
PATTERN = prxparse('/[^~+].+[^~+]/');
RETAIN PATTERN;
LENGTH comment_ $20 ;
INPUT Id comment $5-20;
CALL PRXSUBSTR(PATTERN,trim(comment),START,LENGTH);
comment_ = SUBSTR(comment,START,LENGTH);
KEEP Id comment comment_;
DATALINES;
123 ~~%abc~asd~
234 abc~def~
456 ~~ .vbr~123~567
678 ~*bvs~
;
run;
I wouldn't bother with regex, which I believe can be slow. Instead, consider taking advantage of the CHAR, LENGTH, and SUBSTR functions:
data have;
input id comment $20.;
put (_all_) (=);
datalines;
123 ~~%abc~asd~
234 abc~def~
456 ~~ .vbr~123~567
678 ~*bvs~
run;
data want (drop=_:);
set have;
original=comment;
do _c=1 by 1 until (char(comment,_c)^='~');;
end;
comment=substr(comment,_c);
do _c=length(comment) by -1 until(char(comment,_c)^='~');
end;
comment=substr(comment,1,_c);
run;
This leads leading blanks in the ID=456 comment. If you want to remove all leading and trailing blanks and ~'s, then consider NOT IN instead of ^=, as in:
data want (drop=_:);
set have;
original=comment;
do _c=1 by 1 until (char(comment,_c) not in ('~',' '));
end;
comment=substr(comment,_c);
do _c=length(comment) by -1 until(char(comment,_c) not in ('~',' '));
end;
comment=substr(comment,1,_c);
run;
SCAN might be very fast IMHO
data have;
input id comment $40.;
cards;
123 ~~%abc~asd~
234 abc~def~
456 ~~ .vbr~123~567
678 ~*bvs~
;
data want;
set have;
length want $50;
do _n_=1 to countw(strip(comment),'~');
want=catx('~',want,scan(strip(comment),_n_,'~'));
end;
run;
proc print noobs;run;
id | comment | want |
---|---|---|
123 | ~~%abc~asd~ | %abc~asd |
234 | abc~def~ | abc~def |
456 | ~~ .vbr~123~567 | .vbr~123~567 |
678 | ~*bvs~ | *bvs |
data have; input id comment $40.; cards; 123 ~~%abc~asd~ 234 abc~def~ 456 ~~ .vbr~123~567 678 ~*bvs~ 999 ; data want; set have; want=prxchange('s/^~+|~+$//',-1,strip(comment)); run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.