Hi,
I have a dataset with the following data
id comment
123 ~~%abc~asd~
234 abc~def~
456 ~~ .vbr~123~567
678 ~*bvs~
The Required result I need is to remove leading and trailing ~ but need the ~ in between.
id comment
123 %abc~asd
234 abc~def
456 .vbr~123~567
678 *bvs
I am guessing I should use regex here, but not sure how, can some one help thanks.
Try next code:
do i=1 to length(var);
if substr(var,i,1) ne '~' then do;
var = substr(var,i);
leave;
end;
end;
do j=length(var) to 1 by -1;
if substr(var,i,1) ne '~' then do;
var = substr(var,1,j);
leave;
end;
end;
check the code what happens, and what do you want to do, in case var = '~~~~~~~' (only delimiters: put i= j=).
If there are no spaces in the text then:
var = tran(compbl(tran(var,' ','~')),'~',' ');
that will replace multi '~' into one only.
It's not working , there are spaces in the comments
Try next code:
do i=1 to length(var);
if substr(var,i,1) ne '~' then do;
var = substr(var,i);
leave;
end;
end;
do j=length(var) to 1 by -1;
if substr(var,i,1) ne '~' then do;
var = substr(var,1,j);
leave;
end;
end;
check the code what happens, and what do you want to do, in case var = '~~~~~~~' (only delimiters: put i= j=).
You could probably figure out how to write a regular expression pretty easily.
But you could also just figure out how to find the part of the string you want by calculating some pointers into the string.
----+---10----+---20 ~~ .vbr~123~567 ^------- first_nondlm = 3 ^------- last_dlm = 13 ^----- str_length = 16 ^---------^ substr_length = 10
Here is some code. You might need a little more logic to handle cases without trailing delimiters.
data have;
input id comment $40.;
cards;
123 ~~%abc~asd~
234 abc~def~
456 ~~ .vbr~123~567
678 ~*bvs~
999
;
data want;
set have;
first_nondlm=verify(comment,'~');
last_dlm=findc(comment,'~','b');
str_length=lengthn(comment);
substr_length=min(str_length,last_dlm) - first_nondlm;
want=substrn(comment,first_nondlm,substr_length);
run;
proc print;
run;
Result
first_ str_ substr_ Obs id comment nondlm last_dlm length length want 1 123 ~~%abc~asd~ 3 11 11 8 %abc~asd 2 234 abc~def~ 1 8 8 7 abc~def 3 456 ~~ .vbr~123~567 3 13 16 10 .vbr~123 4 678 ~*bvs~ 2 6 6 4 *bvs 5 999 1 0 0 -1
data want;
IF _N_ = 1 THEN
PATTERN = prxparse('/[^~+].+[^~+]/');
RETAIN PATTERN;
LENGTH comment_ $20 ;
INPUT Id comment $5-20;
CALL PRXSUBSTR(PATTERN,trim(comment),START,LENGTH);
comment_ = SUBSTR(comment,START,LENGTH);
KEEP Id comment comment_;
DATALINES;
123 ~~%abc~asd~
234 abc~def~
456 ~~ .vbr~123~567
678 ~*bvs~
;
run;
I wouldn't bother with regex, which I believe can be slow. Instead, consider taking advantage of the CHAR, LENGTH, and SUBSTR functions:
data have;
input id comment $20.;
put (_all_) (=);
datalines;
123 ~~%abc~asd~
234 abc~def~
456 ~~ .vbr~123~567
678 ~*bvs~
run;
data want (drop=_:);
set have;
original=comment;
do _c=1 by 1 until (char(comment,_c)^='~');;
end;
comment=substr(comment,_c);
do _c=length(comment) by -1 until(char(comment,_c)^='~');
end;
comment=substr(comment,1,_c);
run;
This leads leading blanks in the ID=456 comment. If you want to remove all leading and trailing blanks and ~'s, then consider NOT IN instead of ^=, as in:
data want (drop=_:);
set have;
original=comment;
do _c=1 by 1 until (char(comment,_c) not in ('~',' '));
end;
comment=substr(comment,_c);
do _c=length(comment) by -1 until(char(comment,_c) not in ('~',' '));
end;
comment=substr(comment,1,_c);
run;
SCAN might be very fast IMHO
data have;
input id comment $40.;
cards;
123 ~~%abc~asd~
234 abc~def~
456 ~~ .vbr~123~567
678 ~*bvs~
;
data want;
set have;
length want $50;
do _n_=1 to countw(strip(comment),'~');
want=catx('~',want,scan(strip(comment),_n_,'~'));
end;
run;
proc print noobs;run;
id | comment | want |
---|---|---|
123 | ~~%abc~asd~ | %abc~asd |
234 | abc~def~ | abc~def |
456 | ~~ .vbr~123~567 | .vbr~123~567 |
678 | ~*bvs~ | *bvs |
data have; input id comment $40.; cards; 123 ~~%abc~asd~ 234 abc~def~ 456 ~~ .vbr~123~567 678 ~*bvs~ 999 ; data want; set have; want=prxchange('s/^~+|~+$//',-1,strip(comment)); run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.