BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shanmukh2
Fluorite | Level 6

Hi,

 

I have a dataset with the following data

 

id           comment

 

123      ~~%abc~asd~

234      abc~def~

456     ~~  .vbr~123~567

678    ~*bvs~

 

The Required result I need is to remove leading and trailing ~ but need the ~ in between.

 

id           comment

 

123      %abc~asd

234      abc~def

456      .vbr~123~567

678     *bvs

 

I am guessing I should use regex here, but not sure how, can some one help thanks.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Try next code:

 

do i=1 to length(var);
    if substr(var,i,1) ne '~' then do;
       var = substr(var,i);
       leave;
    end;
end;
do j=length(var) to 1 by -1;
    if substr(var,i,1) ne '~' then do;
       var = substr(var,1,j);
       leave;
    end;
end; 

check the code what happens, and what do you want to do, in case var = '~~~~~~~' (only delimiters: put i= j=).

 

View solution in original post

9 REPLIES 9
Shmuel
Garnet | Level 18

If there are no spaces in the text then:

var = tran(compbl(tran(var,' ','~')),'~',' ');

that will replace multi '~' into one only.

 

 

shanmukh2
Fluorite | Level 6

It's not working , there are spaces in the  comments

Shmuel
Garnet | Level 18

Try next code:

 

do i=1 to length(var);
    if substr(var,i,1) ne '~' then do;
       var = substr(var,i);
       leave;
    end;
end;
do j=length(var) to 1 by -1;
    if substr(var,i,1) ne '~' then do;
       var = substr(var,1,j);
       leave;
    end;
end; 

check the code what happens, and what do you want to do, in case var = '~~~~~~~' (only delimiters: put i= j=).

 

Tom
Super User Tom
Super User

You could probably figure out how to write a regular expression pretty easily.

But you could also just figure out how to find the part of the string you want by calculating some pointers into the string.

----+---10----+---20
~~  .vbr~123~567
  ^------- first_nondlm = 3
            ^------- last_dlm = 13
               ^----- str_length = 16
  ^---------^   substr_length = 10

Here is some code. You might need a little more logic to handle cases without trailing delimiters.

data have;
 input id comment $40.;
cards;
123 ~~%abc~asd~
234 abc~def~
456 ~~  .vbr~123~567
678 ~*bvs~
999
;

data want;
  set have;
  first_nondlm=verify(comment,'~');
  last_dlm=findc(comment,'~','b');
  str_length=lengthn(comment);
  substr_length=min(str_length,last_dlm) - first_nondlm;
  want=substrn(comment,first_nondlm,substr_length);
run;
proc print;
run;

Result

                                  first_                 str_     substr_
Obs     id    comment             nondlm    last_dlm    length     length    want

 1     123    ~~%abc~asd~            3         11         11          8      %abc~asd
 2     234    abc~def~               1          8          8          7      abc~def
 3     456    ~~  .vbr~123~567       3         13         16         10        .vbr~123
 4     678    ~*bvs~                 2          6          6          4      *bvs
 5     999                           1          0          0         -1

 

shanmukh2
Fluorite | Level 6
Thank you, this solution works too.
r_behata
Barite | Level 11

data want;
	IF _N_ = 1 THEN
		
			PATTERN = prxparse('/[^~+].+[^~+]/');

		

	RETAIN PATTERN;
	LENGTH comment_ $20 ;
	INPUT Id comment $5-20;
	CALL PRXSUBSTR(PATTERN,trim(comment),START,LENGTH);


			comment_ = SUBSTR(comment,START,LENGTH);

	KEEP Id comment comment_;
	DATALINES;
123 ~~%abc~asd~
234 abc~def~
456 ~~  .vbr~123~567
678 ~*bvs~
;
run;
mkeintz
PROC Star

I wouldn't bother with regex, which I believe can be slow.  Instead, consider taking advantage of the CHAR, LENGTH, and SUBSTR functions:

 

data have;
  input id comment $20.;
  put (_all_) (=);
datalines;
123 ~~%abc~asd~
234 abc~def~
456 ~~  .vbr~123~567
678 ~*bvs~
run;


data want (drop=_:);
  set have;
  original=comment;
  do _c=1 by 1 until (char(comment,_c)^='~');;
  end;
  comment=substr(comment,_c);
  do _c=length(comment) by -1 until(char(comment,_c)^='~');
  end;
  comment=substr(comment,1,_c);
run;

This leads leading blanks in the ID=456 comment.  If you want to remove all leading and trailing blanks and ~'s, then consider NOT IN instead of ^=, as in:

data want (drop=_:);
  set have;
  original=comment;
  do _c=1 by 1 until (char(comment,_c) not in ('~',' '));
  end;
  comment=substr(comment,_c);
  do _c=length(comment) by -1 until(char(comment,_c) not in ('~',' '));
  end;
  comment=substr(comment,1,_c);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

SCAN might be very fast IMHO

 


data have;
 input id comment $40.;
cards;
123 ~~%abc~asd~
234 abc~def~
456 ~~  .vbr~123~567
678 ~*bvs~
;

data want;
set have;
length want $50;
do _n_=1 to countw(strip(comment),'~');
 want=catx('~',want,scan(strip(comment),_n_,'~'));
end;
run;

proc print noobs;run;
id comment want
123 ~~%abc~asd~ %abc~asd
234 abc~def~ abc~def
456 ~~ .vbr~123~567 .vbr~123~567
678 ~*bvs~ *bvs
Ksharp
Super User
data have;
 input id comment $40.;
cards;
123 ~~%abc~asd~
234 abc~def~
456 ~~  .vbr~123~567
678 ~*bvs~
999
;
data want;
 set have;
 want=prxchange('s/^~+|~+$//',-1,strip(comment));
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1796 views
  • 4 likes
  • 7 in conversation