BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shanmukh2
Fluorite | Level 6

Hi,

 

I have a dataset with the following data

 

id           comment

 

123      ~~%abc~asd~

234      abc~def~

456     ~~  .vbr~123~567

678    ~*bvs~

 

The Required result I need is to remove leading and trailing ~ but need the ~ in between.

 

id           comment

 

123      %abc~asd

234      abc~def

456      .vbr~123~567

678     *bvs

 

I am guessing I should use regex here, but not sure how, can some one help thanks.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Try next code:

 

do i=1 to length(var);
    if substr(var,i,1) ne '~' then do;
       var = substr(var,i);
       leave;
    end;
end;
do j=length(var) to 1 by -1;
    if substr(var,i,1) ne '~' then do;
       var = substr(var,1,j);
       leave;
    end;
end; 

check the code what happens, and what do you want to do, in case var = '~~~~~~~' (only delimiters: put i= j=).

 

View solution in original post

9 REPLIES 9
Shmuel
Garnet | Level 18

If there are no spaces in the text then:

var = tran(compbl(tran(var,' ','~')),'~',' ');

that will replace multi '~' into one only.

 

 

shanmukh2
Fluorite | Level 6

It's not working , there are spaces in the  comments

Shmuel
Garnet | Level 18

Try next code:

 

do i=1 to length(var);
    if substr(var,i,1) ne '~' then do;
       var = substr(var,i);
       leave;
    end;
end;
do j=length(var) to 1 by -1;
    if substr(var,i,1) ne '~' then do;
       var = substr(var,1,j);
       leave;
    end;
end; 

check the code what happens, and what do you want to do, in case var = '~~~~~~~' (only delimiters: put i= j=).

 

Tom
Super User Tom
Super User

You could probably figure out how to write a regular expression pretty easily.

But you could also just figure out how to find the part of the string you want by calculating some pointers into the string.

----+---10----+---20
~~  .vbr~123~567
  ^------- first_nondlm = 3
            ^------- last_dlm = 13
               ^----- str_length = 16
  ^---------^   substr_length = 10

Here is some code. You might need a little more logic to handle cases without trailing delimiters.

data have;
 input id comment $40.;
cards;
123 ~~%abc~asd~
234 abc~def~
456 ~~  .vbr~123~567
678 ~*bvs~
999
;

data want;
  set have;
  first_nondlm=verify(comment,'~');
  last_dlm=findc(comment,'~','b');
  str_length=lengthn(comment);
  substr_length=min(str_length,last_dlm) - first_nondlm;
  want=substrn(comment,first_nondlm,substr_length);
run;
proc print;
run;

Result

                                  first_                 str_     substr_
Obs     id    comment             nondlm    last_dlm    length     length    want

 1     123    ~~%abc~asd~            3         11         11          8      %abc~asd
 2     234    abc~def~               1          8          8          7      abc~def
 3     456    ~~  .vbr~123~567       3         13         16         10        .vbr~123
 4     678    ~*bvs~                 2          6          6          4      *bvs
 5     999                           1          0          0         -1

 

shanmukh2
Fluorite | Level 6
Thank you, this solution works too.
r_behata
Barite | Level 11

data want;
	IF _N_ = 1 THEN
		
			PATTERN = prxparse('/[^~+].+[^~+]/');

		

	RETAIN PATTERN;
	LENGTH comment_ $20 ;
	INPUT Id comment $5-20;
	CALL PRXSUBSTR(PATTERN,trim(comment),START,LENGTH);


			comment_ = SUBSTR(comment,START,LENGTH);

	KEEP Id comment comment_;
	DATALINES;
123 ~~%abc~asd~
234 abc~def~
456 ~~  .vbr~123~567
678 ~*bvs~
;
run;
mkeintz
PROC Star

I wouldn't bother with regex, which I believe can be slow.  Instead, consider taking advantage of the CHAR, LENGTH, and SUBSTR functions:

 

data have;
  input id comment $20.;
  put (_all_) (=);
datalines;
123 ~~%abc~asd~
234 abc~def~
456 ~~  .vbr~123~567
678 ~*bvs~
run;


data want (drop=_:);
  set have;
  original=comment;
  do _c=1 by 1 until (char(comment,_c)^='~');;
  end;
  comment=substr(comment,_c);
  do _c=length(comment) by -1 until(char(comment,_c)^='~');
  end;
  comment=substr(comment,1,_c);
run;

This leads leading blanks in the ID=456 comment.  If you want to remove all leading and trailing blanks and ~'s, then consider NOT IN instead of ^=, as in:

data want (drop=_:);
  set have;
  original=comment;
  do _c=1 by 1 until (char(comment,_c) not in ('~',' '));
  end;
  comment=substr(comment,_c);
  do _c=length(comment) by -1 until(char(comment,_c) not in ('~',' '));
  end;
  comment=substr(comment,1,_c);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

SCAN might be very fast IMHO

 


data have;
 input id comment $40.;
cards;
123 ~~%abc~asd~
234 abc~def~
456 ~~  .vbr~123~567
678 ~*bvs~
;

data want;
set have;
length want $50;
do _n_=1 to countw(strip(comment),'~');
 want=catx('~',want,scan(strip(comment),_n_,'~'));
end;
run;

proc print noobs;run;
id comment want
123 ~~%abc~asd~ %abc~asd
234 abc~def~ abc~def
456 ~~ .vbr~123~567 .vbr~123~567
678 ~*bvs~ *bvs
Ksharp
Super User
data have;
 input id comment $40.;
cards;
123 ~~%abc~asd~
234 abc~def~
456 ~~  .vbr~123~567
678 ~*bvs~
999
;
data want;
 set have;
 want=prxchange('s/^~+|~+$//',-1,strip(comment));
run;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3096 views
  • 4 likes
  • 7 in conversation