Solved: Re: Remove all text before 2nd (or 3rd) ('_') & all text after the 1st...

Zeus_Olympus · Posted 06-23-2022 10:16 AM

Dear all,

I have text string values like:

TR_U8_FORTE_BDERT_GT_RPJ

GD_FT_LAMAN_TRITE_SRTI_GT_RPJ

VB_7P_VELON_SUTI_RPJ

and I need :

FORTE_BDERT_GT

LAMAN_TRITE_SRTI_GT

VELON_SUTI

I have tried SCAN function without success.

I would appreciate your advice.

Thank you.

Ksharp · Posted 06-24-2022 08:09 AM

data have;
   input str $50.;
datalines;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
;

data want;
 set have;
want=prxchange('s/^[^_]+_[^_]+_|_[^_]+$//',-1,strip(str));
run;

View solution in original post

PeterClemmensen · Posted 06-23-2022 10:17 AM

What is the logic here?

The DATA to DATA Step Macro
Blog: SASnrd

CarmineVerrell · Posted 06-23-2022 10:39 AM

Here is a quick way to get the results you are looking for.

data test ;
input textstr $31.;
newtextstr=substr(textstr,1,length(textstr)-4);
newtextstr=substr(newtextstr,7);
datalines ;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
;
run;

ballardw · Posted 06-23-2022 10:48 AM

You will have to describe in terms of things that exist in the data when you want "before 2nd" and "before 3rd" removed.

This works for your example:

data have;
   input str $50.;
datalines;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
;

data want;
   set have;
   length newstr $ 50.;
   do i= 3 to (countw(str,'_') - 1);
      newstr = catx('_',newstr,scan(str,i,'_'));
   end;
   drop i;
run;

This works for "remove 2" by starting at the 3rd "word" delimited by the _ character. You would replace the 3 with 4 in the Do i=3 to remove 3 (start at 4th word). The Countw function tells us how many "words" separated by the _ there are so we stop getting before the last.

Catx inserts the _ between the words pulled using SCAN, which only treats the _ as delimiter.

Caution: if there are ever only 3 (or 4) "words" you will get an empty string result. If you have fewer than 3 you will generate an error in the Do loop.

You would provide additional conditions, which you have not even hinted at, for whether to use the "Do i=3" or "Do i=4" version. Probably some sort of test in an IF statement before the Do.

Zeus_Olympus · Posted 06-24-2022 03:02 AM

You are right. My request was not clarifying the "2nd or 3rd". My apologies for this.

What I meant was either the second or the 3rd underscore based on the condition that the previous block of characters is either U8 , FT, 7P (just before the 2nd "_")

or SA, CD, BF just before the 3rd "_".

I hope this clarifies my question.

Thank you in advance.

Patrick · Posted 06-24-2022 04:49 AM

Below an option if I understand the requirement right.

data have;
  input textstr $60.;
  datalines;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
11_222_A_99
11_222_99
11_99
11
;

data want;
  set have;
  length want_str $60;
  if countc(textstr,'_')>2 then
    want_str=prxchange('s/^[^_]+_[^_]+_(.+)_[^_]+$/$1/oi',1,strip(textstr));
run;

Ksharp · Posted 06-24-2022 08:09 AM

data have;
   input str $50.;
datalines;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
;

data want;
 set have;
want=prxchange('s/^[^_]+_[^_]+_|_[^_]+$//',-1,strip(str));
run;

Remove all text before 2nd (or 3rd) ('_') & all text after the 1st ('_') backwards from the end

Re: Remove all text before 2nd (or 3rd) ('_') & all text after the 1st ('_') backwards from the

Re: Remove all text before 2nd (or 3rd) ('_') & all text after the 1st ('_') backwards from the

Re: Remove all text before 2nd (or 3rd) ('_') & all text after the 1st ('_') backwards from the

Re: Remove all text before 2nd (or 3rd) ('_') & all text after the 1st ('_') backwards from the

Re: Remove all text before 2nd (or 3rd) ('_') & all text after the 1st ('_') backwards from the

Re: Remove all text before 2nd (or 3rd) ('_') & all text after the 1st ('_') backwards from the

Re: Remove all text before 2nd (or 3rd) ('_') & all text after the 1st ('_') backwards from the

Registration is open

SAS Training: Just a Click Away