BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Zeus_Olympus
Fluorite | Level 6

Dear all,

 

I have text string values like:

 

TR_U8_FORTE_BDERT_GT_RPJ

GD_FT_LAMAN_TRITE_SRTI_GT_RPJ

VB_7P_VELON_SUTI_RPJ

 

and I need :

FORTE_BDERT_GT

LAMAN_TRITE_SRTI_GT

VELON_SUTI

 

I have tried SCAN function without success.

 

I would appreciate your advice.

 

Thank you.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
   input str $50.;
datalines;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
;

data want;
 set have;
want=prxchange('s/^[^_]+_[^_]+_|_[^_]+$//',-1,strip(str));
run;

View solution in original post

6 REPLIES 6
CarmineVerrell
SAS Employee

Here is a quick way to get the results you are looking for.

data test ;
input textstr $31.;
newtextstr=substr(textstr,1,length(textstr)-4);
newtextstr=substr(newtextstr,7);
datalines ;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
;
run;

ballardw
Super User

You will have to describe in terms of things that exist in the data when you want "before 2nd" and "before 3rd" removed.

 

This works for your example:

data have;
   input str $50.;
datalines;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
;

data want;
   set have;
   length newstr $ 50.;
   do i= 3 to (countw(str,'_') - 1);
      newstr = catx('_',newstr,scan(str,i,'_'));
   end;
drop i; run;

This works for "remove 2" by starting at the 3rd "word" delimited by the _ character. You would replace the 3 with 4 in the Do i=3 to remove 3 (start at 4th word). The Countw function tells us how many "words" separated by the _ there are so we stop getting before the last.

Catx inserts the _ between the words pulled using SCAN, which only treats the _ as delimiter.

 

Caution: if there are ever only 3 (or 4) "words" you will get an empty string result. If you have fewer than 3 you will generate an error in the Do loop.

 

You would provide additional conditions, which you have not even hinted at, for whether to use the "Do i=3" or "Do i=4" version. Probably some sort of test in an IF statement before the Do.

Zeus_Olympus
Fluorite | Level 6

You are right. My request was not clarifying the "2nd or 3rd". My apologies for this.

 

What I meant was either the second or the 3rd underscore based on the condition that the previous block of characters is either  U8 , FT, 7P  (just before the 2nd "_")

or  SA, CD, BF just before the 3rd "_".

 

I hope this clarifies my question.

 

Thank you in advance.

Patrick
Opal | Level 21

Below an option if I understand the requirement right.

data have;
  input textstr $60.;
  datalines;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
11_222_A_99
11_222_99
11_99
11
;

data want;
  set have;
  length want_str $60;
  if countc(textstr,'_')>2 then
    want_str=prxchange('s/^[^_]+_[^_]+_(.+)_[^_]+$/$1/oi',1,strip(textstr));
run;

Patrick_0-1656060577703.png

 

Ksharp
Super User
data have;
   input str $50.;
datalines;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
;

data want;
 set have;
want=prxchange('s/^[^_]+_[^_]+_|_[^_]+$//',-1,strip(str));
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 638 views
  • 4 likes
  • 6 in conversation