- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear all,
I have text string values like:
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
and I need :
FORTE_BDERT_GT
LAMAN_TRITE_SRTI_GT
VELON_SUTI
I have tried SCAN function without success.
I would appreciate your advice.
Thank you.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have; input str $50.; datalines; TR_U8_FORTE_BDERT_GT_RPJ GD_FT_LAMAN_TRITE_SRTI_GT_RPJ VB_7P_VELON_SUTI_RPJ ; data want; set have; want=prxchange('s/^[^_]+_[^_]+_|_[^_]+$//',-1,strip(str)); run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What is the logic here?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a quick way to get the results you are looking for.
data test ;
input textstr $31.;
newtextstr=substr(textstr,1,length(textstr)-4);
newtextstr=substr(newtextstr,7);
datalines ;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You will have to describe in terms of things that exist in the data when you want "before 2nd" and "before 3rd" removed.
This works for your example:
data have; input str $50.; datalines; TR_U8_FORTE_BDERT_GT_RPJ GD_FT_LAMAN_TRITE_SRTI_GT_RPJ VB_7P_VELON_SUTI_RPJ ; data want; set have; length newstr $ 50.; do i= 3 to (countw(str,'_') - 1); newstr = catx('_',newstr,scan(str,i,'_')); end;
drop i; run;
This works for "remove 2" by starting at the 3rd "word" delimited by the _ character. You would replace the 3 with 4 in the Do i=3 to remove 3 (start at 4th word). The Countw function tells us how many "words" separated by the _ there are so we stop getting before the last.
Catx inserts the _ between the words pulled using SCAN, which only treats the _ as delimiter.
Caution: if there are ever only 3 (or 4) "words" you will get an empty string result. If you have fewer than 3 you will generate an error in the Do loop.
You would provide additional conditions, which you have not even hinted at, for whether to use the "Do i=3" or "Do i=4" version. Probably some sort of test in an IF statement before the Do.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You are right. My request was not clarifying the "2nd or 3rd". My apologies for this.
What I meant was either the second or the 3rd underscore based on the condition that the previous block of characters is either U8 , FT, 7P (just before the 2nd "_")
or SA, CD, BF just before the 3rd "_".
I hope this clarifies my question.
Thank you in advance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Below an option if I understand the requirement right.
data have;
input textstr $60.;
datalines;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
11_222_A_99
11_222_99
11_99
11
;
data want;
set have;
length want_str $60;
if countc(textstr,'_')>2 then
want_str=prxchange('s/^[^_]+_[^_]+_(.+)_[^_]+$/$1/oi',1,strip(textstr));
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have; input str $50.; datalines; TR_U8_FORTE_BDERT_GT_RPJ GD_FT_LAMAN_TRITE_SRTI_GT_RPJ VB_7P_VELON_SUTI_RPJ ; data want; set have; want=prxchange('s/^[^_]+_[^_]+_|_[^_]+$//',-1,strip(str)); run;