Dear all,
I have text string values like:
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
and I need :
FORTE_BDERT_GT
LAMAN_TRITE_SRTI_GT
VELON_SUTI
I have tried SCAN function without success.
I would appreciate your advice.
Thank you.
data have; input str $50.; datalines; TR_U8_FORTE_BDERT_GT_RPJ GD_FT_LAMAN_TRITE_SRTI_GT_RPJ VB_7P_VELON_SUTI_RPJ ; data want; set have; want=prxchange('s/^[^_]+_[^_]+_|_[^_]+$//',-1,strip(str)); run;
What is the logic here?
Here is a quick way to get the results you are looking for.
data test ;
input textstr $31.;
newtextstr=substr(textstr,1,length(textstr)-4);
newtextstr=substr(newtextstr,7);
datalines ;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
;
run;
You will have to describe in terms of things that exist in the data when you want "before 2nd" and "before 3rd" removed.
This works for your example:
data have; input str $50.; datalines; TR_U8_FORTE_BDERT_GT_RPJ GD_FT_LAMAN_TRITE_SRTI_GT_RPJ VB_7P_VELON_SUTI_RPJ ; data want; set have; length newstr $ 50.; do i= 3 to (countw(str,'_') - 1); newstr = catx('_',newstr,scan(str,i,'_')); end;
drop i; run;
This works for "remove 2" by starting at the 3rd "word" delimited by the _ character. You would replace the 3 with 4 in the Do i=3 to remove 3 (start at 4th word). The Countw function tells us how many "words" separated by the _ there are so we stop getting before the last.
Catx inserts the _ between the words pulled using SCAN, which only treats the _ as delimiter.
Caution: if there are ever only 3 (or 4) "words" you will get an empty string result. If you have fewer than 3 you will generate an error in the Do loop.
You would provide additional conditions, which you have not even hinted at, for whether to use the "Do i=3" or "Do i=4" version. Probably some sort of test in an IF statement before the Do.
You are right. My request was not clarifying the "2nd or 3rd". My apologies for this.
What I meant was either the second or the 3rd underscore based on the condition that the previous block of characters is either U8 , FT, 7P (just before the 2nd "_")
or SA, CD, BF just before the 3rd "_".
I hope this clarifies my question.
Thank you in advance.
Below an option if I understand the requirement right.
data have;
input textstr $60.;
datalines;
TR_U8_FORTE_BDERT_GT_RPJ
GD_FT_LAMAN_TRITE_SRTI_GT_RPJ
VB_7P_VELON_SUTI_RPJ
11_222_A_99
11_222_99
11_99
11
;
data want;
set have;
length want_str $60;
if countc(textstr,'_')>2 then
want_str=prxchange('s/^[^_]+_[^_]+_(.+)_[^_]+$/$1/oi',1,strip(textstr));
run;
data have; input str $50.; datalines; TR_U8_FORTE_BDERT_GT_RPJ GD_FT_LAMAN_TRITE_SRTI_GT_RPJ VB_7P_VELON_SUTI_RPJ ; data want; set have; want=prxchange('s/^[^_]+_[^_]+_|_[^_]+$//',-1,strip(str)); run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.