Hi, I want to extract part of a string that uses '_' as a delimiter. The string looks like this:
'abcd_ggg_fff_1234'
My question is: is there a single step to get 'ggg_fff' from that string, or do I have to do it in two steps?
Thanks!
Please try a Perl regular expression:
data have;
x='abcd_ggg_fff_1234';
y=prxchange('s/(\w+)(ggg_fff)(.\d+)/$2/',-1,x);
run;
Thanks, Jagadishkatam! But I have more strings with different middle parts:
'abcd_ggg_fff_1234'
'abcd_ttt_www_1234'
'qadc_hhh_lll_4321'
'dret_eee_1278'
I want a general way to remove the prefix and suffix from every string, not just from one specific string.
Thank you!
Hi @leehsin,
Try CALL SCAN:
data have;
input str $30.;
cards;
abcd_ggg_fff_1234
abcd_ttt_www_1234
qadc_hhh_lll_4321
dret_eee_1278
;
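data want;
set have;
/* blank out the first word and the underscore after it */
call scan(str, 1, position, length,'_');
substr(str,1,length+1)=' ';
/* blank out the last word and the underscore before it */
call scan(str, -1, position, length,'_');
substr(str,position-1)=' ';
/* shift the remaining text left to remove the leading blanks */
str=left(str);
drop position length;
run;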
Hi novinosrin, thanks for the solution. So a single step is not feasible. What if I want to keep both the original string and the new string?
Hi @leehsin, my second version gives you both. Please review it.
Or
data have;
input str $30.;
cards;
abcd_ggg_fff_1234
abcd_ttt_www_1234
qadc_hhh_lll_4321
dret_eee_1278
;
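data want;
set have;
/* l = length of the first word (the prefix) */
call scan(str, 1, p, l,'_');
/* position = starting column of the last word (the suffix) */
call scan(str, -1, position, length,'_');
/* keep the text between the first and the last underscore */
want=substr(str,l+2,position-(l+3));
/* drop the helper variables p, position, l and length */
drop p: l:;
run;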
Great! This solved my problem! Thank you so much!
Hello @leehsin, just to take up your challenge, here is a one-step solution. It's not any faster or better, but I loved your question.
data have;
input str $30.;
cards;
abcd_ggg_fff_1234
abcd_ttt_www_1234
qadc_hhh_lll_4321
dret_eee_1278
;
data want;
set have;
want=substr(str,index(str,'_')+1, findc(str,'_','B')- (index(str,'_')+1));
run;
Also, CALL SCAN is much faster than a regular expression for a simple problem like this. Save the regex for patterns that really need one.
@leehsin, this linear approach is pretty quick too:
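To check the speed claim on your own machine, a rough timing sketch along these lines may help. It is not from the posts above: the iteration count, the variable names t0, t_scan and t_prx, and the generic regex pattern are all assumptions chosen for illustration, and the timings will vary by system.

```sas
data _null_;
  str = 'abcd_ggg_fff_1234';
  length want $30;

  /* time the CALL SCAN + SUBSTR approach */
  t0 = datetime();
  do i = 1 to 1e6;
    call scan(str, 1, p, l, '_');
    call scan(str, -1, position, len, '_');
    want = substr(str, l + 2, position - (l + 3));
  end;
  t_scan = datetime() - t0;

  /* time a regex that removes the first and last words (illustrative pattern) */
  t0 = datetime();
  do i = 1 to 1e6;
    want = prxchange('s/^[^_]*_(.*)_[^_]*$/$1/', 1, str);
  end;
  t_prx = datetime() - t0;

  /* elapsed seconds for each approach are written to the log */
  put t_scan= t_prx=;
run;
```

Both loops compute the same value of want, so the difference in elapsed seconds reflects only the extraction technique.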
data have;
input str $30.;
cards;
abcd_ggg_fff_1234
abcd_ttt_www_1234
qadc_hhh_lll_4321
dret_eee_1278
;
data want;
set have;
length want $30;
do _n_=2 to countw(str,'_')-1;
want=catx('_',want,scan(str,_n_,'_'));
end;
run;
Hi, novinosrin,
Thank you for providing more solutions! These versatile ways will serve many kinds of scenarios. Thank you!
data have;
input text :$100.;
if countw(text,'_')=4 then y=prxchange('s/(\w+\_)(\w+\_\w+)(\_\d+)/$2/',-1,text);
else if countw(text,'_')=3 then y=prxchange('s/(\w+\_)(\w+)(\_\d+)/$2/',-1,text);
cards;
abcd_ggg_fff_1234
abcd_ttt_www_1234
qadc_hhh_lll_4321
dret_eee_1278
;
Jagadishkatam, yours is another good solution. It may need more code, though, if the word count that countw checks has to be handled automatically for every possible number of parts.
Thank you!
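One possible way to avoid branching on countw is a single pattern that removes everything up to the first underscore and everything from the last underscore onward. This is only a sketch, not something posted in this thread: the pattern is an assumption chosen for illustration, and it relies on every string containing at least two underscores.

```sas
data want;
  length text $100 y $100;
  input text :$100.;
  /* ^[^_]*_  matches the prefix up to and including the first underscore   */
  /* (.*)     captures the middle part; it is greedy, so it extends as far  */
  /*          as the last underscore                                        */
  /* _[^_]*$  matches the last underscore, the suffix and trailing blanks   */
  y = prxchange('s/^[^_]*_(.*)_[^_]*$/$1/', 1, text);
  cards;
abcd_ggg_fff_1234
abcd_ttt_www_1234
qadc_hhh_lll_4321
dret_eee_1278
;
run;
```

With the four sample strings, y should come out as ggg_fff, ttt_www, hhh_lll and eee, whether the middle part has one word or several.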