I would like to extract some information from a list of strings using regular expressions, but I'm not familiar with them. Can anyone help? Thanks in advance!
data test;
length have $200;
have='SEQ:2;NAME:xxx;START:2018-12-03;END:uk-uk-uk';output;
have='SEQ:12;NAME:xxxxx;START:2018-12-03;END:';output;
have='SEQ:22;NAME:xxxxxxx;START:uk-uk-uk;END:2012-uk-uk';output;
have='SEQ:2;NAME:xxx;START:2018-12-03;END:uk-uk-uk SEQ:4;NAME:xxxx;START:2021-12-03;END:uk-uk-uk';output;
run;
Each string contains SEQ, NAME, START and END fields. I want to extract only SEQ and NAME.
The expected results are:
SEQ:2;NAME:xxx;
SEQ:12;NAME:xxxxx;
SEQ:22;NAME:xxxxxxx;
SEQ:2;NAME:xxx;SEQ:4;NAME:xxxx;
Something like the code below should work.
data have;
infile datalines4 truncover;
input have $200.;
row_num=_n_;
datalines4;
SEQ:2;NAME:xxx;START:2018-12-03;END:uk-uk-uk
SEQ:12;NAME:xxxxx;START:2018-12-03;END:
SEQ:22;NAME:xxxxxxx;START:uk-uk-uk;END:2012-uk-uk
SEQ:2;NAME:xxx;START:2018-12-03;END:uk-uk-uk SEQ:4;NAME:xxxx;START:2021-12-03;END:uk-uk-uk
SEQ:2;NAME:xxx
START:2018-12-03;
;;;;
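data want(drop=_:);
  set have;
  length seq name $20;
  /* o = compile once, i = ignore case; group 1 = SEQ digits, group 2 = NAME value */
  _prxid=prxparse('/seq:(\d*)\s*;name:([^;]*)/oi');
  _start=1;
  _stop =length(have); /* LENGTH already ignores trailing blanks */
  call prxnext(_prxid,_start,_stop,have,_pos,_len);
  /* DO UNTIL runs at least once, so a row without a match is still output */
  do until(_pos<=0);
    seq=prxposn(_prxid,1, have);
    name=prxposn(_prxid,2, have);
    output;
    /* look for the next SEQ/NAME pair in the same string */
    call prxnext(_prxid,_start,_stop,have,_pos,_len);
  end;
run;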
proc print data=want;
run;
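If you only want to keep rows where there is a match, change the do loop to one that tests before the first pass, for example: do while(_pos>0);
Result of running the code above: one output row per SEQ/NAME match, so the fourth input line yields two rows, and the last input line, which has no match, still yields one row with blank seq and name.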
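The expected results in the question are a single string per input row rather than one row per match. A minimal sketch of that variant, assuming the `have` data set created above: it reuses the same CALL PRXNEXT loop but appends each matched piece to one variable. The data set name `want_string`, the variable name `result` and its length of 256 are illustrative choices, not part of the original answer.

```sas
data want_string(drop=_:);
  set have;
  /* illustrative length; must hold all matches plus one semicolon each */
  length result $256;
  /* same pattern as above, without capture groups: the whole match is used */
  _prxid = prxparse('/seq:\d*\s*;name:[^;]*/oi');
  _start = 1;
  _stop  = length(have);
  call prxnext(_prxid, _start, _stop, have, _pos, _len);
  /* DO WHILE tests first, so rows without a match keep a blank result */
  do while (_pos > 0);
    /* append the matched text and a closing semicolon */
    result = cats(result, substr(have, _pos, _len), ';');
    call prxnext(_prxid, _start, _stop, have, _pos, _len);
  end;
run;

proc print data=want_string;
run;
```

Because there is no OUTPUT statement inside the loop, the step writes exactly one row per input row. Note that CATS strips leading and trailing blanks from each piece, which is fine for the sample data but would trim a NAME value that ends in a space.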