piao
Calcite | Level 5

I would like to extract some information from a list of strings using a regular expression, but I'm not familiar with regular expressions. Can anyone help? Thanks in advance!

 

data test;
  length have $200;
  have='SEQ:2;NAME:xxx;START:2018-12-03;END:uk-uk-uk';output;
  have='SEQ:12;NAME:xxxxx;START:2018-12-03;END:';output;
  have='SEQ:22;NAME:xxxxxxx;START:uk-uk-uk;END:2012-uk-uk';output;
  have='SEQ:2;NAME:xxx;START:2018-12-03;END:uk-uk-uk SEQ:4;NAME:xxxx;START:2021-12-03;END:uk-uk-uk';output;
run;

 

The strings contain SEQ, NAME, START and END. I want to get SEQ and NAME.

The expected results are:

SEQ:2;NAME:xxx;
SEQ:12;NAME:xxxxx;
SEQ:22;NAME:xxxxxxx;
SEQ:2;NAME:xxx;SEQ:4;NAME:xxxx;

Accepted Solution
Patrick
Opal | Level 21

Something like the code below should work.

data have;
  infile datalines4 truncover;
  input have $200.;
  row_num=_n_;
  datalines4;
SEQ:2;NAME:xxx;START:2018-12-03;END:uk-uk-uk
SEQ:12;NAME:xxxxx;START:2018-12-03;END:
SEQ:22;NAME:xxxxxxx;START:uk-uk-uk;END:2012-uk-uk
SEQ:2;NAME:xxx;START:2018-12-03;END:uk-uk-uk SEQ:4;NAME:xxxx;START:2021-12-03;END:uk-uk-uk
 
SEQ:2;NAME:xxx
START:2018-12-03;
;;;;

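data want(drop=_:);
  set have;
  length seq name $20;
  /* group 1: digits after SEQ:, group 2: text after NAME: up to the next semicolon; i = ignore case */
  _prxid=prxparse('/seq:(\d*)\s*;name:([^;]*)/oi');
  _start=1;
  _stop =length(trim(have));
  /* look for the first match; _pos is 0 when there is none */
  call prxnext(_prxid,_start,_stop,have,_pos,_len);
  /* do until runs at least once, so a row with no match is still output (seq and name stay blank) */
  do until(_pos<=0);
    seq=prxposn(_prxid,1, have);
    name=prxposn(_prxid,2, have);
    output;
    /* look for the next match; prxnext has already moved _start past the previous one */
    call prxnext(_prxid,_start,_stop,have,_pos,_len);
  end;
run;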

proc print data=want;
run;

If you only want to keep rows where there is a match, change the do loop to:

[Screenshot: Patrick_0-1677659220131.png]
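
The screenshot is not reproduced in the text, so its exact content is unknown. One way to get that behavior, offered as a sketch rather than as Patrick's exact code, is to replace the DO UNTIL with a DO WHILE. The condition is then tested before the first pass, so a row without a match never reaches the output statement. Everything else is the data step from above, unchanged.

```sas
data want(drop=_:);
  set have;
  length seq name $20;
  _prxid=prxparse('/seq:(\d*)\s*;name:([^;]*)/oi');
  _start=1;
  _stop =length(trim(have));
  call prxnext(_prxid,_start,_stop,have,_pos,_len);
  /* assumption: do while instead of do until; the test runs before the body, */
  /* so rows where the first prxnext call finds nothing are skipped           */
  do while(_pos>0);
    seq=prxposn(_prxid,1, have);
    name=prxposn(_prxid,2, have);
    output;
    call prxnext(_prxid,_start,_stop,have,_pos,_len);
  end;
run;
```

With the sample data, the blank line and the START:2018-12-03; line should then produce no rows in want.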

 

Result of running the code above:

[Screenshot: Patrick_0-1677659318128.png]
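
Patrick's data step writes one row per match, with SEQ and NAME in separate variables, whereas the question asks for one string per input row such as SEQ:2;NAME:xxx;SEQ:4;NAME:xxxx;. A minimal sketch of that variant follows. It reuses the same pattern and PRXNEXT loop; the dataset name want_concat and the variable result are hypothetical names chosen for illustration, and the length of 256 is an assumption that leaves room for the added semicolons.

```sas
data want_concat(drop=_:);
  set have;
  length result $256;   /* hypothetical output variable */
  _prxid=prxparse('/seq:(\d*)\s*;name:([^;]*)/oi');
  _start=1;
  _stop =length(trim(have));
  call prxnext(_prxid,_start,_stop,have,_pos,_len);
  do while(_pos>0);
    /* _pos and _len describe the whole match, e.g. SEQ:2;NAME:xxx;      */
    /* append it to result and add the closing semicolon                 */
    result=cats(result,substr(have,_pos,_len),';');
    call prxnext(_prxid,_start,_stop,have,_pos,_len);
  end;
  /* no explicit output statement: each input row is written once, */
  /* with result left blank when nothing matched                   */
run;
```

Because the text is copied from have with substr, the original upper case of SEQ and NAME is preserved even though the pattern itself is case-insensitive.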

 

 
