Complex Comma Parsing

Contributor
Posts: 30

Hi All,

 

I have a fairly large text field that I need to split on commas, but only where the comma is followed by a space AND an uppercase letter. For example, a single variable containing the following text:

 

"Laws, regulations and policies, Parental monitoring and supervision, Positive contributions to peer group"

 

would result in 3 separate variables:

Laws, regulations and policies

Parental monitoring and supervision

Positive contributions to peer group

 

I've parsed fields by commas before, and I suspect this has to involve ANYUPPER, but I'm unsure how to go about it. Here's how I've approached the simpler version:

 

* Identify the maximum number of comma-separated values in textvar and store it in macro variable max_elem;
proc sql noprint;
      select max(count(textvar,','))+1 into :max_elem
      from have;
quit;

data want;
      set have;
      * Create one character variable per comma-separated value;
      array tsub_vars $ 50 targsub1-targsub%eval(&max_elem);
      do i = 1 to &max_elem;
           tsub_vars{i} = strip(scan(textvar,i,','));
      end;
run;

 

Any guidance is greatly appreciated!

 

J

PROC Star
Posts: 2,330

Re: Complex Comma Parsing

Try this:

data T;
  STR="Laws, regulations and policies, Parental monitoring and supervision, Positive contributions to peer group";
  POS=prxmatch('/, [A-Z]/',STR);
run;

This looks for a comma, then a space, then an uppercase letter. PRXMATCH returns the position where the first match starts, or 0 if there is none.
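
Since PRXMATCH only reports the first match, splitting the whole string means visiting every match. One way to do that is CALL PRXNEXT, which moves through the string one match at a time. The following is only a sketch under assumptions: the names PIECES, PIECE, RE, START, STOP, FROM, POS and LEN are made up for illustration, and the pattern uses \s+ so that more than one blank after the comma is tolerated.

```sas
data PIECES;
  length STR PIECE $200;
  STR="Laws, regulations and policies, Parental monitoring and supervision, Positive contributions to peer group";
  RE=prxparse('/,\s+[A-Z]/');      /* compile the pattern once */
  START=1;                         /* where the next search begins */
  STOP=length(STR);                /* where searching ends */
  FROM=1;                          /* first character of the current piece */
  call prxnext(RE, START, STOP, STR, POS, LEN);
  do while (POS>0);
    /* Text between the previous delimiter and this comma */
    if POS>FROM then PIECE=substr(STR, FROM, POS-FROM);
    else PIECE=' ';
    output;
    /* The match ends on the uppercase letter, which starts the next piece */
    FROM=POS+LEN-1;
    call prxnext(RE, START, STOP, STR, POS, LEN);
  end;
  PIECE=substr(STR, FROM);         /* whatever follows the last delimiter */
  output;
  keep PIECE;
run;
```

This writes one row per piece. Commas followed by a lowercase letter, as in "Laws, regulations", are left alone because the pattern is case-sensitive.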
 

Contributor
Posts: 30

Re: Complex Comma Parsing

Awesome ChrisNZ....thanks much!

Super User
Posts: 10,766

Re: Complex Comma Parsing

Based on @ChrisNZ's idea:

 

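data x;
x="Laws, regulations and policies, Parental monitoring and supervision, Positive contributions to peer group";
do i=1 to 99;
 p=prxmatch('/,\s+[A-Z]/',x);   /* position of the next qualifying comma, 0 if none */
 if p=0 then temp=x;            /* no delimiter left: the remainder is the last piece */
 else temp=substr(x,1,p-1);     /* text before the comma */
 output;
 if p=0 then leave;
 x=left(substr(x,p+1));         /* drop the piece, the comma and the blanks after it */
end;
run;
proc print noobs;run;
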
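
The loop above writes one row per piece. To get one variable per piece, as in the original question, one possible sketch is to replace each qualifying delimiter with a single marker character and then reuse the SCAN approach from the question. Assumptions to note: the data set have and variable textvar follow the question, the second sample row is invented for illustration, the helper names marked and tmp are hypothetical, and the marker "|" is assumed never to occur in the data.

```sas
/* Illustrative input; the second row is made up for the example. */
data have;
  length textvar $200;
  textvar="Laws, regulations and policies, Parental monitoring and supervision, Positive contributions to peer group";
  output;
  textvar="Family support, School climate";
  output;
run;

/* Pass 1: mark qualifying delimiters with "|" and track the largest piece count. */
data marked;
  set have end=last;
  length tmp $200;
  retain max_elem 0;
  /* -1 replaces every match; $1 puts the uppercase letter back */
  tmp=prxchange('s/,\s+([A-Z])/|$1/', -1, textvar);
  max_elem=max(max_elem, countw(tmp, '|'));
  if last then call symputx('max_elem', max_elem);
run;

/* Pass 2: one variable per piece, sized from the macro variable. */
data want;
  set marked;
  array tsub_vars{*} $ 50 targsub1-targsub&max_elem;
  do i=1 to countw(tmp, '|');
    tsub_vars{i}=strip(scan(tmp, i, '|'));
  end;
  drop i tmp max_elem;
run;
```

CALL SYMPUTX strips blanks from the stored value, so the %eval wrapper used in the question is not needed here. Pieces longer than 50 characters would be truncated, as in the original array definition.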
Contributor
Posts: 30

Re: Complex Comma Parsing

Hi Kevin,

 

This is even better... I was working through this when you posted. Thanks, both!

 

Jason
