how to PRXPARSE this substring

Regular Contributor
Posts: 209

Hello all,

I want to use the PRXPARSE function to extract substrings like these:

'10-30 oz' or '25 lb' or '2.5  -  55 lbs', etc.

I use this:

PATTERN =PRXPARSE("/(\d+\.*\d+ *-* *\d+\.*\d+ *)|(\d+\.?\d+-\d+\.?\d+ (lb|oz))|(\d+\.?\d+\.?\d+ (lb|oz))|(\d+\.?\d+? (lb|oz))|(\d+? *(lb|oz))|(\d *- *\d *(lb|oz))|(\d+?.\d+? *(lb|oz))|(\d+.\d+ *- *\d *(lb|oz))|(\d *- *\d+.\d+ *(lb|oz))/");

but it does not work well.

Please help,

thanks

Super User
Posts: 17,912

Re: how to PRXPARSE this substring

We need to know what your full text looks like. Also, what issues are you seeing with your current code?

Respected Advisor
Posts: 4,659

Re: how to PRXPARSE this substring

Can you provide a few test strings that cover the range of your input text: things that should match and strings that shouldn't?

PG
Respected Advisor
Posts: 4,659

Re: how to PRXPARSE this substring

Maybe this will work:

"/\d+(\.\d*)?\s{0,2}(-\s{0,2}\d+(\.\d*)?)?\s?(lbs|lb|oz)/i"

(not tested much)

PG
Super User
Posts: 9,687

Re: how to PRXPARSE this substring

data have;
length x $ 100;
x='10-30 oz';output;
x='25 lb';output;  
x='2.5  -  55 lbs';output;
run;
data _null_;
 set have;
 if prxmatch('/([\s\d\.]+\-)?[\s\d\.]+(oz|lb|lbs)/i',x) then putlog 'Matched';
  else putlog 'Not Matched';
run;
Respected Advisor
Posts: 4,659

Re: how to PRXPARSE this substring

Xia, please note that you must list the longer alternatives first, for example (oz|lbs|lb), because the regex engine tries the alternatives from left to right and stops at the first one that matches. With (oz|lb|lbs), the lb alternative always wins, so the s in lbs is never matched.
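
A minimal sketch of the effect, assuming a simplified pattern that only covers a single number followed by a unit. The variable names rx_short, rx_long and matched are made up for this illustration:

```sas
data _null_;
    length str $16 matched $8;
    str = '55 lbs';

    /* Shorter alternative first: lb is tried before lbs and wins */
    rx_short = prxparse('/\d+\s?(oz|lb|lbs)/i');
    /* Longer alternative first: lbs is tried before lb */
    rx_long  = prxparse('/\d+\s?(oz|lbs|lb)/i');

    if prxmatch(rx_short, str) then do;
        /* Capture buffer 0 is the whole matched text */
        matched = prxposn(rx_short, 0, str);
        put 'Short alternative first: ' matched=;
    end;

    if prxmatch(rx_long, str) then do;
        matched = prxposn(rx_long, 0, str);
        put 'Long alternative first:  ' matched=;
    end;
run;
```

With the first pattern the match should stop at 55 lb, and with the second it should extend to 55 lbs.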

PG
Respected Advisor
Posts: 4,659

Re: how to PRXPARSE this substring

A comparison of the two patterns on the same test strings:

data test;
length str $64;
do str = 
    "The Wizard of Oz says",
    "There are 10-30 oz of potatoes",
    "or 25 lb of onions",
    "and 2.5  -  55 lbs of lard in this delicious recipe.";
    output;
    end;
run;

data amtPG;
length subStr $20;
if not prx1 then prx1 + 
    prxparse("/\d+(\.\d*)?\s{0,2}(-\s{0,2}\d+(\.\d*)?)?\s?(lbs|lb|oz)/i");
set test;
if prxmatch(prx1, str) then do;
    subStr = prxposn(prx1, 0, str);
    output;
    end;
drop prx1;
run;

title "PG's pattern";
proc print data=amtPG noobs; run;

data amtXK;
length subStr $20;
if not prx1 then prx1 + 
    prxparse('/([\s\d\.]+\-)?[\s\d\.]+(oz|lb|lbs)/i');
set test;
if prxmatch(prx1, str) then do;
    subStr = prxposn(prx1, 0, str);
    output;
    end;
drop prx1;
run;

title "XK's pattern";
proc print data=amtXk noobs; run;
                                    PG's pattern    

       subStr            str

       10-30 oz          There are 10-30 oz of potatoes
       25 lb             or 25 lb of onions
       2.5  -  55 lbs    and 2.5  -  55 lbs of lard in this delicious recipe.

                                    XK's pattern    

       subStr           str

       Oz               The Wizard of Oz says
       10-30 oz         There are 10-30 oz of potatoes
       25 lb            or 25 lb of onions
       2.5  -  55 lb    and 2.5  -  55 lbs of lard in this delicious recipe.
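
The spurious "Oz" match comes from the character class [\s\d\.]+, which is satisfied by the single blank before "Oz", so no digit is required. The truncated "lb" in the last row is the alternative-ordering issue noted earlier. A possible stricter variant, offered only as a sketch and tested on nothing beyond these four strings, requires every number to start with a digit and adds a word boundary after the unit:

```sas
data _null_;
    length str $64 subStr $20;

    /* Hypothetical stricter pattern (not from the posts above):
       number, optional "- number" range, then the unit, then \b */
    rx = prxparse('/\d+(\.\d+)?\s*(-\s*\d+(\.\d+)?\s*)?(lbs|lb|oz)\b/i');

    do str =
        "The Wizard of Oz says",
        "There are 10-30 oz of potatoes",
        "or 25 lb of onions",
        "and 2.5  -  55 lbs of lard in this delicious recipe.";

        if prxmatch(rx, str) then do;
            /* Capture buffer 0 holds the whole matched substring */
            subStr = prxposn(rx, 0, str);
            put subStr= str=;
        end;
        else put 'No match: ' str=;
    end;
run;
```

Unlike the \s{0,2} limits in PG's pattern, \s* here allows any amount of whitespace around the hyphen, which may or may not suit the real data.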
PG