how to PRXPARSE this substring

Regular Contributor
Posts: 241

Hello all,

 

I want to use the PRXPARSE function to extract substrings like these:

'10-30 oz' or '25 lb' or '2.5  -  55 lbs', etc.

 

I am currently using this pattern:

 

PATTERN =PRXPARSE("/(\d+\.*\d+ *-* *\d+\.*\d+ *)|(\d+\.?\d+-\d+\.?\d+ (lb|oz))|(\d+\.?\d+\.?\d+ (lb|oz))|(\d+\.?\d+? (lb|oz))|(\d+? *(lb|oz))|(\d *- *\d *(lb|oz))|(\d+?.\d+? *(lb|oz))|(\d+.\d+ *- *\d *(lb|oz))|(\d *- *\d+.\d+ *(lb|oz))/");

 

but it does not give good results.

Please help.

Thanks

Super User
Posts: 19,835

Re: how to PRXPARSE this substring

Posted in reply to GeorgeSAS

We need to know what your full text looks like. Also, what issues are you seeing with your current code?

Respected Advisor
Posts: 4,927

Re: how to PRXPARSE this substring

Posted in reply to GeorgeSAS

Can you provide a few test strings that cover the range of your input text: things that should match and strings that shouldn't?

PG
Respected Advisor
Posts: 4,927

Re: how to PRXPARSE this substring

Posted in reply to GeorgeSAS

Maybe this will work:

 

"/\d+(\.\d*)?\s{0,2}(-\s{0,2}\d+(\.\d*)?)?\s?(lbs|lb|oz)/i"

(not tested much)
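
For anyone who wants to try it, here is a minimal sketch of how the pattern could be used to pull the matched substring out of a string. The variable names (prx, str, subStr, pos, len) and the test strings are only illustrative assumptions; CALL PRXSUBSTR returns the position and length of the first match, and the position is 0 when nothing matches.

```sas
data _null_;
    length str $64 subStr $20;
    /* Compile the pattern (this step has no SET, so it runs only once) */
    prx = prxparse("/\d+(\.\d*)?\s{0,2}(-\s{0,2}\d+(\.\d*)?)?\s?(lbs|lb|oz)/i");
    /* Illustrative test strings, not the poster's real data */
    do str = "10-30 oz", "25 lb", "2.5  -  55 lbs", "no quantity here";
        /* pos and len receive the start and length of the first match */
        call prxsubstr(prx, str, pos, len);
        if pos > 0 then subStr = substr(str, pos, len);
        else subStr = "";
        putlog str= subStr=;
    end;
run;
```

Reading the pattern left to right: a number with an optional decimal part, an optional "- number" range allowing up to two spaces on either side of the hyphen, at most one space, and then the unit.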

PG
Super User
Posts: 10,041

Re: how to PRXPARSE this substring

Posted in reply to GeorgeSAS

/* Test strings taken from the question */
data have;
length x $ 100;
x='10-30 oz';output;
x='25 lb';output;
x='2.5  -  55 lbs';output;
run;

/* Report whether each test string matches the pattern */
data _null_;
 set have;
 if prxmatch('/([\s\d\.]+\-)?[\s\d\.]+(oz|lb|lbs)/i',x) then putlog 'Matched';
  else putlog 'Not Matched';
run;

Respected Advisor
Posts: 4,927

Re: how to PRXPARSE this substring

Xia, please note that you must list the longer alternatives first, (oz|lbs|lb) for example. The regex engine tries the alternatives from left to right and stops at the first one that succeeds, so (oz|lb|lbs) matches lb and never reaches the s in lbs.
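
A small sketch to see the effect, assuming a single test string "55 lbs". The variable names are made up for the illustration, and the third pattern (a trailing \b word boundary) is just one alternative way to force the longer match, not something from the posts above.

```sas
data _null_;
    length str m1 m2 m3 $16;
    str = "55 lbs";
    /* Shorter alternative listed first: lb succeeds before lbs is tried */
    prxShortFirst = prxparse("/\d+\s?(oz|lb|lbs)/i");
    /* Longer alternative listed first */
    prxLongFirst  = prxparse("/\d+\s?(lbs|lb|oz)/i");
    /* Short-first again, but \b rejects lb when a letter follows it */
    prxBoundary   = prxparse("/\d+\s?(oz|lb|lbs)\b/i");
    if prxmatch(prxShortFirst, str) then m1 = prxposn(prxShortFirst, 0, str);
    if prxmatch(prxLongFirst,  str) then m2 = prxposn(prxLongFirst,  0, str);
    if prxmatch(prxBoundary,   str) then m3 = prxposn(prxBoundary,   0, str);
    /* Expected in the log: m1=55 lb m2=55 lbs m3=55 lbs */
    putlog m1= m2= m3=;
run;
```

Capture buffer 0 in PRXPOSN is the whole match, so m1 shows the truncated unit directly.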

PG
Respected Advisor
Posts: 4,927

Re: how to PRXPARSE this substring

Posted in reply to GeorgeSAS

A comparison

 

data test;
length str $64;
do str = 
    "The Wizard of Oz says",
    "There are 10-30 oz of potatoes",
    "or 25 lb of onions",
    "and 2.5  -  55 lbs of lard in this delicious recipe.";
    output;
    end;
run;

data amtPG;
length subStr $20;
if not prx1 then prx1 + 
    prxparse("/\d+(\.\d*)?\s{0,2}(-\s{0,2}\d+(\.\d*)?)?\s?(lbs|lb|oz)/i");
set test;
if prxmatch(prx1, str) then do;
    subStr = prxposn(prx1, 0, str);
    output;
    end;
drop prx1;
run;

title "PG's pattern";
proc print data=amtPG noobs; run;

data amtXK;
length subStr $20;
if not prx1 then prx1 + 
    prxparse('/([\s\d\.]+\-)?[\s\d\.]+(oz|lb|lbs)/i');
set test;
if prxmatch(prx1, str) then do;
    subStr = prxposn(prx1, 0, str);
    output;
    end;
drop prx1;
run;

title "XK's pattern";
proc print data=amtXk noobs; run;
                                    PG's pattern    

       subStr            str

       10-30 oz          There are 10-30 oz of potatoes
       25 lb             or 25 lb of onions
       2.5  -  55 lbs    and 2.5  -  55 lbs of lard in this delicious recipe.

                                    XK's pattern    

       subStr           str

       Oz               The Wizard of Oz says
       10-30 oz         There are 10-30 oz of potatoes
       25 lb            or 25 lb of onions
       2.5  -  55 lb    and 2.5  -  55 lbs of lard in this delicious recipe.
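
The "Oz" row appears because [\s\d\.]+ is satisfied by whitespace alone, so the single space before "Oz" is enough for the unit to match. A rough sketch of one way to avoid that, assuming it is acceptable to require a leading digit and a word boundary after the unit; prxStrict is a hypothetical variant, not a pattern posted in this thread.

```sas
data _null_;
    length str $64;
    /* XK's pattern as posted: the unit may be preceded by whitespace only */
    prxLoose  = prxparse('/([\s\d\.]+\-)?[\s\d\.]+(oz|lb|lbs)/i');
    /* Hypothetical variant: starts with a digit, unit ends at a word boundary */
    prxStrict = prxparse('/\d+(\.\d*)?\s*(-\s*\d+(\.\d*)?)?\s*(lbs|lb|oz)\b/i');
    do str = "The Wizard of Oz says", "There are 10-30 oz of potatoes";
        /* 1 when the pattern matches somewhere in str, 0 otherwise */
        loose  = prxmatch(prxLoose,  str) > 0;
        strict = prxmatch(prxStrict, str) > 0;
        putlog str= loose= strict=;
    end;
run;
```

Both patterns accept the potatoes line; only the loose one accepts the Wizard of Oz line.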
PG