GeorgeSAS
Lapis Lazuli | Level 10

Hello all,

I want to use the PRXPARSE function to extract substrings like these:

'10-30 oz' or '25 lb' or '2.5  -  55 lbs', etc.

 

I am using this:

PATTERN =PRXPARSE("/(\d+\.*\d+ *-* *\d+\.*\d+ *)|(\d+\.?\d+-\d+\.?\d+ (lb|oz))|(\d+\.?\d+\.?\d+ (lb|oz))|(\d+\.?\d+? (lb|oz))|(\d+? *(lb|oz))|(\d *- *\d *(lb|oz))|(\d+?.\d+? *(lb|oz))|(\d+.\d+ *- *\d *(lb|oz))|(\d *- *\d+.\d+ *(lb|oz))/");

but it does not work well.

Please help,

Thanks

6 REPLIES
Reeza
Super User

We need to know what your full text looks like. Also, what issues are you seeing with your current code?

PGStats
Opal | Level 21

Can you provide a few test strings that cover the range of your input text: things that should match and strings that shouldn't?

PG
PGStats
Opal | Level 21

Maybe this would work:

 

"/\d+(\.\d*)?\s{0,2}(-\s{0,2}\d+(\.\d*)?)?\s?(lbs|lb|oz)/i"

(not tested much)

PG
Ksharp
Super User
data have;
length x $ 100;
x='10-30 oz';output;
x='25 lb';output;  
x='2.5  -  55 lbs';output;
run;
data _null_;
 set have;
 if prxmatch('/([\s\d\.]+\-)?[\s\d\.]+(oz|lb|lbs)/i',x) then putlog 'Matched';
  else putlog 'Not Matched';
run;
PGStats
Opal | Level 21

Xia, please note that you must list the longer alternatives first, (oz|lbs|lb) for example, because the regex engine takes the first alternative that matches. With (oz|lb|lbs) at the end of the pattern, lb succeeds first, so the trailing s of lbs is never included in the match.
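
To make the ordering point concrete, here is a minimal sketch that runs the same string through two patterns that differ only in the order of the unit alternatives. The variable names (prxShortFirst, prxLongFirst, unit1, unit2) and the simplified patterns are made up for this illustration; they are not the patterns posted earlier in the thread.

```sas
data _null_;
    length unit1 unit2 $8;
    str = "2.5  -  55 lbs of lard";

    /* Shorter alternative listed first: "lb" succeeds before "lbs" is tried */
    prxShortFirst = prxparse("/\d+\s?(oz|lb|lbs)/i");
    /* Longer alternative listed first: "lbs" is tried before "lb" */
    prxLongFirst  = prxparse("/\d+\s?(oz|lbs|lb)/i");

    /* Capture buffer 1 holds whichever unit alternative matched */
    if prxmatch(prxShortFirst, str) then unit1 = prxposn(prxShortFirst, 1, str);
    if prxmatch(prxLongFirst,  str) then unit2 = prxposn(prxLongFirst,  1, str);

    /* Write both captured units to the log for comparison */
    putlog unit1= unit2=;
run;
```

The log should show unit1=lb and unit2=lbs. An alternative that avoids the ordering issue is to write the unit group as (oz|lbs?), since the optional s is greedy and is consumed whenever it is present.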

PG
PGStats
Opal | Level 21

A comparison

 

data test;
length str $64;
do str = 
    "The Wizard of Oz says",
    "There are 10-30 oz of potatoes",
    "or 25 lb of onions",
    "and 2.5  -  55 lbs of lard in this delicious recipe.";
    output;
    end;
run;

data amtPG;
length subStr $20;
if not prx1 then prx1 + 
    prxparse("/\d+(\.\d*)?\s{0,2}(-\s{0,2}\d+(\.\d*)?)?\s?(lbs|lb|oz)/i");
set test;
if prxmatch(prx1, str) then do;
    subStr = prxposn(prx1, 0, str);
    output;
    end;
drop prx1;
run;

title "PG's pattern";
proc print data=amtPG noobs; run;

data amtXK;
length subStr $20;
if not prx1 then prx1 + 
    prxparse('/([\s\d\.]+\-)?[\s\d\.]+(oz|lb|lbs)/i');
set test;
if prxmatch(prx1, str) then do;
    subStr = prxposn(prx1, 0, str);
    output;
    end;
drop prx1;
run;

title "XK's pattern";
proc print data=amtXk noobs; run;
                                    PG's pattern    

       subStr            str

       10-30 oz          There are 10-30 oz of potatoes
       25 lb             or 25 lb of onions
       2.5  -  55 lbs    and 2.5  -  55 lbs of lard in this delicious recipe.

                                    XK's pattern    

       subStr           str

       Oz               The Wizard of Oz says
       10-30 oz         There are 10-30 oz of potatoes
       25 lb            or 25 lb of onions
       2.5  -  55 lb    and 2.5  -  55 lbs of lard in this delicious recipe.
PG
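
The "Oz" row in the second listing appears to come from the character class [\s\d\.]+, which can be satisfied by the single space before "Oz", so no digit is ever required. Below is a sketch of a variation that requires a digit and also returns the pieces of the match in separate capture buffers. The data set names (sample, amtParts), the variable names (low, high, unit), the word boundary \b, and the use of \s* instead of \s{0,2} are assumptions made for this example and were not part of the patterns compared above.

```sas
/* Hypothetical sample data, modeled on the test strings in this thread */
data sample;
    length str $64;
    do str =
        "The Wizard of Oz says",
        "There are 10-30 oz of potatoes",
        "or 25 lb of onions",
        "and 2.5  -  55 lbs of lard in this delicious recipe.";
        output;
    end;
run;

data amtParts;
    length low high unit $12;
    /* Compile the pattern once and keep its id across iterations */
    retain prx;
    if _n_ = 1 then prx = prxparse(
        "/(\d+(?:\.\d+)?)(?:\s*-\s*(\d+(?:\.\d+)?))?\s*(lbs|lb|oz)\b/i");
    set sample;
    if prxmatch(prx, str) then do;
        low  = prxposn(prx, 1, str);  /* first number */
        high = prxposn(prx, 2, str);  /* second number, blank if no range */
        unit = prxposn(prx, 3, str);  /* lbs, lb or oz */
        output;
    end;
    drop prx;
run;

proc print data=amtParts noobs; run;
```

Because the pattern must start with a digit, the "Wizard of Oz" line should no longer be selected. The \b after the unit is meant to keep words that merely begin with "lb" or "oz" from matching; whether that is desirable depends on the real input text.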
