GeorgeSAS
Lapis Lazuli | Level 10

Hello all,

I want to use the PRXPARSE function to extract substrings like these:

'10-30 oz' or '25 lb' or '2.5  -  55 lbs', etc.

I use this:

PATTERN =PRXPARSE("/(\d+\.*\d+ *-* *\d+\.*\d+ *)|(\d+\.?\d+-\d+\.?\d+ (lb|oz))|(\d+\.?\d+\.?\d+ (lb|oz))|(\d+\.?\d+? (lb|oz))|(\d+? *(lb|oz))|(\d *- *\d *(lb|oz))|(\d+?.\d+? *(lb|oz))|(\d+.\d+ *- *\d *(lb|oz))|(\d *- *\d+.\d+ *(lb|oz))/");

 

But this does not work well.

Please help.

Thanks

6 REPLIES
Reeza
Super User

We need to know what your full text looks like. Also, what issues are occurring with your current code?

PGStats
Opal | Level 21

Can you provide a few test strings that cover the range of your input text: things that should match and strings that shouldn't?

PG
PGStats
Opal | Level 21

Maybe this will work:

"/\d+(\.\d*)?\s{0,2}(-\s{0,2}\d+(\.\d*)?)?\s?(lbs|lb|oz)/i"

(not tested much)

PG
Ksharp
Super User
data have;
length x $ 100;
x='10-30 oz';output;
x='25 lb';output;  
x='2.5  -  55 lbs';output;
run;
data _null_;
 set have;
 if prxmatch('/([\s\d\.]+\-)?[\s\d\.]+(oz|lb|lbs)/i',x) then putlog 'Matched';
  else putlog 'Not Matched';
run;
PGStats
Opal | Level 21

Xia, please note that you must list the longer alternatives first, (oz|lbs|lb) for example. The regex engine tries the alternatives left to right and stops at the first one that matches, so with (oz|lb|lbs) the lb alternative always wins and the s in lbs is never matched.
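
To see the effect of alternative order in isolation, here is a minimal sketch. The variable names prxShort and prxLong and the shortened patterns are made up for illustration only; they are not the patterns proposed in this thread.

```sas
data _null_;
    length str $20 unit $8;
    str = '2.5  -  55 lbs';

    /* Shorter alternative listed first: lb is tried before lbs */
    prxShort = prxparse('/\d\s?(oz|lb|lbs)/i');
    /* Longer alternative listed first: lbs is tried before lb */
    prxLong  = prxparse('/\d\s?(oz|lbs|lb)/i');

    if prxmatch(prxShort, str) then do;
        unit = prxposn(prxShort, 1, str);   /* capture buffer 1 = the unit */
        putlog 'oz|lb|lbs captured: ' unit;
    end;

    if prxmatch(prxLong, str) then do;
        unit = prxposn(prxLong, 1, str);
        putlog 'oz|lbs|lb captured: ' unit;
    end;
run;
```

The first pattern should capture lb and the second lbs, since nothing after the group forces the engine to backtrack and try a later alternative.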

PG
PGStats
Opal | Level 21

A comparison:

data test;
length str $64;
do str = 
    "The Wizard of Oz says",
    "There are 10-30 oz of potatoes",
    "or 25 lb of onions",
    "and 2.5  -  55 lbs of lard in this delicious recipe.";
    output;
    end;
run;

data amtPG;
length subStr $20;
if not prx1 then prx1 + 
    prxparse("/\d+(\.\d*)?\s{0,2}(-\s{0,2}\d+(\.\d*)?)?\s?(lbs|lb|oz)/i");
set test;
if prxmatch(prx1, str) then do;
    subStr = prxposn(prx1, 0, str);
    output;
    end;
drop prx1;
run;

title "PG's pattern";
proc print data=amtPG noobs; run;

data amtXK;
length subStr $20;
if not prx1 then prx1 + 
    prxparse('/([\s\d\.]+\-)?[\s\d\.]+(oz|lb|lbs)/i');
set test;
if prxmatch(prx1, str) then do;
    subStr = prxposn(prx1, 0, str);
    output;
    end;
drop prx1;
run;

title "XK's pattern";
proc print data=amtXk noobs; run;
                                    PG's pattern    

       subStr            str

       10-30 oz          There are 10-30 oz of potatoes
       25 lb             or 25 lb of onions
       2.5  -  55 lbs    and 2.5  -  55 lbs of lard in this delicious recipe.

                                    XK's pattern    

       subStr           str

       Oz               The Wizard of Oz says
       10-30 oz         There are 10-30 oz of potatoes
       25 lb            or 25 lb of onions
       2.5  -  55 lb    and 2.5  -  55 lbs of lard in this delicious recipe.
PG
