DATA Step, Macro, Functions and more

Regular Expressions: Extracting the price from a string

Reply
Frequent Contributor
Posts: 144

Regular Expressions: Extracting the price from a string

Hi,

I’m trying to extract a price from a string and I’m having some with my code.

This is what I start with:

PROMOTION_TITLE
Aug Earn $25 v2 external
Aug Earns $125 v2 external
Aug Earns $50 v2 external

This is what I need:

Price
$25
$125
$50

This is my program so far. I can't seem to get the syntax right.

data test (keep=PROMOTION_TITLE price) ;

set  a.promo;

  price =(input(prxchange('s/(\w+)_(\d+)_(\w+)/$2/',-1,PROMOTION_TITLE),8.))/100;

  format price dollar8.2;

run;

Any assistance will be greatly appreciated.

Thanks! 

Super Contributor
Super Contributor
Posts: 3,174

Re: Regular Expressions: Extracting the price from a string

Consider other SAS functions...

* DATA step with instream data-rows for illustration. ;

data _null_;

input ;

format price dollar8.2 ;

* parse input record and extract dollar-amount, convert to numeric. ;

price = input(scan(substr(_infile_,find(_infile_,'$','t')+1),1,' '),8.);

putlog _all_;

datalines;

Aug Earn $25 v2 external 

Aug Earns $125 v2 external 

Aug Earns $50 v2 external 

run;

Scott Barry

SBBWorks, Inc.

Respected Advisor
Posts: 3,799

Re: Regular Expressions: Extracting the price from a string

This looks about right.  Instead of removing everything around the dollar amount use a capture buffer to retrieve it.

data test;
   input PROMOTION_TITLE $80.;
  
retain rx;
   if _n_ eq 1 then rx = prxparse('/(\$\d+\.?\d?\d?)/');

  
if prxmatch(rx,promotion_title) then price = prxposn(rx,1,promotion_title);
   cards;
Aug Earn $25 v2 external
Aug Earns $125.75x v2 external
Aug Earns $50 v2 external
;;;;
   run;
proc print;
  
run;
Frequent Contributor
Posts: 144

Re: Regular Expressions: Extracting the price from a string

Posted in reply to data_null__

This is good but is it possible to change the var to numeric?

Respected Advisor
Posts: 3,799

Re: Regular Expressions: Extracting the price from a string

Yes.

Super Contributor
Posts: 275

Re: Regular Expressions: Extracting the price from a string

data want;

input  ;

price =prxchange('s/.*(\$\d+).*/\1/',-1,_infile_) ;

datalines;

Aug Earn $25 v2 external

Aug Earns $125 v2 external

Aug Earns $50 v2 external

run;

Respected Advisor
Posts: 4,919

Re: Regular Expressions: Extracting the price from a string

My proposition is similar to DN's :

data have;

input pt &:$32.;

datalines;

Aug Earn $25 v2 external

Aug Earns $125 v2 external

Aug Earns $50 v2 external

;

data want;

if not prxId then prxId + prxparse("/\$\s?\d+(\.\d{2})?/");

set have;

call prxsubstr(prxId, pt, pos, len);

if pos > 0 then price = input (substr(pt, pos, len), dollar10.);

format price dollar11.2;

drop prxId pos len;

run;

proc print data=want noobs; run;

PG

PG
Valued Guide
Posts: 2,177

Re: Regular Expressions: Extracting the price from a string

There is a feature in proc format that enableS the definition of the prx in a INVALUE range

The feature has been there for a few years. Rick Langston has been demonstrating it at SAS GF (along with the concept of using a function in  "label" area.)

Once compiled the syntax to pull that "price" could just be the usual INPUT statement with the very unusual user INFORMAT.

Respected Advisor
Posts: 4,173

Re: Regular Expressions: Extracting the price from a string

Thanks for this pointer to informats and RegEx: https://support.sas.com/resources/papers/proceedings12/245-2012.pdf

Valued Guide
Posts: 2,177

Re: Regular Expressions: Extracting the price from a string

Patrick

you're welcome. And thanks for reporting the link.

Of course, it doesn't make prx any simpler   just their implementation 

regards

Peter

Respected Advisor
Posts: 4,919

Re: Regular Expressions: Extracting the price from a string

Thanks Patrick and Peter, very good to know. I wish ALL the features of my licenced SAS software were described in the main doc.I find proc format particularly poor in that respect.

PG

PG
Respected Advisor
Posts: 4,173

Re: Regular Expressions: Extracting the price from a string

The question is now if these are unsupported features or just something missing in the documentation. I've asked SAS TechSupport this question and will post the answer here.

And here how this could work using the REGEXPE option:

proc format;

  invalue Price (default=20)

    's/.*\$(\d+\.?\d*).*/\1/' (REGEXPE) = [best32.]

    other=0

    ;

run;

data have;

  input pt &Smiley Tonguerice.;

  datalines;

Aug Earn $25 v2 external

Aug Earns $125.45 v2 external

Aug Earns $50 v2 external

Aug Earns v2 external

;

run;

Respected Advisor
Posts: 4,173

Re: Regular Expressions: Extracting the price from a string

The answer I've got from SAS TechSupport: REGEXP and REGEXPE are supported features. The SAS documentation will get updated and they will be fully documented in the future.

Respected Advisor
Posts: 4,919

Re: Regular Expressions: Extracting the price from a string

Thank you Patrick! According to Rick Langston's paper, the feature was introduced with SAS 9.3. Makes me wonder how many goodies I am missing. Maybe the slow disclosure rate is just another step in the testing process. Smiley Happy

PG

PG
Trusted Advisor
Posts: 1,301

Re: Regular Expressions: Extracting the price from a string

data have;

input pt &:$32.;

datalines;

Aug Earn $25 v2 external

Aug Earns $125 v2 external

Aug Earns $50 v2 external

Aug Earns $125.01 v2 external

Aug Earns $50x v2 external

;

You can parse using double scan (it will be faster than prx)

data want;

set have;

price=input(scan(scan(pt,2,'$'),1,,'as'),best.);

format price dollar8.2;

Alternatively, using prx function

data want2;

re=prxparse('#(?<=\$)(\d+(\.\d{2})?)#');

format price dollar8.2;

do until(done);

set have end=done;

if prxmatch(re,pt) then do;

price=input(prxposn(re,1,pt),best.);

output;

end;

end;

run;

Using a regexpe informat

proc format;

invalue ReDollar 's#^.*(?<=\$)(\d+(\.\d{2})?).*$#\1#' (regexpe) = [14.];

run;

data want3;

set have;

price=input(pt,ReDollar.);

format price dollar8.2;

run;

^.*(?<=\$)(\d+(\.\d{2})?).*$

Options: ^ and $ match at line breaks

Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»

Match any single character that is not a line break character «.*»

   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»

Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=\$)»

   Match the character “$” literally «\$»

Match the regular expression below and capture its match into backreference number 1 «(\d+(\.\d{2})?)»

   Match a single digit 0..9 «\d+»

      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

   Match the regular expression below and capture its match into backreference number 2 «(\.\d{2})?»

      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»

      Match the character “.” literally «\.»

      Match a single digit 0..9 «\d{2}»

         Exactly 2 times «{2}»

Match any single character that is not a line break character «.*»

   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»

Assert position at the end of a line (at the end of the string or before a line break character) «$»

Ask a Question
Discussion stats
  • 16 replies
  • 493 views
  • 4 likes
  • 8 in conversation