Hi,
I’m trying to extract a price from a string and I’m having some with my code.
This is what I start with:
PROMOTION_TITLE |
Aug Earn $25 v2 external |
Aug Earns $125 v2 external |
Aug Earns $50 v2 external |
This is what I need:
Price |
$25 |
$125 |
$50 |
This is my program so far. I can't seem to get the syntax right.
data test (keep=PROMOTION_TITLE price) ;
set a.promo;
price =(input(prxchange('s/(\w+)_(\d+)_(\w+)/$2/',-1,PROMOTION_TITLE),8.))/100;
format price dollar8.2;
run;
Any assistance will be greatly appreciated.
Thanks!
Consider other SAS functions...
* DATA step with instream data-rows for illustration. ;
data _null_;
input ;
format price dollar8.2 ;
* parse input record and extract dollar-amount, convert to numeric. ;
price = input(scan(substr(_infile_,find(_infile_,'$','t')+1),1,' '),8.);
putlog _all_;
datalines;
Aug Earn $25 v2 external
Aug Earns $125 v2 external
Aug Earns $50 v2 external
run;
Scott Barry
SBBWorks, Inc.
This looks about right. Instead of removing everything around the dollar amount use a capture buffer to retrieve it.
This is good but is it possible to change the var to numeric?
Yes.
data want;
input ;
price =prxchange('s/.*(\$\d+).*/\1/',-1,_infile_) ;
datalines;
Aug Earn $25 v2 external
Aug Earns $125 v2 external
Aug Earns $50 v2 external
run;
My proposition is similar to DN's :
data have;
input pt &:$32.;
datalines;
Aug Earn $25 v2 external
Aug Earns $125 v2 external
Aug Earns $50 v2 external
;
data want;
if not prxId then prxId + prxparse("/\$\s?\d+(\.\d{2})?/");
set have;
call prxsubstr(prxId, pt, pos, len);
if pos > 0 then price = input (substr(pt, pos, len), dollar10.);
format price dollar11.2;
drop prxId pos len;
run;
proc print data=want noobs; run;
PG
There is a feature in proc format that enableS the definition of the prx in a INVALUE range
The feature has been there for a few years. Rick Langston has been demonstrating it at SAS GF (along with the concept of using a function in "label" area.)
Once compiled the syntax to pull that "price" could just be the usual INPUT statement with the very unusual user INFORMAT.
Thanks for this pointer to informats and RegEx: https://support.sas.com/resources/papers/proceedings12/245-2012.pdf
Patrick
you're welcome. And thanks for reporting the link.
Of course, it doesn't make prx any simpler just their implementation
regards
Peter
Thanks Patrick and Peter, very good to know. I wish ALL the features of my licenced SAS software were described in the main doc.I find proc format particularly poor in that respect.
PG
The question is now if these are unsupported features or just something missing in the documentation. I've asked SAS TechSupport this question and will post the answer here.
And here how this could work using the REGEXPE option:
proc format;
invalue Price (default=20)
's/.*\$(\d+\.?\d*).*/\1/' (REGEXPE) = [best32.]
other=0
;
run;
data have;
input pt &:Price.;
datalines;
Aug Earn $25 v2 external
Aug Earns $125.45 v2 external
Aug Earns $50 v2 external
Aug Earns v2 external
;
run;
Thank you Patrick! According to Rick Langston's paper, the feature was introduced with SAS 9.3. Makes me wonder how many goodies I am missing. Maybe the slow disclosure rate is just another step in the testing process.
PG
data have;
input pt &:$32.;
datalines;
Aug Earn $25 v2 external
Aug Earns $125 v2 external
Aug Earns $50 v2 external
Aug Earns $125.01 v2 external
Aug Earns $50x v2 external
;
You can parse using double scan (it will be faster than prx)
data want;
set have;
price=input(scan(scan(pt,2,'$'),1,,'as'),best.);
format price dollar8.2;
Alternatively, using prx function
data want2;
re=prxparse('#(?<=\$)(\d+(\.\d{2})?)#');
format price dollar8.2;
do until(done);
set have end=done;
if prxmatch(re,pt) then do;
price=input(prxposn(re,1,pt),best.);
output;
end;
end;
run;
Using a regexpe informat
proc format;
invalue ReDollar 's#^.*(?<=\$)(\d+(\.\d{2})?).*$#\1#' (regexpe) = [14.];
run;
data want3;
set have;
price=input(pt,ReDollar.);
format price dollar8.2;
run;
^.*(?<=\$)(\d+(\.\d{2})?).*$
Options: ^ and $ match at line breaks
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match any single character that is not a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=\$)»
Match the character “$” literally «\$»
Match the regular expression below and capture its match into backreference number 1 «(\d+(\.\d{2})?)»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below and capture its match into backreference number 2 «(\.\d{2})?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the character “.” literally «\.»
Match a single digit 0..9 «\d{2}»
Exactly 2 times «{2}»
Match any single character that is not a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.