Text Extraction/ regex

Contributor
Posts: 56

I'm attempting to extract a substring from various areas of a string.

The target lines contain "save $x w/purchase", and I need the quantity required to buy. In line 1 it would be "4"; in line 8 it would be "6". The price after "save" can be double-digit dollars (e.g. $25.00), and the quantity required can also be double digits. Sometimes it appears after "purchase/", sometimes after "purchase/any". I guess the easiest way to boil it down would be the first complete number (1, 5, 12, etc.) after the occurrence of "w/". It seems like a straightforward function/regex, but I can't get one to work with all the possible permutations of order and number of positions.

Sample data would be

data WORK.SAMPLE;
  infile datalines dsd truncover;
  input Offer:$194.;
datalines4;
save $1.00 w/purchase/any 4 or more mix & match
save $0.73 w/Just For U (Limit: 1)
save $1.00 w/purchase/any 4 participating items mix & match
save $1.00 w/purchase/any 4 or more mix & match
save $1.00 w/purchase/any 4 participating items mix & match
"save $1.00 w/purchase/any 4 participating items mix & match, save $1.00 w/purchase/any 4 participating items mix & match"
save $4.00 w/purchase/4 or more mix or match
save $3.00 w/purchase/any6 participating items mix & match
save $4.00 w/purchase/any 4 participating items mix & match
save $3.00 (Limit: 5)
save $5.00 w/purchase/5 participating items mix & match
save $5.00 w/purchase/5
save $1.00 (Limit: 1)
save $1.00 (Limit: 1)
save $4.00 w/purchase/any 4 participating items mix & match
save $4.00 w/purchase/any 4 participating items mix & match
save $3.00 w/purchase/any 6 participating items mix & match
save $0.50 w/purchase/any 4 or more participating items mix & match
save $5.00 w/purchase/5 participating items mix & match
save $5.00 w/purchase/any 5 participating items mix & match
;;;;
Super User
Posts: 10,023

Re: Text Extraction/ regex

data WORK.SAMPLE;
  infile datalines dsd truncover;
  input Offer:$194.;
datalines4;
save $1.00 w/purchase/any 4 or more mix & match
save $0.73 w/Just For U (Limit: 1)
save $1.00 w/purchase/any 4 participating items mix & match
save $1.00 w/purchase/any 4 or more mix & match
save $1.00 w/purchase/any 4 participating items mix & match
"save $1.00 w/purchase/any 4 participating items mix & match, save $1.00 w/purchase/any 4 participating items mix & match"
save $4.00 w/purchase/4 or more mix or match
save $3.00 w/purchase/any6 participating items mix & match
save $4.00 w/purchase/any 4 participating items mix & match
save $3.00 (Limit: 5)
save $5.00 w/purchase/5 participating items mix & match
save $5.00 w/purchase/5
save $1.00 (Limit: 1)
save $1.00 (Limit: 1)
save $4.00 w/purchase/any 4 participating items mix & match
save $4.00 w/purchase/any 4 participating items mix & match
save $3.00 w/purchase/any 6 participating items mix & match
save $0.50 w/purchase/any 4 or more participating items mix & match
save $5.00 w/purchase/5 participating items mix & match
save $5.00 w/purchase/any 5 participating items mix & match
;;;;

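data want;
  set sample;
  /* Match the digits right after "purchase/", optionally preceded by "any" and spaces. */
  /* The lookbehind (?<=purchase\/) keeps "purchase/" itself out of the match.          */
  pid = prxparse('/(?<=purchase\/)(any)?\s*\d+/i');
  /* position is 0 when the offer has no quantity after "purchase/" */
  call prxsubstr(pid, offer, position, length);
  /* Keep only the digits ('kd') of the matched text, then convert to numeric */
  if position ne 0 then
    quantity = input(compress(substr(offer, position, length), , 'kd'), best.);
  drop pid;
run;

proc print; run;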
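
For comparison, here is a minimal sketch of a variant that pulls the quantity out with a capture group instead of stripping non-digits afterwards. It assumes the same WORK.SAMPLE data set and Offer variable as above; the names want_alt and rx are made up for illustration. Treat it as one possible approach, not a tested replacement for the code above.

```sas
/* Sketch only: want_alt and rx are illustrative names, not from the thread. */
data want_alt;
  set sample;
  retain rx;
  /* Compile the pattern once: "w/purchase/", an optional "any",      */
  /* optional spaces, then the digits, captured as group 1.           */
  if _n_ = 1 then rx = prxparse('/w\/purchase\/(?:any)?\s*(\d+)/i');
  /* PRXMATCH returns 0 when the pattern is absent, so quantity       */
  /* stays missing for offers such as "save $3.00 (Limit: 5)".        */
  if prxmatch(rx, offer) > 0 then
    quantity = input(prxposn(rx, 1, offer), 8.);
  drop rx;
run;

proc print data=want_alt; run;
```

Anchoring on "w/purchase/" rather than on "w/" alone means an offer like "save $0.73 w/Just For U (Limit: 1)" does not pick up the limit as a quantity. Like the code above, this only reads the first offer on a row that contains two.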
 