Hi @sqlGoddess
Here are some different ways. My primitive prxchange extracts MillionText from Notes if a string containing corrency + number + 'million' is present, otherwise it returns the full Notes-text, so they are all based on a comparison between the extractes text and the full note. I think it could be done more elegant.
* Test data - row 3 doesn't contain million;
data testcount;
infile datalines delimiter='|';
input Notes: $526. ;
datalines;
In 2009, Cirque du Soleil was given a $34.4 million, five-year contract by the city of Quebec to create free summer productions for tourists. The show, titled Les Chemins invisibles, premiered a different installment each year for five years between 2009 and 201"
Cirque du Soleil created a $3.3 million production involving 150 artists to commemorate the 400th anniversary of Quebec City in October 2008, running for a total of 5 performances.[176][177]"
Cirque du Soleil created an expensive production involving 150 artists to commemorate the 400th anniversary of Quebec City in October 2008, running for a total of 5 performances.[176][177]"
Allavita! was an hour-long show featuring 48 artists created for Expo 2015 in Milan. The €8 million production was performed in an open-air theatre built by Expo 2015.[185]
;
run;
* Extract notes containing 'million', generate Milliontext;
data want1;
length MillionText $45;
set testcount;
Notes=tranwrd(Notes,'C2A0'x,'20'x);
MillionText = prxchange('s/(.*)([$|€]\s*\d*\.*\d*\s+million)(.*)/$2/i',-1,Notes);
if MillionText not =: Notes then output;
run;
* Extract notes containing 'million', generate Milliontext;
proc sql;
create table want2 as
select
Notes,
prxchange('s/(.*)([$|€]\s*\d*\.*\d*\s+million)(.*)/$2/i',-1,Notes) as MillionText
from testcount
where calculated MillionText ne Notes;
quit;
* Extract notes containing 'million';
proc sql;
create table want3 as
select Notes
from TestCount
where prxchange('s/(.*)([$|€]\s*\d*\.*\d*\s+million)(.*)/$2/i',-1,Notes) ne Notes;
quit;
* Extract notes containing 'million';
data want4;
set testcount(where=(prxchange('s/(.*)([$|€]\s*\d*\.*\d*\s+million)(.*)/$2/i',-1,Notes) not =: Notes));
run;
* Print notes containing 'million';
proc print data=testcount(where=(prxchange('s/(.*)([$|€]\s*\d*\.*\d*\s+million)(.*)/$2/i',-1,Notes) not =: Notes));
run;
... View more