I created a macro variable, numlist, with the following value:
ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN|SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN|TWENTY|TWENTY-ONE|TWENTY-TWO|TWENTY-THREE|and so on.
The format $wordnum. converts ONE to 1, TWO to 2, and so on.
Now I need to use a regular expression to capture the number words, like this:
data have;
var="abcd efgh ijkl WITH THREE mnop qrst"; output;
var="abcd efgh ijkl W/ TWENTY-FIVE qrst"; output;
run;
%macro try;
data want;
set have;
%local re1;
%let re1=%sysfunc(prxparse(/(\s|\(|\)|-|#|[HW]\/ ?)(\d{1,3}|&numlist)\W*(mnop|qrst)/));
%if &re1 > 0 %then %do;
%let pt_comments=%sysfunc(prxposn(&re1,2,var));
var2 = put("&pt_comments",$wordnum.);
%end;
run;
%mend;
%try;
proc print data=want; run;
I’m expecting the following result, with var2 equal to 3 and 25. What did I do wrong in my program to create want? Thanks!
obs | var | var2
1 | abcd efgh ijkl WITH THREE mnop qrst | 3
2 | abcd efgh ijkl W/ TWENTY-FIVE qrst | 25
Here is the program to create the macro variable and the format:
data seed;
length start $100 num $10000;
retain fmtname 'wordnum' type 'c' num '';
do label=1 to 100;
start=upcase(put(label,words100.));
output;
num = catx('|', num, upcase(put(label,words100.)));
end;
call symputx('numlist', num);
run;
proc format library=work cntlin=seed; run;
You can also do it without either a regular expression or a macro:
data seed;
length start $100;
retain fmtname 'wordnum' type 'i';
do label=1 to 100;
start=upcase(put(label,words100.));
output;
end;
run;
proc format library=work cntlin=seed; run;
data have;
var="abcd efgh ijkl WITH THREE mnop qrst"; output;
var="abcd efgh ijkl WITH JUNK mnop qrst"; output;
var="abcd efgh ijkl W/ TWENTY-FIVE qrst"; output;
run;
data want;
set have;
_n_=1;
do while (scan(var,_n_,' ') ne '');
if input(scan(var,_n_,' '),?? wordnum.) gt 0 then
var2=input(scan(var,_n_,' '),wordnum.);
_n_+1;
end;
run;
Thank you. The thing is, there are more conditions that need to be considered: for example, the number has to have WITH or W/ in front of it and mnop behind it. I can’t list all of them.
Can you explain what ?? does in the first input statement?
Thanks!
The following is from the documentation's explanation of the ?? modifier: ? or ?? specifies the optional question mark (?) and double question mark (??) modifiers that suppress the printing of both the error messages and the input lines when invalid data values are read. The ? modifier suppresses the invalid data message. The ?? modifier also suppresses the invalid data message and, in addition, prevents the automatic variable _ERROR_ from being set to 1 when invalid data are read.
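As a small illustration of the ?? modifier described above, the sketch below reads one valid and one invalid value with the standard 8. informat. The variable names good and bad are made up for this example.

```sas
data _null_;
  /* A valid numeric string is read normally. */
  good = input('42', ?? 8.);
  /* An invalid string yields a missing value; with ?? there is no */
  /* invalid-data note in the log and _ERROR_ is not set to 1.     */
  bad = input('abc', ?? 8.);
  put good= bad= _error_=;
run;
```

Removing the ?? from the second call should bring back the invalid-argument note in the log and set _ERROR_ to 1 for that iteration.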
While I've tried to find the time to learn regular expressions, I still haven't. Thus, I'm sure the following is more convoluted than it has to be. However, if I correctly understand what you want to do, then the following appears to accomplish the task:
data seed;
length start $100;
retain fmtname 'wordnum' type 'i';
do label=1 to 100;
start=upcase(put(label,words100.));
output;
end;
run;
proc format library=work cntlin=seed; run;
data have;
var="abcd efgh ijkl WITH THREE mnop qrst"; output;
var="abcd efgh ijkl WITH THREE mxop qrst"; output;
var="abcd efgh ijkl WITH THIRTY mnop qrst"; output;
var="abcd efgh ijkl W/ TWENTY-FIVE qrst"; output;
var="abcd efgh ijkl W TWENTY-FIVE qrst"; output;
var="abcd efgh ijkl With/ TWENTY-FIVE qrst"; output;
run;
data want (drop=newvar re);
set have;
length newvar $200;
var=upcase(var);
_n_=1;
do while (scan(var,_n_,' ') ne '');
if input(scan(var,_n_,' '),?? wordnum.) gt 0 then do;
var2=input(scan(var,_n_,' '),wordnum.);
newvar=catx(' ',newvar,input(scan(var,_n_,' '),wordnum.));
end;
else newvar=catx(' ',newvar,scan(var,_n_,' '));
_n_+1;
end;
re=prxparse('/(WITH\s|W\/\s)\d{1,3}(\sMNOP|\sQRST)/');
if not prxmatch(re,newvar) then call missing(var2);
run;
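On the original question of what went wrong: %LET, %IF and %SYSFUNC are resolved by the macro processor while the DATA step is still being compiled, so they run once, before any row of have is read, and never see the value of var. A minimal sketch that keeps the regular expression inside the DATA step might look like the following. It assumes that wordnum is built as a numeric informat (type 'i'), that numlist comes from the same seed loop, and that the number must follow WITH or W/ and precede MNOP or QRST. The variable name word is introduced only for illustration.

```sas
/* Build the informat and the pipe-delimited word list in one pass. */
data seed(keep=fmtname type start label);
  length start $100 num $2000;
  retain fmtname 'wordnum' type 'i' num '';
  do label = 1 to 100;
    start = upcase(put(label, words100.));
    output;
    num = catx('|', num, start);
  end;
  call symputx('numlist', num);
run;

proc format library=work cntlin=seed;
run;

data have;
  var = "abcd efgh ijkl WITH THREE mnop qrst"; output;
  var = "abcd efgh ijkl W/ TWENTY-FIVE qrst"; output;
run;

data want;
  set have;
  length word $40;
  retain re;
  /* Double quotes let &numlist resolve; compile the pattern once. */
  if _n_ = 1 then
    re = prxparse("/(?:WITH|W\/)\s*(\d{1,3}|&numlist)\W*(?:MNOP|QRST)/i");
  if prxmatch(re, var) then do;
    /* Capture buffer 1 holds the digits or the number word. */
    word = upcase(prxposn(re, 1, var));
    var2 = input(word, ?? wordnum.);
    /* Fall back to a plain numeric read when digits were captured. */
    if missing(var2) then var2 = input(word, ?? 3.);
  end;
  drop re word;
run;

proc print data=want;
run;
```

With the two sample rows, this should give var2 values of 3 and 25. Rows that do not match the pattern keep a missing var2.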
Thank you so much, Arthur!