Hello, I am learning SAS and trying to write a macro that takes a file path as its first parameter and a search term to look for in that file as its second. However, I can't get it to work; I keep getting this error:
ERROR: The regular expression passed to the function PRXMATCH contains a syntax error.
This is the code:
%macro matcher_prxmatch(path, search);
data prxmatch;
length pos count zeilen_nr 8 line $32767 regex pattern $100 lrecl 8 search $100;
infile "&path" lrecl=32767 truncover;
input line $varying32767. lrecl;
search="&search";
pattern = cats('/', search, '/i');
regex = prxparse(pattern);
if missing(regex) then do;
put "ERROR: Ungültiger regulärer Ausdruck.";
stop;
end;
zeilen_nr = _N_;
count = 0;
pos = prxmatch(regex, line);
if pos > 0 then do;
count + 1;
temp_line = substr(line, pos + 1);
do while (prxmatch(regex, temp_line) > 0);
count + 1;
temp_line = substr(temp_line, prxmatch(regex, temp_line) + 1);
end;
end;
output;
run;
proc print data=prxmatch;
var zeilen_nr count;
run;
%mend matcher_prxmatch;
%matcher_prxmatch(_____a.txt, SYS);
Can someone explain to me why the error occurs?
Run the same code without the macro and you will be able to clearly see the LINE that is causing the error.
1 data prxmatch;
2 length pos count zeilen_nr 8 line $32767 regex pattern $100 lrecl 8
2 ! search $100;
3 infile text lrecl=32767 truncover;
4 input line $varying32767. lrecl;
5 search="SYS";
6 pattern = cats('/', search, '/i');
7 regex = prxparse(pattern);
8 if missing(regex) then do;
9 put "ERROR: Ungültiger regulärer Ausdruck.";
10 stop;
11 end;
12 zeilen_nr = _N_;
13 count = 0;
14 pos = prxmatch(regex, line);
15 if pos > 0 then do;
16 count + 1;
17 temp_line = substr(line, pos + 1);
NOTE: Variable "temp_line" was given a default length of 32767 as the result
of a function call. If you do not like this, please use a LENGTH
statement to declare "temp_line".
18 do while (prxmatch(regex, temp_line) > 0);
19 count + 1;
20 temp_line = substr(temp_line, prxmatch(regex,temp_line) + 1);
21 end;
22 end;
23 output;
24 run;
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
7:11
NOTE: Variable lrecl is uninitialized.
So you are defining REGEX as a CHARACTER variable, but it needs to be a NUMERIC variable.
You are also trying to use LRECL in the INPUT statement to specify the length for the $VARYING informat when you never set LRECL to any value.
data prxmatch;
length pos count zeilen_nr 8 line temp_line $32767 regex 8 pattern $100 search $100;
infile text lrecl=32767 truncover;
input ;
line = _infile_;
search="SYS";
pattern = cats('/', search, '/i');
regex = prxparse(pattern);
if missing(regex) then do;
put "ERROR: Ungültiger regulärer Ausdruck.";
stop;
end;
zeilen_nr = _N_;
count = 0;
pos = prxmatch(regex, line);
if pos > 0 then do;
count + 1;
temp_line = substr(line, pos + 1);
do while (prxmatch(regex, temp_line) > 0);
count + 1;
temp_line = substr(temp_line, prxmatch(regex,temp_line) + 1);
end;
end;
output;
run;
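The step above sidesteps $VARYING by copying _INFILE_. If you would rather keep the $VARYING informat, its length variable has to be filled in first. A minimal sketch, assuming the same fileref TEXT as above; the variable name RECLEN is only illustrative:

```sas
data lines;
  /* LENGTH= names a variable that receives the length of the current record */
  infile text lrecl=32767 truncover length=reclen;
  input @;                            /* load the record so that RECLEN is set */
  input line $varying32767. reclen;   /* read exactly RECLEN characters */
run;
```

The variable named in LENGTH= is not written to the output data set, so no DROP statement is needed for it.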
@Dimax wrote:
regex 8 is defined as numeric because prxparse(pattern) returns a pattern ID.
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
7:11
This is because the file path contains numbers.
Show us the ENTIRE log for this macro. Showing us tiny little bits of the log really isn't helpful. Please click on the </> icon and paste the log into the window that appears.
@Dimax wrote:
regex 8 is defined as numeric because prxparse(pattern) returns a pattern ID.
NOTE: Numeric values have been converted to character
values at the places given by: (Line):(Column).
7:11
This is because the file path contains numbers.
Actually, it is the reverse. REGEX is defined as CHARACTER because of the LENGTH statement. The PRXPARSE() function returns a numeric value, which is then converted into a character string so it can be stored in REGEX. So REGEX ends up with a value like '           1' (the number right-aligned, with leading blanks).
This then causes the later errors when REGEX is used as the input to the PRXMATCH() function call. Since REGEX is character instead of numeric, PRXMATCH() tries to interpret its value as a new regular expression instead of using the one previously compiled by the PRXPARSE() function call.
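A small sketch of that difference, using made-up variable names (ID, ID_CHAR) and a literal test string. PRXMATCH() accepts either a numeric pattern ID or a character string containing a full regular expression, which is why a character copy of the ID gets parsed as a pattern:

```sas
data _null_;
  length id 8 id_char $100;
  id = prxparse('/sys/i');                 /* numeric pattern ID */
  id_char = id;                            /* automatic numeric-to-character conversion */
  pos_id  = prxmatch(id, 'SYSTEM');        /* uses the compiled pattern */
  pos_str = prxmatch('/sys/i', 'SYSTEM');  /* a regex given as a string also works */
  /* prxmatch(id_char, 'SYSTEM') would try to compile the text of ID_CHAR */
  /* as a regular expression and fail with the syntax error from the question */
  put id= id_char= pos_id= pos_str=;
run;
```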
This is really complicated... I had "length regex 8", which I thought meant numeric 8 bytes. I changed it to "length regex 8." and the error disappeared. Then I changed it back to "length regex 8", and I still didn't get an error. I also thank everyone for their help. 👍
@Dimax wrote:
This is really complicated... I had "length regex 8", which I thought meant numeric 8 bytes. I changed it to "length regex 8." and the error disappeared. Then I changed it back to "length regex 8", and I still didn't get an error. I also thank everyone for their help. 👍
The period makes no difference. Lengths are always integers.
You had this:
length ... regex pattern $100 ..;
Which defines REGEX and PATTERN as length $100.
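To see how the LENGTH statement groups variables, here is a minimal check using the VTYPE() function. The names REGEX_OK and PATTERN_OK are only for illustration; the log will also note that the variables are uninitialized, which is harmless here:

```sas
data _null_;
  /* REGEX and PATTERN both take the $100 that follows them: both are character */
  length regex pattern $100;
  /* Here each variable is followed by its own length specification */
  length regex_ok 8 pattern_ok $100;
  type_regex    = vtype(regex);     /* C = character */
  type_regex_ok = vtype(regex_ok);  /* N = numeric */
  put type_regex= type_regex_ok=;
run;
```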
Get the SAS code to work before trying to convert it into a macro.
%let search=12;
data prxmatch;
length zeilen_nr pos count 8 search $100 line temp_line $32767 regex 8 ;
infile text lrecl=32767 truncover;
input line $char32767.;
zeilen_nr + 1;
if zeilen_nr=1 then do;
search="&search";
regex=prxparse("/&search/i");
if missing(regex) then do;
put "ERROR: Ungültiger regulärer Ausdruck.";
stop;
end;
end;
retain search regex;
count = 0;
temp_line = line;
do until(pos=0);
pos = prxmatch(regex, temp_line);
if pos > 0 then do;
count + 1;
temp_line = substr(temp_line, pos + 1);
end;
end;
drop regex pos temp_line;
run;
proc print;
run;
Result
Obs  zeilen_nr  count  search  line
  1      1        1      12    Name=Alfred Sex=M Age=14 Height=69 Weight=112.5
  2      2        0      12    Name=Alice Sex=F Age=13 Height=56.5 Weight=84
  3      3        0      12    Name=Barbara Sex=F Age=13 Height=65.3 Weight=98
  4      4        0      12    Name=Carol Sex=F Age=14 Height=62.8 Weight=102.5
  5      5        0      12    Name=Henry Sex=M Age=14 Height=63.5 Weight=102.5
  6      6        1      12    Name=James Sex=M Age=12 Height=57.3 Weight=83
  7      7        1      12    Name=Jane Sex=F Age=12 Height=59.8 Weight=84.5
  8      8        1      12    Name=Janet Sex=F Age=15 Height=62.5 Weight=112.5
  9      9        0      12    Name=Jeffrey Sex=M Age=13 Height=62.5 Weight=84
 10     10        1      12    Name=John Sex=M Age=12 Height=59 Weight=99.5
 11     11        0      12    Name=Joyce Sex=F Age=11 Height=51.3 Weight=50.5
 12     12        0      12    Name=Judy Sex=F Age=14 Height=64.3 Weight=90
 13     13        1      12    Name=Louise Sex=F Age=12 Height=56.3 Weight=77
 14     14        1      12    Name=Mary Sex=F Age=15 Height=66.5 Weight=112
 15     15        0      12    Name=Philip Sex=M Age=16 Height=72 Weight=150
 16     16        2      12    Name=Robert Sex=M Age=12 Height=64.8 Weight=128
 17     17        0      12    Name=Ronald Sex=M Age=15 Height=67 Weight=133
 18     18        0      12    Name=Thomas Sex=M Age=11 Height=57.5 Weight=85
 19     19        1      12    Name=William Sex=M Age=15 Height=66.5 Weight=112
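Once the plain data step behaves as expected, it can be wrapped back into the macro from the question. This is only a sketch under some assumptions: &search must already be a valid regular expression body without unescaped slashes or double quotes, and the English error text is a placeholder.

```sas
%macro matcher_prxmatch(path, search);
  data prxmatch;
    length zeilen_nr count pos regex 8 line temp_line $32767;
    retain regex;                        /* keep the pattern ID across records */
    infile "&path" lrecl=32767 truncover;
    input line $char32767.;
    zeilen_nr = _n_;
    if _n_ = 1 then do;
      regex = prxparse("/&search/i");    /* compile the pattern once */
      if missing(regex) then do;
        put "ERROR: Invalid regular expression.";
        stop;
      end;
    end;
    count = 0;
    temp_line = line;
    do until (pos = 0);                  /* count every match in the line */
      pos = prxmatch(regex, temp_line);
      if pos > 0 then do;
        count + 1;
        temp_line = substr(temp_line, pos + 1);
      end;
    end;
    keep zeilen_nr count;
  run;

  proc print data=prxmatch;
  run;
%mend matcher_prxmatch;
```

It would be called as in the original post, with the file path as the first argument and the search term as the second.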