I have a macro that takes two values (both are text): pattern and serial.
I have a data set with these two variables (i.e. patterns).
How would I go about reading in, for example, the first 20 rows, then the next 20 rows, etc.
Each time a group of records is inputted the results would be stored or outputted where check equals 1 or 2.
The data that I am applying this macro to is in a separate data set (i.e. all_data).
%macro prx(pattern,serial);
b=prxparse("&pattern");
if prxmatch(b,serial_number)>0 then do;
check=1;
serial=&serial;
if (length(serial) = length(serial_number)) then check=2;
end;
%mend;
Example values and how they would fit into the macro:
pattern=066957
serial=0669578
%prx(/^066957[\d{1}]/,"0669578");
Here's an example of the records that I would like to feed into the macro (only 2 columns):
Serial Pattern
B266056 B26605
00308110 0030811
Thank you.
This is one messy post.
So you want to generate matching code from the smaller pattern table it seems.
If only you gave proper data in/ data out examples.
Like this?
data PATTERNS;
PATTERN='066957 '; SERIAL ='0669578 '; output;
PATTERN='B26605 '; SERIAL ='B26056 '; output;
PATTERN='0030811'; SERIAL ='00308110'; output;
run;
data HAVE;
SERIAL_NUMBER ='0669578 '; output;
SERIAL_NUMBER ='B266056 '; output;
SERIAL_NUMBER ='0038110'; output;
run;
data _null_;
retain CODE;
length CODE $32000;
set PATTERNS;
CODE=cats( CODE
, 'if prxmatch("/^', PATTERN, '\d{1}/", SERIAL_NUMBER)>0 then do;'
, ' CHECK=1; '
, ' if (', length(SERIAL), ' = length(SERIAL_NUMBER)) then CHECK=2; '
, 'end;'
);
call symputx('code',CODE);
run;
data WANT;
set HAVE;
&code.;
run;
SERIAL_NUMBER | CHECK |
669578 | 2 |
B266056 | 1 |
38110 | . |
1. Why read groups of 20 records? Please clarify this point.
2. Any good reason to use a RegEx instead if the index function? It's a lot slower.
3. Variable SERIAL_NUMBER is used but never defined.
1) 20 is just an example. The data set I'm searching for specific patterns has over 700,000 serial numbers. So, I want to run the macro for about 20-30 patterns at a time.
2) The total number of patterns I'm searching for is about 1,000. In addition to the example I provided, I have some other regular expressions that I'm interested in as well. Maybe I'm mistaken, but RegEx is better at finding specific patterns than the index function? What I'm currently interested in is finding out is how to pipe a list of values through a macro and capture the results.
3) I'm using the macro to find various patterns within SERIAL_NUMBER. The SERIAL_NUMBER variable is found in the data set with over 700,000 records.
This is still not very clear.
Here is a canvas. What's missing?
%macro prx(pattern,serial);
PRX=prxparse("&pattern");
if prxmatch(PRX,SERIAL_NUMBER)>0 then do;
CHECK=1;
if length("&serial") = length(serial_number) then CHECK=2;
end;
%mend;
data HAVE;
PATTERN='066957 '; SERIAL_NUMBER ='0669578 '; output;
PATTERN='B26605 '; SERIAL_NUMBER ='B266056 '; output;
PATTERN='0030811'; SERIAL_NUMBER ='00308110'; output;
run;
data WANT;
set HAVE;
%prx(/^066957\d/, 0669578);
drop PRX;
run;
asset_data has serial_number (see below).
There are over 700,000 records in asset_data.
I'm interested in the records where check equals 1 or 2.
My goal is to avoid calling %prx over and over again.
So, instead of calling %prx repeatedly how can I have %prx automatically take pattern and serial from patterns_data (see below).
data patterns_data;
PATTERN='066957 '; serial ='0669578 '; output;
PATTERN='B26605 '; serial ='B266056 '; output;
PATTERN='0030811'; serial ='00308110'; output;
run;
%macro prx(pattern,serial);
b=prxparse("&pattern");
if prxmatch(b,serial_number)>0 then do;
check=1;
serial=&serial;
if (length(serial) = length(serial_number)) then check=2;
end;
%mend;
data WANT;
set asset_data;
%prx(/^066957[\d{1}]/,"0669578");
%prx(/^B26605[\d{1}]/,"B266056");
/*etc.*/
run;
Thank you.
Come on. This doesn;t make sense.
Your RegEx in
data WANT;
set asset_data;
%prx(/^066957[\d{1}]/,"0669578");
%prx(/^B26605[\d{1}]/,"B266056");
/*etc.*/
run;
is static.
The result will be the same for all records in asset_data as the records are not used.
Don't pipe data through macros. Use datasets to hold data.
You could make a macro that you tell it WHICH records you want.
%macro process
(pattern_dsn= /* Name of Pattern dataset */
,records_dsn= /* Name of Data to Test */
,out_dsn= /* Name of dataset to create */
,first_obs= /* First pattern number */
,n_obs= /* Number of patterns */
);
data subset ;
set &pattern_dsn (firstobs=&first_obs obs=&n_obs);
run;
....
%mend ;
As to the process of building Reg Ex strings from the patterns do that with DATA step code and NOT with macro code.
Macro processing happens before your code is compiled and executed. So it is not suitable for what you want to do.
Have a look at Proc FCMP, this allows you to write your own functions to be used in the DATA Step.
OK, this sounds promising.
So, how would I get two variables (i.e. "pattern" and "serial") from patterns_data in order to look for patterns in serial_number (which is located in asset_data)?
b=prxparse(/^pattern[\d{1}]/);
if prxmatch(b,serial_number)>0 then do;
check=1;
if (length(serial) = length(serial_number)) then check=2;
end;
This is one messy post.
So you want to generate matching code from the smaller pattern table it seems.
If only you gave proper data in/ data out examples.
Like this?
data PATTERNS;
PATTERN='066957 '; SERIAL ='0669578 '; output;
PATTERN='B26605 '; SERIAL ='B26056 '; output;
PATTERN='0030811'; SERIAL ='00308110'; output;
run;
data HAVE;
SERIAL_NUMBER ='0669578 '; output;
SERIAL_NUMBER ='B266056 '; output;
SERIAL_NUMBER ='0038110'; output;
run;
data _null_;
retain CODE;
length CODE $32000;
set PATTERNS;
CODE=cats( CODE
, 'if prxmatch("/^', PATTERN, '\d{1}/", SERIAL_NUMBER)>0 then do;'
, ' CHECK=1; '
, ' if (', length(SERIAL), ' = length(SERIAL_NUMBER)) then CHECK=2; '
, 'end;'
);
call symputx('code',CODE);
run;
data WANT;
set HAVE;
&code.;
run;
SERIAL_NUMBER | CHECK |
669578 | 2 |
B266056 | 1 |
38110 | . |
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.