Hello,
I need to loop through a set of text files, pattern match with RegEx and, based on positive hits, produce a simple output of which files contain at least one match of my expression. I have tested my expression on a few RegEx websites and it seems to pass my quality tests. What I don't know how to do is run a “RegEx” search on a file in SAS. Can anyone give me some starter pointers? I am an experienced programmer but new to SAS as of last spring. I am willing to read, but to date I am not sure what function, or combination of functions, to even be looking at. I guess I am a little lost on what "proc import" to use; the files may or may not be fixed, tab, stanza, bla, bla… I just want to read them a line at a time and check for a match. Once a match is found, I can skip out of the file and mark it as match found. TIA. -Keith
There are a number of SAS functions, but look at PRXMATCH.
You may also want to look at some reading tricks, such as using _INFILE_ instead of trying to read into variables, possibly wildcard file naming, and the FILENAME= option on the INFILE statement.
Here is something that may get you started (Windows version); tweak it to accommodate your environment.
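As a rough sketch of how those reading tricks could fit together: the step below reads every .txt file in one folder through a wildcard path, tests each line held in _INFILE_ with PRXMATCH, and keeps the name of the file the line came from via the FILENAME= option. The folder path, the pattern text, and the names HITS, CURRENT and SOURCE_FILE are placeholders chosen for illustration, not anything from your environment.

```sas
/* Sketch only: the folder and the pattern are placeholders. */
data hits(keep=source_file);
  length current source_file $260;
  /* The wildcard makes SAS read all matching files one after another;  */
  /* FILENAME= stores the name of the file currently being read.        */
  infile 'D:\The_Directory_path\*.txt' filename=current truncover;
  input;                        /* load the line into _INFILE_ only     */
  if prxmatch('/your RegEx expression/', _infile_) then do;
    source_file = current;      /* FILENAME= variables are not written  */
    output;                     /* to the data set, so copy the value   */
  end;
run;

/* One row per file that had at least one matching line */
proc sort data=hits nodupkey;
  by source_file;
run;
```

This version reads every line of every file and removes duplicates afterwards, so it does not skip out of a file at the first match.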
filename x pipe 'dir \\yourfolder\*.txt /s /b'; /* pipe the DIR listing into fileref X */
data want;
  infile x truncover;
  input fname $100.; /* name of the next text file */
  /* TRUNCOVER keeps short lines from flowing into the next record */
  infile dummy filevar=fname truncover end=last;
  do while(not last);
    input content $100.; /* one line of the current text file */
    texfilename=fname;
    /* here is where to do your RegEx match */
    output;
  end;
run;
filename x clear;
I finally was able to get the pipe syntax to at least run. I am not 100% sure, but it seems to open the file up for viewing. In my case I eventually want to process every *.txt file on my network file system, so this does not work for my needs, but it is interesting. I was able to get something like this to work for me to a degree:
filename myFile 'D:\The_Directory_path\That_file.txt';
data want;
infile myFile truncover scanover;
input line $4096.;
run;
proc contents data=want;
run;
In my logs I get data like this:
747 filename myFile 'D:\The_Directory_path\That_file.txt';
748 data want;
749 infile myFile truncover scanover;
750 input line $4096.;
751 run;
NOTE: The infile MYFILE is:
Filename=D:\The_Directory_path\That_file.txt,
RECFM=V,LRECL=32767,File Size (bytes)=431,
Last Modified=18Dec2015:09:52:12,
Create Time=17Nov2015:10:13:37
NOTE: 18 records were read from the infile MYFILE.
The minimum record length was 0.
The maximum record length was 50.
NOTE: The data set WORK.WANT has 18 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.03 seconds
752 proc contents data=want;
753 run;
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Does anyone know how to capture the Max/Min record lengths and the date/time created or modified in macro variables? I could use the metadata 🙂 Of course I already have the file name and path in the program that this eventually fits into... TIA. -KJ
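One possible way to capture these values, offered as a sketch rather than a tested answer: the FOPEN, FOPTNUM, FOPTNAME and FINFO functions expose the same file attributes that the log note prints, and the LENGTH= option on INFILE gives the length of each record as it is read. The info item names 'Last Modified' and 'Create Time' are assumed from the log above and depend on the host. The macro variable names are invented for illustration.

```sas
filename myFile 'D:\The_Directory_path\That_file.txt';

/* File attributes: list every item the host reports, keep two of them */
data _null_;
  fid = fopen('myFile');
  if fid > 0 then do;
    do i = 1 to foptnum(fid);
      name  = foptname(fid, i);
      value = finfo(fid, name);
      put name= value=;
    end;
    /* Item names assumed from the log note; they vary by host */
    call symputx('file_modified', finfo(fid, 'Last Modified'));
    call symputx('file_created',  finfo(fid, 'Create Time'));
    rc = fclose(fid);
  end;
run;

/* Record lengths: LENGTH= is set each time INPUT reads a record */
data _null_;
  infile myFile truncover length=reclen end=eof;
  retain minlen maxlen;
  input;
  minlen = min(minlen, reclen);   /* MIN and MAX ignore missing values */
  maxlen = max(maxlen, reclen);
  if eof then do;
    call symputx('min_reclen', minlen);
    call symputx('max_reclen', maxlen);
  end;
run;

%put Modified=&file_modified Created=&file_created;
%put Min=&min_reclen Max=&max_reclen;
```

If FOPEN fails, or the file is empty, the corresponding macro variables are never created and the %PUT statements will warn about unresolved references.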
Excellent advice from both, @ballardw and @Reeza! And a nice, practical example from @Haikuo.
If you go the data step route, you could combine @ballardw's tips with @Haikuo's code and, e.g., use the automatic variable _INFILE_ rather than a newly created variable CONTENT.
Example 10 of the INFILE statement documentation could also be interesting. They read the external file names from a text file, which would be preferable to the pipe approach if the .txt files were located in various different folders.
I was just playing around with the code presented there and verified that it is possible to skip the rest of a (potentially huge) file, as soon as a match was found, and then continue with the next file:
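do until(eof or prxmatch('/your RegEx expression/', _infile_));
input;
end;
The above DO loop would replace the do while(^eof); ... end; there. The added eof check (eof being the END= variable from that example) keeps the loop from reading past the end of a file that contains no match, which would otherwise stop the DATA step.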
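To show how the pieces discussed so far might fit into one step, here is a sketch that lists files through a pipe, opens each one with FILEVAR=, stops reading a file at its first match, and writes out only the names of matching files. It assumes a Windows host where pipes are allowed. The folder, the pattern, and the names FLIST, MATCHED_FILES, MATCHED_FILE and FOUND are placeholders.

```sas
/* Sketch only: the folder and the pattern are placeholders. */
filename flist pipe 'dir "\\yourfolder\*.txt" /s /b';

data matched_files(keep=matched_file);
  length fname matched_file $260;
  retain re;
  if _n_ = 1 then re = prxparse('/your RegEx expression/');

  infile flist truncover;
  input fname $260.;                    /* next file name from the listing */

  infile dummy filevar=fname end=eof truncover;
  found = 0;
  do while(not eof and not found);      /* leave at first match or at EOF  */
    input;                              /* line is available in _INFILE_   */
    if prxmatch(re, _infile_) then found = 1;
  end;

  if found then do;
    matched_file = fname;               /* FILEVAR= variables are not kept */
    output;
  end;
run;

filename flist clear;
```

Testing eof before each INPUT means an empty file or a file without a match simply ends the loop, and the next iteration of the step moves on to the next file.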
Thanks for the replies. I am not on my game today, because I cannot seem to make any of these ideas produce results except the use of prxparse. I will try again later.
I was able to get a simple data step to work with this code:
data _null_;
if _N_=1 then
do;
retain PerlExpression;
pattern="/(?!0)(?!9[0-9][0-9])\d{3}[-.]{1}(?!00)\d{2}/";
PerlExpression=prxparse(pattern);
/* put PerlExpression;*/
end;
array match[26] $ 105;
input data_line $100.;
position=prxmatch(PerlExpression, data_line);
mA='Matched Position: ';
mS='String: ';
mL='Whole line:';
if position ^= 0 then
do;
current_match = substr(data_line, position, 6); /* the pattern always matches 6 characters: 3 digits, a separator, 2 digits */
put mA position mS current_match mL data_line;
end;
datalines;
123456789012345678901234567890123456789012345678901234567890123 123-45 0
023456789012345678901234567890123456789012345678901234567890123 123.45 0
12345678901234567890123456789-01
123456789012345678901234567 89-01
000000000000000000000000000000000000000000000000000000000000003 123 45 0
14:56.456
45:32
;
run;
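If it helps, a variation on the step above can let SAS report the length of the match instead of hard-coding it: CALL PRXSUBSTR returns both the start position and the length of the first match. This is only a sketch. The variable names RE and MATCH_LEN are invented here, the pattern is the one from the step above with the redundant {1} dropped, and the two data lines are taken from the same test data.

```sas
data _null_;
  infile datalines truncover;          /* do not flow short lines together */
  input data_line $100.;
  retain re;
  if _n_ = 1 then
    re = prxparse('/(?!0)(?!9[0-9][0-9])\d{3}[-.](?!00)\d{2}/');

  /* position and match_len are both set to 0 when nothing matches */
  call prxsubstr(re, data_line, position, match_len);
  if position > 0 then do;
    current_match = substrn(data_line, position, match_len);
    put position= match_len= current_match= data_line=;
  end;
datalines;
123456789012345678901234567890123456789012345678901234567890123 123-45 0
14:56.456
;
run;
```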
PS
I tried this:
proc print data=dictionary.tables;
run;
This errors because I am not using a Libref.
/*
You could try this
I have limited to 10 obs, as sometimes it takes long to run if
you have many tables
*/
proc print data=sashelp.vtable(obs=10);
run;
/*
The reason yours does not work is that, with one exception,
a SAS library name is limited to 8 characters, while 'dictionary'
has 10. PROC SQL is that one exception:
*/
proc sql inobs=10;
select * from dictionary.tables
;
quit;
...and in my example this WHERE limits me to just my data set WANT, plus anything else I might have in my WORK library. Thanks for the knowledge, that helps a lot! -KJ
proc sql inobs=10;
select * from dictionary.tables
where libname='WORK' or memname='WANT';
quit;
proc print data=sashelp.vtable;
where libname='WORK' or memname='WANT';
run;