Hello,
I need to loop through a set of text files, pattern-match each one with a RegEx, and produce a simple list of the files that contain at least one match. I have tested my expression on a few RegEx websites and it passes my quality tests. What I don't know is how to run a RegEx search against a file in SAS. Can anyone give me some starter pointers? I am an experienced programmer but new to SAS as of last spring. I am willing to read, but so far I am not sure which function, or combination of functions, to look at. I am also a little lost on which "proc import" to use: the files may or may not be fixed-width, tab-delimited, stanza, and so on. I just want to read each file a line at a time and check for a match; once one is found, I can skip the rest of the file and mark it as matched. TIA. -Keith
There are a number of relevant SAS functions, but look at PRXMATCH.
You may also want to look at some reading tricks, such as using the automatic variable _INFILE_ instead of reading into variables, and possibly wildcard file names and the FILENAME= option on the INFILE statement.
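To make those three tips concrete, here is a minimal, untested sketch that combines them. The directory path and the output data set name `matched_files` are placeholders, and the pattern is a stand-in for your own expression. Note that this version still reads every line of every file; it only stops testing the RegEx once a file has produced a hit.

```sas
/* Sketch only: the path, data set name, and pattern below are placeholders. */
data matched_files(keep=fname);
  length current fname last_hit $ 260;
  retain re last_hit;
  /* compile the expression once, on the first iteration */
  if _n_ = 1 then re = prxparse('/your RegEx expression/');
  /* wildcard file name; FILENAME= receives the file currently being read */
  infile 'D:\The_Directory_path\*.txt' filename=current;
  input;  /* no variables: the line is available in _INFILE_ */
  /* report each file once, at its first matching line */
  if current ne last_hit and prxmatch(re, _infile_) then do;
    fname = current;   /* FILENAME= variables are not written out, so copy */
    last_hit = current;
    output;
  end;
run;
```

The result is one observation per matching file, which can then be printed or joined to whatever file list you already have.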
Here is something that may get you started (Windows version); tweak it to suit your environment.
filename x pipe 'dir \\yourfolder\*.txt /s /b'; /* pipe the DIR listing into fileref x */
data want;
  infile x truncover;
  input fname $100.; /* full path of each text file */
  infile dummy filevar=fname end=last truncover; /* truncover keeps short lines from flowing onto the next record */
  do while(not last);
    input content $100.; /* one line of the current text file */
    texfilename=fname; /* FILEVAR= variables are not written to the output data set, so copy the name */
    /* here is where to do your RegEx match */
    output;
  end;
run;
filename x clear;
I finally got the pipe syntax to at least run. I am not 100% sure, but it seems to open the file up for viewing. I eventually want to process every *.txt file on my network file system, so this does not work for my needs, but it is interesting. I was able to get something like this to work, to a degree:
filename myFile 'D:\The_Directory_path\That_file.txt';
data want;
infile myFile truncover scanover;
input line $4096.;
run;
proc contents data=want;
run;
In my logs I get data like this:
747 filename myFile 'D:\The_Directory_path\That_file.txt';
748 data want;
749 infile myFile truncover scanover;
750 input line $4096.;
751 run;
NOTE: The infile MYFILE is:
Filename=D:\The_Directory_path\That_file.txt,
RECFM=V,LRECL=32767,File Size (bytes)=431,
Last Modified=18Dec2015:09:52:12,
Create Time=17Nov2015:10:13:37
NOTE: 18 records were read from the infile MYFILE.
The minimum record length was 0.
The maximum record length was 50.
NOTE: The data set WORK.WANT has 18 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.03 seconds
752 proc contents data=want;
753 run;
NOTE: PROCEDURE CONTENTS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Does anyone know how to capture the minimum and maximum record length, and the created or modified date and time, in macro variables? I could use the metadata 🙂 I of course already have the file name and path in the program that this eventually fits into... TIA. -KJ
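One possible way to get at that metadata, sketched here without having been run: the FOPEN, FOPTNUM, FOPTNAME, and FINFO functions expose the same file attributes that appear in the log NOTE, and the LENGTH= option on INFILE gives the length of each record. The attribute names ('Last Modified', 'Create Time') are taken from the Windows log above and are host-dependent, so treat them as assumptions. The macro variable names are made up for illustration.

```sas
/* Path reused from the post above; macro variable names are hypothetical. */
filename myFile 'D:\The_Directory_path\That_file.txt';

data _null_;
  length optname $ 64 optval $ 256;
  fid = fopen('myFile');                 /* open through the fileref */
  if fid > 0 then do;
    /* list every attribute this host offers, to see the exact names */
    do i = 1 to foptnum(fid);
      optname = foptname(fid, i);
      optval  = finfo(fid, optname);
      put optname= optval=;
    end;
    /* attribute names as shown in the Windows log; they differ by host */
    call symputx('file_modified', finfo(fid, 'Last Modified'));
    call symputx('file_created',  finfo(fid, 'Create Time'));
    rc = fclose(fid);
  end;
run;

data _null_;
  infile myFile length=reclen end=eof;   /* reclen is set by each INPUT */
  retain minlen maxlen;
  input;
  minlen = min(minlen, reclen);          /* MIN/MAX ignore missing values */
  maxlen = max(maxlen, reclen);
  if eof then do;
    call symputx('min_reclen', minlen);
    call symputx('max_reclen', maxlen);
  end;
run;

%put &=file_modified &=file_created &=min_reclen &=max_reclen;
```

The first step writes every available attribute to the log, which is a quick way to check what your platform calls each item before relying on a specific name.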
Excellent advice from both, @ballardw and @Reeza! And a nice, practical example from @Haikuo.
If you go the data step route, you could combine @ballardw's tips with @Haikuo's code and, e.g., use the automatic variable _INFILE_ rather than a newly created variable CONTENT.
Example 10 of the INFILE statement documentation could also be interesting. They read the external file names from a text file, which would be preferable to the pipe approach if the .txt files were located in various different folders.
I was just playing around with the code presented there and verified that it is possible to skip the rest of a (potentially huge) file as soon as a match is found, and then continue with the next file:
do until(prxmatch('/your RegEx expression/', _infile_));
input;
end;
The above DO loop would replace the do while(^eof); ... end; there.
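One caveat with the bare DO UNTIL loop: it never checks for end of file, and as far as I know an INPUT that runs past the end of a file stops the whole DATA step, so a file without any match could end the run early. A minimal, untested sketch that keeps the early exit but also tests an END= flag might look like this. It borrows the pipe listing from the earlier reply; the names `flist`, `matched_files`, `fvar`, and `found` are made up for illustration.

```sas
/* Sketch only: folder, fileref, and data set names are placeholders. */
filename flist pipe 'dir "\\yourfolder\*.txt" /s /b';

data matched_files(keep=fname);
  length fname fvar $ 260;
  retain re;
  if _n_ = 1 then re = prxparse('/your RegEx expression/');
  infile flist truncover;
  input fname $260.;                 /* next file name from the listing */
  fvar = fname;                      /* FILEVAR= variables are not written out */
  infile dummy filevar=fvar end=eof;
  found = 0;
  /* stop at the first match or at end of file, whichever comes first */
  do while (not eof and not found);
    input;
    if prxmatch(re, _infile_) then found = 1;
  end;
  if found then output;              /* one observation per matching file */
run;

filename flist clear;
```

If the file names come from a text file instead of a pipe, as in Example 10, only the first FILENAME and INFILE statements should need to change.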
Thanks for the replies. I am not on my game today, because I cannot seem to make any of these ideas work except the use of prxparse. I will try again later.
I was able to get a simple data step to work with this code:
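data _null_;
  /* compile the expression once and keep its id across iterations */
  retain PerlExpression;
  if _N_=1 then
    do;
      pattern="/(?!0)(?!9[0-9][0-9])\d{3}[-.]{1}(?!00)\d{2}/";
      PerlExpression=prxparse(pattern);
      /* put PerlExpression;*/
    end;
  array match[26] $ 105; /* not used yet */
  infile datalines truncover; /* keep short lines from flowing onto the next record */
  input data_line $100.;
  position=prxmatch(PerlExpression, data_line);
  mA='Matched Position: ';
  mS='String: ';
  mL='Whole line:';
  if position ^= 0 then
    do;
      /* the pattern matches 6 characters: 3 digits, a separator, 2 digits */
      current_match = substr(data_line, position, 6);
      put mA position mS current_match mL data_line;
    end;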
datalines;
123456789012345678901234567890123456789012345678901234567890123 123-45 0
023456789012345678901234567890123456789012345678901234567890123 123.45 0
12345678901234567890123456789-01 123456789012345678901234567 89-01
000000000000000000000000000000000000000000000000000000000000003 123 45 0
14:56.456
45:32
;
run;
PS
I tried this:
proc print data=dictionary.tables;
run;
This errors because I am not using a Libref.
/*
You could try this.
I have limited it to 10 obs, as it can take a long time to run if
you have many tables.
*/
proc print data=sashelp.vtable(obs=10);
run;
/*
The reason yours does not work is that, with one exception,
a SAS library name is limited to 8 characters, while 'dictionary'
has 10. The exception is PROC SQL, as shown here:
*/
proc sql inobs=10;
select * from dictionary.tables
;
quit;
...and in my example this WHERE clause limits the output to my data set WANT, plus anything else I might have in my WORK library. Thanks for the knowledge, that helps a lot! -KJ
proc sql inobs=10;
select * from dictionary.tables
where libname='WORK' or memname='WANT';
quit;
proc print data=sashelp.vtable;
where libname='WORK' or memname='WANT';
run;