kjohnsonm
Lapis Lazuli | Level 10

Hello,

I need to loop through a set of text files, pattern match each one with a RegEx, and get a simple output listing the files that contain at least one match of my expression. I have tested my expression on a few RegEx websites and it seems to pass my quality tests. What I don't know is how to run a "RegEx" search against a file in SAS. Can anyone give me some starter pointers? I am an experienced programmer but new to SAS as of this last spring. I am willing to read, but to date I am not sure what function, or combination of functions, to even be looking at. I guess I am a little lost on what "proc import" to use; the files may or may not be fixed, tab, stanza, bla, bla… I just want to read them a line at a time and check for a match; once one is found, I can skip out of the file and mark it as match found.   TIA.  -Keith

9 REPLIES
ballardw
Super User

There are a number of SAS functions, but look at PRXMATCH.

You may also want to look at some reading tricks, such as using _INFILE_ instead of trying to read into variables, possibly wildcard file naming, and the FILENAME= option on the INFILE statement.
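The tips above could be combined roughly as follows. This is only a sketch under assumptions: the folder path is borrowed from later in the thread, the RegEx is a placeholder, and the names MATCHES, THISFILE, and TEXTFILE are made up for illustration. It reads every line of every file (no early exit) and removes duplicate file names afterwards.

```sas
/* Sketch: wildcard INFILE + FILENAME= + _INFILE_ + PRXMATCH.        */
/* The path and the RegEx are placeholders; adjust both.             */
data matches(keep=textfile);
  length thisfile textfile $ 260;
  /* The wildcard reads all matching files as one stream;            */
  /* FILENAME= puts the name of the file being read into THISFILE.   */
  infile 'D:\The_Directory_path\*.txt' filename=thisfile truncover;
  input;                              /* load the line into _INFILE_ */
  if prxmatch('/your RegEx expression/', _infile_) then do;
    textfile = thisfile;  /* the FILENAME= variable itself is not written out */
    output;
  end;
run;

/* Keep one row per file that had at least one match */
proc sort data=matches nodupkey;
  by textfile;
run;
```

A wildcard only covers one folder, so this fits best when all the .txt files sit together.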

Reeza
Super User
You may want to look at system commands that you can pass to the OS via SAS instead; that may be more efficient. This would be via the X statement or the %SYSEXEC macro statement.
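As an illustration of the OS route on Windows, something like the following might work. It is a sketch under assumptions: the session must allow OS commands (XCMD), the path is a placeholder, and findstr supports only a limited RegEx dialect (no Perl lookaheads), so the pattern shown is a simplified stand-in rather than the original expression. The names HITS, WANT, and TEXTFILE are made up.

```sas
/* Sketch: let the OS do the search and read back only the file names. */
/* findstr switches: /s = include subfolders, /m = print only the name */
/* of each file that contains a match, /r = treat the pattern as RegEx */
filename hits pipe
  'findstr /s /m /r "[0-9][0-9][0-9]-[0-9][0-9]" "D:\The_Directory_path\*.txt"';

data want;
  infile hits truncover;
  input textfile $260.;   /* one matching file name per line */
run;

filename hits clear;
```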
Haikuo
Onyx | Level 15

Here is something that may get you started (Windows version); tweak it to accommodate your environment.

 

 

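filename x pipe 'dir \\yourfolder\*.txt /s /b'; /* pipe the DIR listing (one full path per line) into fileref X */

data want;
  infile x truncover;
  input fname $100.; /* get the name of the next text file */
  /* open that file; TRUNCOVER keeps lines shorter than 100 characters
     from flowing over into the next record */
  infile dummy filevar=fname end=last truncover;
  do while(not last);
    input content $100.; /* get the content of each text file, one line at a time */
    texfilename=fname;   /* FILEVAR= variables are not written to the data set, so copy it */
    /* here is where to do your RegEx match */
    output;
  end;
run;

filename x clear;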
kjohnsonm
Lapis Lazuli | Level 10

I was finally able to get the pipe syntax to at least run. I am not 100% sure, but it seems to open the file up for viewing.  In my case I eventually want to process every *.txt file on my network file system, so this does not work for my needs, but it is interesting.    I was able to get something like this to work for me to a degree:

filename myFile 'D:\The_Directory_path\That_file.txt';
data want;
      infile myFile truncover scanover;
      input line   $4096.;
run;
proc contents data=want;
run;

In my logs I get data like this:

747  filename myFile 'D:\The_Directory_path\That_file.txt';
748  data want;
749        infile myFile truncover scanover;
750        input line   $4096.;
751  run;

NOTE: The infile MYFILE is:
      Filename=D:\The_Directory_path\That_file.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=431,
      Last Modified=18Dec2015:09:52:12,
      Create Time=17Nov2015:10:13:37

NOTE: 18 records were read from the infile MYFILE.
      The minimum record length was 0.
      The maximum record length was 50.
NOTE: The data set WORK.WANT has 18 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.05 seconds
      cpu time            0.03 seconds

752  proc contents data=want;
753  run;

NOTE: PROCEDURE CONTENTS used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

 

 

 

Does anyone know how to capture the max/min record length and the file's created or modified date/time in macro variables?  I could use the metadata  🙂   I of course already have the file name and path in my program that this eventually fits into...  TIA.  -KJ
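One possible way to capture those values, sketched under assumptions: the macro variable names below are made up, and the FINFO item names ('Last Modified', 'Create Time') are taken from the Windows log above and will differ on other hosts. FOPTNUM and FOPTNAME can be used to list the item names that a given host offers.

```sas
filename myFile 'D:\The_Directory_path\That_file.txt';

/* Record lengths: the LENGTH= variable holds the length of the line */
/* just read, so track the smallest and largest value seen.          */
data _null_;
  infile myFile length=reclen end=eof;
  input;
  retain minlen maxlen;
  minlen = min(minlen, reclen);   /* MIN and MAX ignore missing values */
  maxlen = max(maxlen, reclen);
  if eof then do;
    call symputx('min_reclen', minlen);
    call symputx('max_reclen', maxlen);
  end;
run;

/* File attributes: open the fileref and ask for items by name */
data _null_;
  fid = fopen('myFile');
  if fid > 0 then do;
    call symputx('file_modified', finfo(fid, 'Last Modified'));
    call symputx('file_created',  finfo(fid, 'Create Time'));
    rc = fclose(fid);
  end;
run;

%put min=&min_reclen max=&max_reclen modified=&file_modified created=&file_created;
```

Note that the first step creates no macro variables if the file is empty, because the step stops before reaching the end-of-file check.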

FreelanceReinh
Jade | Level 19

Excellent advice from both, @ballardw and @Reeza! And a nice, practical example from @Haikuo.

 

If you go the data step route, you could combine @ballardw's tips with @Haikuo's code and, e.g., use the automatic variable _INFILE_ rather than a newly created variable CONTENT.

 

Example 10 of the INFILE statement documentation could also be interesting. They read the external file names from a text file, which would be preferable to the pipe approach if the .txt files were located in various different folders.


I was just playing around with the code presented there and verified that it is possible to skip the rest of a (potentially huge) file, as soon as a match was found, and then continue with the next file:

 

do until(prxmatch('/your RegEx expression/', _infile_));
  input;
end;

The above DO loop would replace the do while(^eof); ... end; there.
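Put together, the whole step might look something like the sketch below. It makes a few assumptions: file_list.txt is a hypothetical text file holding one full path per line, the RegEx is a placeholder, and the names MATCHES, FNAME, TEXTFILE, RE, and FOUND are made up. It also adds an end-of-file check to the loop so that a file without any match does not cause a read past its last line.

```sas
/* Hypothetical list file: one full path per line */
filename flist 'D:\The_Directory_path\file_list.txt';

data matches(keep=textfile);
  length fname textfile $ 260;
  retain re;
  if _n_ = 1 then re = prxparse('/your RegEx expression/');

  infile flist truncover;
  input fname $260.;                 /* name of the next file to scan */

  /* FILEVAR= switches to a new file whenever FNAME changes */
  infile dummy filevar=fname end=eof truncover;
  found = 0;
  do while (not eof and not found);  /* stop at the first match */
    input;                           /* load the line into _INFILE_ */
    if prxmatch(re, _infile_) then found = 1;
  end;

  if found then do;
    textfile = fname;  /* FILEVAR= variables are not written out, so copy */
    output;
  end;
run;

filename flist clear;
```

The output data set then holds one row per file that contained at least one match.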

 

kjohnsonm
Lapis Lazuli | Level 10

Thanks for the replies; I am not on my game today because I cannot seem to make any of these ideas work except the use of prxparse. I will try again later.

I was able to get a simple data set to work with this code:

data _null_;
	if _N_=1 then
	do;
	retain PerlExpression;
		pattern="/(?!0)(?!9[0-9][0-9])\d{3}[-.]{1}(?!00)\d{2}/";
		PerlExpression=prxparse(pattern);
/*		put PerlExpression;*/
	end;
	array match[26] $ 105;
	input data_line $100.;
	position=prxmatch(PerlExpression, data_line);
	mA='Matched Position: ';
	mS='String: ';
	mL='Whole line:';
	if position ^= 0 then
	do;
		current_match = substr(data_line, position,5 );
		put mA position mS current_match mL data_line;
	end;
datalines;
123456789012345678901234567890123456789012345678901234567890123 123-45                             0
023456789012345678901234567890123456789012345678901234567890123 123.45                             0
12345678901234567890123456789-01                                                                    
123456789012345678901234567 89-01                                                                   
000000000000000000000000000000000000000000000000000000000000003 123 45                             0
14:56.456 45:32
;
run;
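A side note on the step above: SUBSTR with a fixed length of 5 returns only the first five characters of a six-character match such as 123-45. CALL PRXSUBSTR returns both the position and the length of the match, so the full text can be extracted. A minimal sketch, with a made-up sample line and variable names:

```sas
data _null_;
  retain re;
  if _n_ = 1 then
    re = prxparse('/(?!0)(?!9[0-9][0-9])\d{3}[-.]{1}(?!00)\d{2}/');
  infile datalines truncover;
  input data_line $char40.;
  /* POSITION and MATCHLEN are both 0 when there is no match */
  call prxsubstr(re, data_line, position, matchlen);
  if position > 0 then do;
    current_match = substr(data_line, position, matchlen);
    put position= matchlen= current_match=;
  end;
datalines;
abc 123-45 xyz
no match here
;
run;
```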

 

kjohnsonm
Lapis Lazuli | Level 10

PS

I tried this:

 

proc print data=dictionary.tables;
run;

 

This errors because I am not using a Libref.

Haikuo
Onyx | Level 15
/*
You could try this
I have limited to 10 obs, as sometimes it takes long to run if 
you have many tables
*/

proc print data=sashelp.vtable(obs=10);
run;

/*
The reason yours does not work is that, with one exception,
a SAS library name is limited to 8 characters, while 'dictionary'
is 10. The one exception is PROC SQL, as follows
*/
proc sql inobs=10;
	select * from dictionary.tables
	;
quit;

kjohnsonm
Lapis Lazuli | Level 10

...and in my example this WHERE clause limits the output to just my data set WANT, plus anything else I might have in my WORK library.  Thanks for the knowledge, that helps a lot!   -KJ

 

proc sql inobs=10;
    select * from dictionary.tables
    where libname='WORK' or memname='WANT';
    ;
quit;


proc print data=sashelp.vtable;
where libname='WORK' or memname='WANT';
run;

 
