☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
paul_e
Obsidian | Level 7

Hello, I'm trying to read in a directory of text files into a data set that would hold all file names in one variable and the entire content of the respective text file in another variable. Let's consider this example:

 

1. A directory on my server, /data/textfiles/, has these contents:

 

textfile1.txt

textfile2.txt

textfile3.txt

 

2. I would like to create a dataset that looks like this:

 

     fname           content
1    textfile1.txt   This is the entire text in this file. Line breaks might be deleted or replaced with special characters.
2    textfile2.txt   This is the entire text in this file. Line breaks might be deleted or replaced with special characters.
3    textfile3.txt   This is the entire text in this file. Line breaks might be deleted or replaced with special characters.

 

I've tried to program this with dread(), fread(), fget() and so on but haven't been successful. 

 

%let directory=/data/textfiles/;
data files;
     error_dir = filename(fref,"&directory");
     dir_id = dopen(fref);
     do i = 1 to dnum(dir_id);  
       fname = dread(dir_id,i);
       fpath = cat("&directory./",fname);
       error_file = filename("thefile",fpath);
       file_id = fopen("thefile");
       fread_error = fread(file_id);
       fget_error = fget(file_id,content);
       fclose_error = fclose(file_id);
       output;
     end;
     dclose_error = dclose(dir_id);
     keep fname content;
run;

However, what I'm getting is just the first few characters of each file. My impression is that it's always the first line, i.e. line breaks are treated as separators and fget() only takes the first column from each opened file. The documentation for fget() is pretty thin, and I don't see how to change the way data are written from the file to the dataset.

1 ACCEPTED SOLUTION

ballardw
Super User

@paul_e wrote:

The use case here is that I'd like to handle code files within SAS, for instance isolate single data steps in the code, which is way more difficult if the code is stored in separate rows. But I can see that this is probably not feasible with the variable length limit anyway. Thanks for your answers!


Still not a clear description of what "the entire content of the respective text file" would be BUT it would be much harder to even determine what a single Data step or other proc would be in such a mess.

If your code is "reasonably structured", meaning that a data step starts on a line with Data and ends with a line consisting of Run; (or a label and run;), or a procedure starts with Proc and ends with Run;, then by reading the file line by line and adding a line number variable it would be easy to use a data step to extract a data step or procedure, or to add a flag variable to indicate related lines.

 

For example (dummy code):

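data mycodefiles;
   length readfile fname $256;
   /* TRUNCOVER keeps short lines from flowing into the next record; add other INFILE options such as EOV= as needed */
   infile "path/*.sas" filename=readfile truncover;
   /* widen $100. to whatever seems likely as your longest code line */
   input line $100.;
   fname = readfile;  /* the FILENAME= variable itself is not written to the data set */
   retain codegroup;
   /* INDEXW returns 1 when the word starts the line, so test for >0 */
   if indexw(lowcase(line),'proc')>0 or strip(lowcase(line))=: 'data' then codegroup+1;
run;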

Details for handling comments and such are needed, and individual search terms may be required.

Caution: this sort of "find code" is likely inappropriate for MACRO definitions.
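
The flag-variable idea can be tried on a few in-stream lines before pointing it at real files. This is only a sketch under the same "reasonably structured" assumption: the data set names code_lines and tagged and the variables first_word, in_step and step_line are made up for illustration, and comments, one-line steps and macro definitions are not handled.

```sas
/* Sketch only: in-stream lines stand in for real code files. */
data code_lines;
   infile datalines truncover;
   input line $char100.;
   datalines4;
data a;
   x = 1;
run;

proc print data=a;
run;
;;;;

data tagged;
   set code_lines;
   length first_word $32;
   retain in_step 0;
   /* first token of the line, split on blanks and semicolons */
   first_word = lowcase(scan(line, 1, ' ;'));
   if first_word in ('data', 'proc') then do;
      codegroup + 1;      /* sum statement: retained automatically */
      in_step = 1;
   end;
   step_line = in_step;   /* 1 for lines inside a step, 0 between steps */
   if first_word in ('run', 'quit') then in_step = 0;
run;
```

A single step can then be pulled out with a WHERE clause on codegroup and step_line.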


8 REPLIES
ballardw
Super User

What exactly do you mean by "entire content of the respective text file"? How much text do you actually expect in the entire content? SAS variables are limited in size.

 

Fread is going to treat file line delimiters, such as line feed or carriage return depending on the operating system the file came from, as end of record. Which operating system created the text files? You may be able to "trick" SAS into treating some line delimiters as not being one, but that is very file dependent. What would be so wrong about having multiple observations for each file as long as all the text is there?
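
To illustrate the "multiple observations per file" idea with the file functions, here is a rough sketch, not tested against the poster's files. It assumes the directory from the question; the names dref, fref, lineno and line are illustrative. Two details matter: FREAD has to be called once per record, and FGET without a length argument only returns text up to the next delimiter (a blank by default), so a length is passed explicitly.

```sas
/* Sketch only: path taken from the question, variable names are assumptions. */
%let directory=/data/textfiles;

data file_lines;
   length dref fref $8 fname $256 line $32767;
   dref = ' ';
   rc = filename(dref, "&directory");        /* blank fileref: SAS generates one */
   dir_id = dopen(dref);
   if dir_id > 0 then do;
      do i = 1 to dnum(dir_id);
         fname = dread(dir_id, i);
         fref = ' ';
         rc = filename(fref, cats("&directory/", fname));
         file_id = fopen(fref, 'I', 32767);  /* record length up to 32767 bytes */
         if file_id > 0 then do;
            lineno = 0;
            /* FREAD returns 0 each time a record is loaded into the file data buffer */
            do while (fread(file_id) = 0);
               lineno + 1;
               line = ' ';
               /* explicit length, so blanks inside the record are not treated as delimiters */
               rc = fget(file_id, line, 32767);
               output;
            end;
            rc = fclose(file_id);
         end;
         rc = filename(fref);                /* release the generated fileref */
      end;
      rc = dclose(dir_id);
   end;
   rc = filename(dref);
   keep fname lineno line;
run;
```

Entries that cannot be opened as files, such as subdirectories, are skipped because FOPEN returns 0 for them.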

 

What exactly do you expect to do with the resulting data set? That much text in a single variable seems like you may be looking at something more like the SAS Enterprise Miner for text analysis than basic data step approaches.

Tom
Super User

If you want to read the file as BINARY instead of TEXT then change the FILENAME() function call to set attributes. You might try RECFM=N.

error_file_rc = filename("thefile",fpath,,'RECFM=N');

Or perhaps RECFM=F and LRECL=32767 since that is the maximum number of bytes you can store in a single variable.

error_file_rc = filename("thefile",fpath,,'RECFM=F LRECL=32767');

 

Or change the FOPEN() function call:

file_id = fopen("thefile",'I',32767,'B');
Tom
Super User

If you want to read all of the files in a directory there is no need to get so complicated.

Basic INFILE/INPUT statements will do that.

Since you say they are text files then read them as LINES.

data want;
  length fileno 8 fname filename $200 line 8 content $32767;
  infile '/data/textfiles/*' filename=filename truncover;
  input @;
  fname = scan(filename,-1,'/');
  if fname ne lag(fname) then do;
     fileno+1; line=0;
  end;
  line+1;
  input content $char32767. ;
run;
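
If one row per file is still wanted after reading line by line, the lines can be collapsed in a second step. This is a rough sketch rather than part of the reply above: the small data set lines stands in for the output of a line-by-line read, and the names all_text and truncated and the '0A'x separator are illustrative choices. The length check is there because a CATX result that does not fit its target variable in a DATA step is set to blank rather than truncated.

```sas
/* Sketch only: LINES mimics one observation per line of each file. */
data lines;
   infile datalines truncover dsd dlm='|';
   length fileno 8 fname $200 content $32767;
   input fileno fname content;
   datalines;
1|textfile1.txt|First line of file 1
1|textfile1.txt|Second line of file 1
2|textfile2.txt|Only line of file 2
;

data one_row_per_file;
   length all_text $32767 truncated 8;
   retain all_text truncated;
   set lines;
   by fileno;
   if first.fileno then do;
      all_text = ' ';
      truncated = 0;
   end;
   if not truncated and not missing(content) then do;
      /* append only while the result still fits into 32767 bytes */
      if lengthn(all_text) + lengthn(content) + 1 <= 32767 then
         all_text = catx('0A'x, all_text, content);
      else truncated = 1;
   end;
   if last.fileno then output;   /* one observation per file */
   keep fname all_text truncated;
run;
```

CATX strips leading and trailing blanks and skips blank lines, so indentation and empty lines are lost, and any file longer than 32767 bytes is flagged in truncated instead of being stored in full.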
  
SASKiwi
PROC Star

What are you intending to use this data set for? If we understood your complete use case perhaps we could suggest a better solution. 

paul_e
Obsidian | Level 7

The use case here is that I'd like to handle code files within SAS, for instance isolate single data steps in the code, which is way more difficult if the code is stored in separate rows. But I can see that this is probably not feasible with the variable length limit anyway. Thanks for your answers!

SASKiwi
PROC Star

I agree with @ballardw that your use case of "handle code files" and isolate DATA steps within SAS is still unclear. What would you do with an isolated DATA step? I find SAS macros a good way of isolating common functionality in SAS so it can be easily repeated. An example of this would be importing or exporting CSV files.  

paul_e
Obsidian | Level 7

The purpose of this is code analysis. And to process the code I first need it in a dataset. But I settled on reading the separate lines in separate observations which is way easier and actually has its benefits.
