<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Read several text files into a data set in one data step in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912642#M359758</link>
    <description>&lt;P&gt;The purpose of this is code analysis. And to process the code I first need it in a dataset. But I settled on reading the separate lines in separate observations which is way easier and actually has its benefits.&lt;/P&gt;</description>
    <pubDate>Tue, 23 Jan 2024 11:46:36 GMT</pubDate>
    <dc:creator>paul_e</dc:creator>
    <dc:date>2024-01-23T11:46:36Z</dc:date>
    <item>
      <title>Read several text files into a data set in one data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/911970#M359566</link>
      <description>&lt;P&gt;Hello, I'm trying to read a directory of text files into a data set that holds the file names in one variable and the entire content of each file in another variable. Let's consider this example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. A directory on my server, /data/textfiles/, has these contents:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;textfile1.txt&lt;/P&gt;
&lt;P&gt;textfile2.txt&lt;/P&gt;
&lt;P&gt;textfile3.txt&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. I would like to create a dataset that looks like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE border="1" width="100%"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="33.333333333333336%"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD width="33.333333333333336%"&gt;fname&lt;/TD&gt;
&lt;TD width="33.333333333333336%"&gt;content&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="33.333333333333336%"&gt;1&lt;/TD&gt;
&lt;TD width="33.333333333333336%"&gt;
&lt;P&gt;textfile1.txt&lt;/P&gt;
&lt;/TD&gt;
&lt;TD width="33.333333333333336%"&gt;This is the entire text in this file. Line breaks might be deleted or replaced with special characters.&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="33.333333333333336%"&gt;2&lt;/TD&gt;
&lt;TD width="33.333333333333336%"&gt;textfile2.txt&lt;/TD&gt;
&lt;TD width="33.333333333333336%"&gt;This is the entire text in this file. Line breaks might be deleted or replaced with special characters.&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="33.333333333333336%"&gt;3&lt;/TD&gt;
&lt;TD width="33.333333333333336%"&gt;textfile3.txt&lt;/TD&gt;
&lt;TD width="33.333333333333336%"&gt;This is the entire text in this file. Line breaks might be deleted or replaced with special characters.&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I've tried to program this with dread(), fread(), fget() and so on but haven't been successful.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let directory=/data/textfiles/
data files;
     error_dir = filename(fref,"&amp;amp;directory");
     dir_id = dopen(fref);
     do i = 1 to dnum(dir_id);  
       fname = dread(dir_id,i);
       fpath = cat("&amp;amp;directory./",fname);
       error_file = filename("thefile",fpath);
       file_id = fopen("thefile");
       fread_error = fread(file_id);
       fget_error = fget(file_id,content);
       fclose_error = fclose(file_id);
       output;
     end;
     dclose_error = dclose(dir_id);
     keep fname content;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;However, all I get is the first few characters of each file. My impression is that it's always the first line, i.e. line breaks are treated as separators and fget() only takes the first column from each opened file. The documentation for fget() is pretty thin, and I don't see how to change the way the data are written from the file to the data set.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jan 2024 15:46:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/911970#M359566</guid>
      <dc:creator>paul_e</dc:creator>
      <dc:date>2024-01-18T15:46:14Z</dc:date>
    </item>
    <item>
      <title>Re: Read several text files into a data set in one data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/911986#M359574</link>
      <description>&lt;P&gt;What exactly do you mean by "entire content of the respective text file"? How much text do you actually expect in it? SAS character variables are limited to 32,767 bytes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;FREAD treats line delimiters, such as line feed or carriage return depending on the operating system, as the end of a record. Which operating system created the text files? You may be able to "trick" SAS into not treating some line delimiters as record ends, but that is very file dependent. What would be so wrong about having multiple observations for each file, as long as all the text is there?&lt;/P&gt;
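&lt;P&gt;If one value per file is still wanted, one option is to call FREAD once per line and append each record to a single variable. Below is a minimal, untested sketch for one of the files from the question. The '0A'x separator is an assumed choice, and anything beyond 32767 bytes is cut off. Giving FGET an explicit length is meant to make it copy the whole record instead of stopping at the default blank delimiter.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data onefile;
  length fname $200 line content $32767;
  fname = "textfile1.txt";                 /* file name from the question */
  rc = filename("thefile", cats("/data/textfiles/", fname));
  file_id = fopen("thefile", 'I', 32767, 'V');
  if file_id gt 0 then do;
    /* each successful FREAD loads the next line into the file data buffer */
    do while (fread(file_id) = 0);
      line = ' ';
      rc = fget(file_id, line, 32767);     /* copy the whole line */
      /* TRIMN keeps indentation and drops trailing blanks */
      content = trimn(content) || trimn(line) || '0A'x;
    end;
    rc = fclose(file_id);
  end;
  rc = filename("thefile");                /* release the fileref */
  keep fname content;
run;
&lt;/CODE&gt;&lt;/PRE&gt;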
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What exactly do you expect to do with the resulting data set? That much text in a single variable seems like you may be looking at something more like the SAS Enterprise Miner for text analysis than basic data step approaches.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jan 2024 16:53:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/911986#M359574</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-01-18T16:53:52Z</dc:date>
    </item>
    <item>
      <title>Re: Read several text files into a data set in one data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912000#M359582</link>
      <description>&lt;P&gt;If you want to read the file as BINARY instead of TEXT, then change the FILENAME() function call to set attributes. You might try RECFM=N.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;error_file_rc = filename("thefile",fpath,,'RECFM=N');&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Or perhaps RECFM=F and LRECL=32767, since that is the maximum number of bytes you can store in a single character variable.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;error_file_rc = filename("thefile",fpath,,'RECFM=F LRECL=32767');&lt;/CODE&gt;&lt;/PRE&gt;
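&lt;P&gt;Put together, the RECFM=F approach might look like the untested sketch below. It assumes the directory macro variable is written without a trailing slash and that each file fits in 32767 bytes; longer files are cut off, and the line-break bytes stay inside CONTENT as '0A'x (or '0D0A'x) characters. Note the explicit length on FGET: without it, FGET stops at the next delimiter, which is a blank by default. That would explain why only the first few characters came back.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let directory=/data/textfiles;
data files;
  length fref $8 fname $200 fpath $512 content $32767;
  /* a blank, length-8 FREF lets FILENAME() generate a fileref */
  rc = filename(fref, "&amp;amp;directory");
  dir_id = dopen(fref);
  do i = 1 to dnum(dir_id);
    fname = dread(dir_id, i);
    fpath = catx('/', "&amp;amp;directory", fname);
    /* one fixed 32767-byte record per file, line breaks included */
    rc = filename("thefile", fpath, , 'RECFM=F LRECL=32767');
    file_id = fopen("thefile");
    content = ' ';
    if file_id gt 0 then do;
      /* explicit length: copy the record, not just the first token */
      if fread(file_id) = 0 then rc = fget(file_id, content, 32767);
      rc = fclose(file_id);
    end;
    output;
  end;
  rc = dclose(dir_id);
  rc = filename(fref);
  keep fname content;
run;
&lt;/CODE&gt;&lt;/PRE&gt;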
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or change the FOPEN() function call:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;file_id = fopen("thefile",'I',32767,'B');&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 18 Jan 2024 18:12:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912000#M359582</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-01-18T18:12:24Z</dc:date>
    </item>
    <item>
      <title>Re: Read several text files into a data set in one data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912003#M359583</link>
      <description>&lt;P&gt;If you want to read all of the files in a directory there is no need to get so complicated.&lt;/P&gt;
&lt;P&gt;Basic INFILE/INPUT statements will do that.&lt;/P&gt;
&lt;P&gt;Since you say they are text files then read them as LINES.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
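  length fileno 8 fname filename $200 line 8 content $32767;
  /* TRUNCOVER is an INFILE option: short lines must not flow into the next one */
  infile '/data/textfiles/*' filename=filename truncover;
  /* hold the line so FILENAME= is updated before it is used */
  input @;
  fname = scan(filename,-1,'/');
  /* new file: bump the file counter and restart the line counter */
  if fname ne lag(fname) then do;
     fileno+1; line=0;
  end;
  line+1;
  input content $char32767.;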
run;
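
/* Hypothetical follow-up step: collapse the lines of each file into one
   variable, the layout first asked for. Files longer than 32767 bytes are
   cut off, and '0A'x is an assumed line separator. */
data want_onerow;
  length allcode $32767;
  do until (last.fileno);
    set want;
    by fileno;
    /* append the line, keeping its leading blanks */
    allcode = trimn(allcode) || trimn(content) || '0A'x;
  end;
  keep fileno fname allcode;
run;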
  &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 18 Jan 2024 18:18:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912003#M359583</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-01-18T18:18:17Z</dc:date>
    </item>
    <item>
      <title>Re: Read several text files into a data set in one data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912008#M359584</link>
      <description>&lt;P&gt;What are you intending to use this data set for? If we understood your complete use case perhaps we could suggest a better solution.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jan 2024 19:06:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912008#M359584</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2024-01-18T19:06:42Z</dc:date>
    </item>
    <item>
      <title>Re: Read several text files into a data set in one data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912176#M359627</link>
      <description>&lt;P&gt;The use case here is that I'd like to handle code files within SAS, for instance isolate single data steps in the code, which is way more difficult if the code is stored in separate rows. But I can see that this is probably not feasible with the variable length limit anyway. Thanks for your answers!&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2024 13:33:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912176#M359627</guid>
      <dc:creator>paul_e</dc:creator>
      <dc:date>2024-01-19T13:33:37Z</dc:date>
    </item>
    <item>
      <title>Re: Read several text files into a data set in one data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912214#M359634</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/356206"&gt;@paul_e&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;The use case here is that I'd like to handle code files within SAS, for instance isolate single data steps in the code, which is way more difficult if the code is stored in separate rows. But I can see that this is probably not feasible with the variable length limit anyway. Thanks for your answers!&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;That is still not a clear description of what "the entire content of the respective text file" would be, BUT it would be much harder to even determine where a single DATA step or other PROC begins and ends in such a mess.&lt;/P&gt;
&lt;P&gt;If your code is "reasonably structured" (a DATA step starts on a line beginning with DATA and ends with a line consisting of RUN; or a label and RUN;, and a procedure starts with PROC and ends with RUN;), then after reading the files line by line and adding a line number variable, it is easy to use a DATA step to extract a single data step or procedure, or to add a flag variable that marks related lines.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example (dummy code):&lt;/P&gt;
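&lt;PRE&gt;data mycodefiles;
   length readfile fname $256 line $100;
   /* add other INFILE options, such as EOV=, as needed */
   infile "path/*.sas" filename=readfile truncover;
   input line $char100.;   /* or whatever seems likely as your longest code line */
   fname = readfile;       /* FILENAME= variables are not written out, so copy it */
   /* start a new group at each line whose first word is DATA or PROC */
   if scan(lowcase(line),1,' ;') in ('data','proc') then codegroup+1;
run;&lt;/PRE&gt;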
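&lt;P&gt;Once each line carries a group number, pulling one step back out is another short DATA step. A minimal sketch, assuming the MYCODEFILES data set above with its LINE and CODEGROUP variables; the group number 3 and the output path are hypothetical.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
   set mycodefiles;
   where codegroup = 3;             /* hypothetical: the step to extract */
   file "path/step3.sas";           /* hypothetical output file */
   len = lengthn(line);
   put line $varying100. len;       /* keeps indentation, drops trailing blanks */
run;
&lt;/CODE&gt;&lt;/PRE&gt;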
&lt;P&gt;Details for handling comments and the like are still needed, and individual search terms may be required.&lt;/P&gt;
&lt;P&gt;Caution: this sort of "find code" approach is likely inappropriate for MACRO definitions.&lt;/P&gt;</description>
      <pubDate>Fri, 19 Jan 2024 16:59:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912214#M359634</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-01-19T16:59:10Z</dc:date>
    </item>
    <item>
      <title>Re: Read several text files into a data set in one data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912369#M359674</link>
      <description>&lt;P&gt;I agree with &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt; that your use case of "handle code files" and isolate DATA steps within SAS is still unclear. What would you do with an isolated DATA step? I find SAS macros a good way of isolating common functionality in SAS so it can be easily repeated. An example of this would be importing or exporting CSV files. &amp;nbsp; &lt;/P&gt;</description>
      <pubDate>Sat, 20 Jan 2024 22:45:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912369#M359674</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2024-01-20T22:45:21Z</dc:date>
    </item>
    <item>
      <title>Re: Read several text files into a data set in one data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912642#M359758</link>
      <description>&lt;P&gt;The purpose of this is code analysis. And to process the code I first need it in a dataset. But I settled on reading the separate lines in separate observations which is way easier and actually has its benefits.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jan 2024 11:46:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-several-text-files-into-a-data-set-in-one-data-step/m-p/912642#M359758</guid>
      <dc:creator>paul_e</dc:creator>
      <dc:date>2024-01-23T11:46:36Z</dc:date>
    </item>
  </channel>
</rss>

