Solved: Re: How to import 100 TXT files to 100 Rows in one dataset?

DanielDor · Posted 03-22-2017 01:41 PM

Greetings,

1. I would like to import 100 TXT files to 100 rows in one dataset.

2. To read 1 file to 1 row in 1 Dataset, I can use the following code:

data test;
length text $32767;
retain text '';
infile "C:\Test.txt" flowover dlmstr='//' end=last;
input;
text=cats(text,_infile_);
if last then output;
run;

3. However, I would like to run import 100 TXT files (from the same directory) to 100 rows in one dataset.

Thanks in advance!

D

data_null__ · Posted 03-22-2017 02:15 PM

This use wildcard in the INFILE statement.

I added a variable file to show where each record comes from. DLMSTR doesn't seem to do anything. Are you trying to keep the TERMSTR on the records in TEXT?

data file;
   length file filenm $64;
   retain file;
   infile '~/t00*.dat' eov=eov end=end dlmstr='//' flowover filename=filenm;
   length text $32767;
   retain text;
   input;
   if _n_ eq 1 or eov then do;      
      if eov then output;
      file=filenm;
      eov=0;
      call missing(text);
      end;
   text=cats(text,_infile_);
   if end then output;
   run;

View solution in original post

ballardw · Posted 03-22-2017 01:51 PM

Do your txt files have column headings?

Are you absolutely sure that 32767 characters will hold the entire contents of each file?

did you try?

infile "C:\path:\*.txt" flowover dlmstr='//' end=last;

where path is the path to the folder

This will attempt to read all of the files with txt as the extension in the folder. If there is a more specific pattern to the file names you need to extract then more details on the names is needed.

DanielDor · Posted 03-22-2017 03:13 PM

Thanks a lot for your fast reply..Changing the script the way you suggested, imported all the text files to the final row of the dataset, so it is not exactly what I looked for. Thanks anyway for your great explanation 🙂

data_null__ · Posted 03-22-2017 02:15 PM

This use wildcard in the INFILE statement.

I added a variable file to show where each record comes from. DLMSTR doesn't seem to do anything. Are you trying to keep the TERMSTR on the records in TEXT?

data file;
   length file filenm $64;
   retain file;
   infile '~/t00*.dat' eov=eov end=end dlmstr='//' flowover filename=filenm;
   length text $32767;
   retain text;
   input;
   if _n_ eq 1 or eov then do;      
      if eov then output;
      file=filenm;
      eov=0;
      call missing(text);
      end;
   text=cats(text,_infile_);
   if end then output;
   run;

DanielDor · Posted 03-22-2017 03:11 PM

Thanks so much! This is working just great 🙂 Appreciate your fast reply 🙂

data_null__ · Posted 03-22-2017 04:22 PM

This version reads the file 32767 bytes at a time TERMSTRs and all into an array of TEXT variables. This will capture all text for files that may be larger than 32767. Adjust array diminsion as needed.

data file;
   length file filenm $64;
   retain file;
   array text[10] $32767;
   retain text;
   infile '~/*.dat' eov=eov end=end filename=filenm recfm=F lrecl=32767;
   input;
   *list;
   if _n_ eq 1 or eov then do;      
      if eov then output;
      file=filenm;
      eov=0;
      call missing(of i text[*]);
      end;
   i + 1;
   text[i]=_infile_;
   if end then output;
   run;

DanielDor · Posted 03-23-2017 02:56 AM

Thanks! That also working great. As you said, it is working for texts above 32K chars.

rogerjdeangelis · Posted 03-23-2017 11:06 AM

/* T1003210 load a file in one character variable(mystring) of length 128,004.

Associated problem. We really need to what the op wants to do with the 
array of 32k chunks. 


I load a file in one character variable(mystring) of length 128,004.

This type of problem is best solved with Perl or Python.
I present an R solution because I am a little more skilled in R.

inspired
https://goo.gl/eOu829
https://communities.sas.com/t5/SAS-Procedures/How-to-import-100-TXT-files-to-100-Rows-in-one-dataset/m-p/343429

HAVE (A file with two 64000 byte records)
=========================================

FILE:  d:/txt/fyl1

RECORD 1:    64002   bytes  first recrord  (crlf once)
RECORD 2:   128004   bytes  full file (crlf twice)

WANT ( load the the file both records into one character variable of length 128,004 bytes)
==========================================================================================

Pull some substrings out

   1] "FILE LENGTH"
   1] 128004

   1] "substr(mystring,60001,60010)"
   1] "1234567890"

   1] "substr(mystring,30001,30010)"
   1] "1234567890"

*                _                  _       _
 _ __ ___   __ _| | _____        __| | __ _| |_ __ _
| '_ ` _ \ / _` | |/ / _ \_____ / _` |/ _` | __/ _` |
| | | | | | (_| |   <  __/_____| (_| | (_| | || (_| |
|_| |_| |_|\__,_|_|\_\___|      \__,_|\__,_|\__\__,_|

;

data _null_;
  length txt1 txt2 $32000;
  txt1=repeat('1234567890',3199);
  txt2=txt1;
  do fyls=1 to 1;
    fylvar=cats('d:/txt/fyl',put(fyls,1.));
    file dummy filevar=fylvar lrecl=64000 recfm=v;
    put txt1 +(-1) txt2;
    put txt1 +(-1) txt2;
  end;
run;quit;

NOTE: 2 records were written to the file DUMMY.
      The minimum record length was 64000.

*____
|  _ \
| |_) |
|  _ <
|_| \_\

;


%utl_submit_r64('
library(readr);
mystring <- read_file("d:/txt/fyl1");
"FILE LENGTH";
nchar(mystring);
"substr(mystring,60001,60010)";
substr(mystring,60001,60010);
"substr(mystring,30001,30010)";
substr(mystring,30001,30010);
');


1] "FILE LENGTH"
1] 128004

1] "substr(mystring,60001,60010)"
1] "1234567890"

1] "substr(mystring,30001,30010)"
1] "1234567890"

Register Today!

SAS Training: Just a Click Away