Append All TXT Files in a Folder to a Dataset

Reply
Frequent Contributor
Posts: 80

Append All TXT Files in a Folder to a Dataset

I'm thinking this is easy....but for me not so much Smiley Tongue

 

If I have a folder of .txt files that have the same variables and delimiters, I would like to append them all into one SAS dataset. Is there a way to pattern match (*.txt) and use proc append after reading in?

 

E.g.,

 

Text files:

 

1.txt

2.txt

3.txt...

n.txt

 

read all *.txt in and then append them all.

 

Cheers!

Super User
Posts: 21,546

Re: Append All TXT Files in a Folder to a Dataset

Regular Contributor
Posts: 185

Re: Append All TXT Files in a Folder to a Dataset

You can use a wildcard in the infile-statement:

data all_files;
infile "folder/*.txt" further options;
Input ....;
Run;
Frequent Contributor
Posts: 80

Re: Append All TXT Files in a Folder to a Dataset

Posted in reply to error_prone

I have this nearly figured out. I can get all the data in, the only issue is that every time a new txt file is read in (even the first one), a blank line is inserted.  Here is the code thus far; I am not including the format coding for the variables nor all the variables as it would take too much space:

 

* Use wildcard in input;
	infile "H:\2007\*.txt"  FIRSTOBS=1 eov=eov missover filename=filename ENCODING="WLATIN1" DLM='09'x TRUNCOVER DSD;

	* Input first record and hold line;
	input@;
 
	* Check if this is the first record or the first record in a new file;
	* If it is, replace the filename with the new file name and move to next line;
	if _n_ eq 1 or eov then do;
		txt_file_name = scan(filename, -1, "\");
		eov=0;
	end;
	
	else input
		FlightId             : $CHAR25.
                AirlineCode      : $CHAR3.
                FlightNumber   : $CHAR7.
                SchedDepApt   : $CHAR4.
                SchedArrApt     : $CHAR4.

                many other variables
;
run;

If I can get rid of the blank line that starts each input of the txt files that would get it all!

 

 

thx

Super User
Posts: 21,546

Re: Append All TXT Files in a Folder to a Dataset

Do your files have a header row? Where is the ‘empty line’ coming from?

Frequent Contributor
Posts: 80

Re: Append All TXT Files in a Folder to a Dataset

The first line is a header line with the variable names in it. I had thought FIRSTOBS=2 would fix it, but it did not, hence I chose FIRSTOBS=1 to test to see if it made a difference. It does. FIRSTOBS=2 will cause it to read the second valid record, which is not good. Hence I am back to FIRSTOBS=1.

 

The first row in every text file is the header row with the name of the variables. It should be ignored. If I use SAS EG to read it in, it uses FIRSTOBS=2 and this ignores the header row properly. EG uses:

INFILE 'H:\2007\NavCanReport_20171002.txt'
        LRECL=32767
        FIRSTOBS=2
        ENCODING="WLATIN1"
        DLM='09'x
        MISSOVER
        DSD ;
    INPUT
        FlightId         : $CHAR25.

lots of vars

This works fine. Obviously I would not know LRECL when grabbing a bunch of text files. Seems like there is something wrong with the code I am using here as it seems to always add a blank line upon reading each file. There are no blank lines in the txt files. Likely my:

 

 

infile "H:\2007\*.txt"  FIRSTOBS=2 eov=eov missover filename=filename ENCODING="WLATIN1" DLM='09'x TRUNCOVER DSD;

	* Input first record and hold line;
	input@;
 
	* Check if this is the first record or the first record in a new file;
	* If it is, replace the filename with the new file name and move to next line;
	if _n_ eq 1 or eov then do;
		txt_file_name = scan(filename, -1, "\");
		eov=0;
	end;
	
	else input

needs some fixing, but this is just code from other posts here. Perhaps the "move to next line" stuff above? That implies to keep the first row, which I do not want to do as it is a header of variable names.

 

Super User
Posts: 21,546

Re: Append All TXT Files in a Folder to a Dataset

No, the code is designed to account for a header row. Follow it exactly. Do you have the two INPUT statements?

Super User
Posts: 21,546

Re: Append All TXT Files in a Folder to a Dataset

You have both MISSOVER and TRUNCOVER, pick one. 

Remove the FIRSTOBS and other options and see if that works. 

Frequent Contributor
Posts: 80

Re: Append All TXT Files in a Folder to a Dataset

@Reeza

 

The other options I am using are needed as SAS will crash with error codes. They are options that EG adds when manually adding the txt files (we could have hundreds hence the need for some code to bring this in no matter the number). I also have two input statements just as listed before. The blank initial line is still there for some reason. All the data flows properly after that and for each file. Just a blank line is inserted before each new file is read.

Regular Contributor
Posts: 185

Re: Append All TXT Files in a Folder to a Dataset

The option firstobs affects the first file only. Just remove it. Use EOV=newFile in infile statement and
if not newFile;
afterwards.
Frequent Contributor
Posts: 80

Re: Append All TXT Files in a Folder to a Dataset

Posted in reply to error_prone

@error_prone

 

What exactly do you mean...I have:

 

data all_data;
     length
              specify length of all vars
     format
              specify format of all vars
     informat
              commands

*make sure variables to store file name are long enough;
	length filename txt_file_name $256;

	*keep file name from record to record;
	retain txt_file_name;

 	* Use wildcard in input;
	infile "H:\2007\*.txt" eov=eov filename=filename ENCODING="WLATIN1" DLM='09'x MISSOVER DSD;

	* Input first record and hold line;
	input@;
 
	* Check if this is the first record or the first record in a new file;
	* If it is, replace the filename with the new file name and move to next line;
	if _n_ eq 1 or eov then do;
		txt_file_name = scan(filename, -1, "\");
		eov=0;
	end;
	
	else input
		FlightId         : $CHAR25.
        AirlineCode      : $CHAR3.
        FlightNumber     : $CHAR7.
        SchedDepApt      : $CHAR4.

many vars to the end;

 

How would I implement your idea?

Frequent Contributor
Posts: 80

Re: Append All TXT Files in a Folder to a Dataset

[ Edited ]

So...an update. None of the programming listed prior worked for me...always problems with the headers. What did work perfectly is:

 

infile "\\NCRFP4\TAF\Traffic Forecasting\OAG\Cancellation Data\NavCan\*.txt" eov=eov ENCODING="WLATIN1" DLM='09'x MISSOVER DSD firstobs=2;
	input @;
	if eov then input;
	
	input
              var1
              var 2, for many variables
;
eov=0;
run;

Not sure why the others did not.

 

Super User
Super User
Posts: 7,401

Re: Append All TXT Files in a Folder to a Dataset

Your problem is including the resetting of the EOV inside of the THEN clause.  SAS sets it to 1 when it starts a new file, but it never sets it back to 0.  

 

Note that the code that PROC IMPORT and EG Import function generates is not really that good.  Your basic structure of first defining your variables using LENGTH statement is good, but :

 

Do not both attach permanent INFORMATs with an INFORMAT statement and also specify an informat in the INPUT statement. Unless you really really want to read this file using a different informat than you want attached to the variable.

 

SAS does NOT need to have either INFORMATs or FORMATS defined for most variables. SAS already knows how to read and write most numbers and character strings.  Main exception are date/time/datetime values.  You might also need to use COMMA informat if the text file is a report that includes commas and dollar signs in the values of numbers.  

 

The main advantage of using the $CHAR informat (and format) is to preserve leading spaces in the values. Do really WANT to preserve leading spaces in the values of your character variables?  If so I am not sure that you can if you are using the DSD option to read.

 

 

Ask a Question
Discussion stats
  • 12 replies
  • 136 views
  • 0 likes
  • 4 in conversation