BookmarkSubscribeRSS Feed
BCNAV
Quartz | Level 8

I'm thinking this is easy....but for me not so much Smiley Tongue

 

If I have a folder of .txt files that have the same variables and delimiters, I would like to append them all into one SAS dataset. Is there a way to pattern match (*.txt) and use proc append after reading in?

 

E.g.,

 

Text files:

 

1.txt

2.txt

3.txt...

n.txt

 

read all *.txt in and then append them all.

 

Cheers!

12 REPLIES 12
error_prone
Barite | Level 11
You can use a wildcard in the infile-statement:

data all_files;
infile "folder/*.txt" further options;
Input ....;
Run;
BCNAV
Quartz | Level 8

I have this nearly figured out. I can get all the data in, the only issue is that every time a new txt file is read in (even the first one), a blank line is inserted.  Here is the code thus far; I am not including the format coding for the variables nor all the variables as it would take too much space:

 

* Use wildcard in input;
	infile "H:\2007\*.txt"  FIRSTOBS=1 eov=eov missover filename=filename ENCODING="WLATIN1" DLM='09'x TRUNCOVER DSD;

	* Input first record and hold line;
	input@;
 
	* Check if this is the first record or the first record in a new file;
	* If it is, replace the filename with the new file name and move to next line;
	if _n_ eq 1 or eov then do;
		txt_file_name = scan(filename, -1, "\");
		eov=0;
	end;
	
	else input
		FlightId             : $CHAR25.
                AirlineCode      : $CHAR3.
                FlightNumber   : $CHAR7.
                SchedDepApt   : $CHAR4.
                SchedArrApt     : $CHAR4.

                many other variables
;
run;

If I can get rid of the blank line that starts each input of the txt files that would get it all!

 

 

thx

Reeza
Super User

Do your files have a header row? Where is the ‘empty line’ coming from?

BCNAV
Quartz | Level 8

The first line is a header line with the variable names in it. I had thought FIRSTOBS=2 would fix it, but it did not, hence I chose FIRSTOBS=1 to test to see if it made a difference. It does. FIRSTOBS=2 will cause it to read the second valid record, which is not good. Hence I am back to FIRSTOBS=1.

 

The first row in every text file is the header row with the name of the variables. It should be ignored. If I use SAS EG to read it in, it uses FIRSTOBS=2 and this ignores the header row properly. EG uses:

INFILE 'H:\2007\NavCanReport_20171002.txt'
        LRECL=32767
        FIRSTOBS=2
        ENCODING="WLATIN1"
        DLM='09'x
        MISSOVER
        DSD ;
    INPUT
        FlightId         : $CHAR25.

lots of vars

This works fine. Obviously I would not know LRECL when grabbing a bunch of text files. Seems like there is something wrong with the code I am using here as it seems to always add a blank line upon reading each file. There are no blank lines in the txt files. Likely my:

 

 

infile "H:\2007\*.txt"  FIRSTOBS=2 eov=eov missover filename=filename ENCODING="WLATIN1" DLM='09'x TRUNCOVER DSD;

	* Input first record and hold line;
	input@;
 
	* Check if this is the first record or the first record in a new file;
	* If it is, replace the filename with the new file name and move to next line;
	if _n_ eq 1 or eov then do;
		txt_file_name = scan(filename, -1, "\");
		eov=0;
	end;
	
	else input

needs some fixing, but this is just code from other posts here. Perhaps the "move to next line" stuff above? That implies to keep the first row, which I do not want to do as it is a header of variable names.

 

Reeza
Super User

No, the code is designed to account for a header row. Follow it exactly. Do you have the two INPUT statements?

Reeza
Super User

You have both MISSOVER and TRUNCOVER, pick one. 

Remove the FIRSTOBS and other options and see if that works. 

BCNAV
Quartz | Level 8

@Reeza

 

The other options I am using are needed as SAS will crash with error codes. They are options that EG adds when manually adding the txt files (we could have hundreds hence the need for some code to bring this in no matter the number). I also have two input statements just as listed before. The blank initial line is still there for some reason. All the data flows properly after that and for each file. Just a blank line is inserted before each new file is read.

error_prone
Barite | Level 11
The option firstobs affects the first file only. Just remove it. Use EOV=newFile in infile statement and
if not newFile;
afterwards.
BCNAV
Quartz | Level 8

@error_prone

 

What exactly do you mean...I have:

 

data all_data;
     length
              specify length of all vars
     format
              specify format of all vars
     informat
              commands

*make sure variables to store file name are long enough;
	length filename txt_file_name $256;

	*keep file name from record to record;
	retain txt_file_name;

 	* Use wildcard in input;
	infile "H:\2007\*.txt" eov=eov filename=filename ENCODING="WLATIN1" DLM='09'x MISSOVER DSD;

	* Input first record and hold line;
	input@;
 
	* Check if this is the first record or the first record in a new file;
	* If it is, replace the filename with the new file name and move to next line;
	if _n_ eq 1 or eov then do;
		txt_file_name = scan(filename, -1, "\");
		eov=0;
	end;
	
	else input
		FlightId         : $CHAR25.
        AirlineCode      : $CHAR3.
        FlightNumber     : $CHAR7.
        SchedDepApt      : $CHAR4.

many vars to the end;

 

How would I implement your idea?

BCNAV
Quartz | Level 8

So...an update. None of the programming listed prior worked for me...always problems with the headers. What did work perfectly is:

 

infile "\\NCRFP4\TAF\Traffic Forecasting\OAG\Cancellation Data\NavCan\*.txt" eov=eov ENCODING="WLATIN1" DLM='09'x MISSOVER DSD firstobs=2;
	input @;
	if eov then input;
	
	input
              var1
              var 2, for many variables
;
eov=0;
run;

Not sure why the others did not.

 

Tom
Super User Tom
Super User

Your problem is including the resetting of the EOV inside of the THEN clause.  SAS sets it to 1 when it starts a new file, but it never sets it back to 0.  

 

Note that the code that PROC IMPORT and EG Import function generates is not really that good.  Your basic structure of first defining your variables using LENGTH statement is good, but :

 

Do not both attach permanent INFORMATs with an INFORMAT statement and also specify an informat in the INPUT statement. Unless you really really want to read this file using a different informat than you want attached to the variable.

 

SAS does NOT need to have either INFORMATs or FORMATS defined for most variables. SAS already knows how to read and write most numbers and character strings.  Main exception are date/time/datetime values.  You might also need to use COMMA informat if the text file is a report that includes commas and dollar signs in the values of numbers.  

 

The main advantage of using the $CHAR informat (and format) is to preserve leading spaces in the values. Do really WANT to preserve leading spaces in the values of your character variables?  If so I am not sure that you can if you are using the DSD option to read.

 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 4596 views
  • 0 likes
  • 4 in conversation