I'm thinking this is easy....but for me not so much
If I have a folder of .txt files that have the same variables and delimiters, I would like to append them all into one SAS dataset. Is there a way to pattern match (*.txt) and use proc append after reading in?
E.g.,
Text files:
1.txt
2.txt
3.txt...
n.txt
read all *.txt in and then append them all.
Cheers!
I have this nearly figured out. I can get all the data in, the only issue is that every time a new txt file is read in (even the first one), a blank line is inserted. Here is the code thus far; I am not including the format coding for the variables nor all the variables as it would take too much space:
* Use wildcard in input; infile "H:\2007\*.txt" FIRSTOBS=1 eov=eov missover filename=filename ENCODING="WLATIN1" DLM='09'x TRUNCOVER DSD; * Input first record and hold line; input@; * Check if this is the first record or the first record in a new file; * If it is, replace the filename with the new file name and move to next line; if _n_ eq 1 or eov then do; txt_file_name = scan(filename, -1, "\"); eov=0; end; else input FlightId : $CHAR25. AirlineCode : $CHAR3. FlightNumber : $CHAR7. SchedDepApt : $CHAR4. SchedArrApt : $CHAR4. many other variables ; run;
If I can get rid of the blank line that starts each input of the txt files that would get it all!
thx
Do your files have a header row? Where is the ‘empty line’ coming from?
The first line is a header line with the variable names in it. I had thought FIRSTOBS=2 would fix it, but it did not, hence I chose FIRSTOBS=1 to test to see if it made a difference. It does. FIRSTOBS=2 will cause it to read the second valid record, which is not good. Hence I am back to FIRSTOBS=1.
The first row in every text file is the header row with the name of the variables. It should be ignored. If I use SAS EG to read it in, it uses FIRSTOBS=2 and this ignores the header row properly. EG uses:
INFILE 'H:\2007\NavCanReport_20171002.txt' LRECL=32767 FIRSTOBS=2 ENCODING="WLATIN1" DLM='09'x MISSOVER DSD ; INPUT FlightId : $CHAR25. lots of vars
This works fine. Obviously I would not know LRECL when grabbing a bunch of text files. Seems like there is something wrong with the code I am using here as it seems to always add a blank line upon reading each file. There are no blank lines in the txt files. Likely my:
infile "H:\2007\*.txt" FIRSTOBS=2 eov=eov missover filename=filename ENCODING="WLATIN1" DLM='09'x TRUNCOVER DSD; * Input first record and hold line; input@; * Check if this is the first record or the first record in a new file; * If it is, replace the filename with the new file name and move to next line; if _n_ eq 1 or eov then do; txt_file_name = scan(filename, -1, "\"); eov=0; end; else input
needs some fixing, but this is just code from other posts here. Perhaps the "move to next line" stuff above? That implies to keep the first row, which I do not want to do as it is a header of variable names.
No, the code is designed to account for a header row. Follow it exactly. Do you have the two INPUT statements?
You have both MISSOVER and TRUNCOVER, pick one.
Remove the FIRSTOBS and other options and see if that works.
The other options I am using are needed as SAS will crash with error codes. They are options that EG adds when manually adding the txt files (we could have hundreds hence the need for some code to bring this in no matter the number). I also have two input statements just as listed before. The blank initial line is still there for some reason. All the data flows properly after that and for each file. Just a blank line is inserted before each new file is read.
What exactly do you mean...I have:
data all_data;
length
specify length of all vars
format
specify format of all vars
informat
commands
*make sure variables to store file name are long enough;
length filename txt_file_name $256;
*keep file name from record to record;
retain txt_file_name;
* Use wildcard in input;
infile "H:\2007\*.txt" eov=eov filename=filename ENCODING="WLATIN1" DLM='09'x MISSOVER DSD;
* Input first record and hold line;
input@;
* Check if this is the first record or the first record in a new file;
* If it is, replace the filename with the new file name and move to next line;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -1, "\");
eov=0;
end;
else input
FlightId : $CHAR25.
AirlineCode : $CHAR3.
FlightNumber : $CHAR7.
SchedDepApt : $CHAR4.
many vars to the end;
How would I implement your idea?
So...an update. None of the programming listed prior worked for me...always problems with the headers. What did work perfectly is:
infile "\\NCRFP4\TAF\Traffic Forecasting\OAG\Cancellation Data\NavCan\*.txt" eov=eov ENCODING="WLATIN1" DLM='09'x MISSOVER DSD firstobs=2; input @; if eov then input; input var1 var 2, for many variables ; eov=0; run;
Not sure why the others did not.
Your problem is including the resetting of the EOV inside of the THEN clause. SAS sets it to 1 when it starts a new file, but it never sets it back to 0.
Note that the code that PROC IMPORT and EG Import function generates is not really that good. Your basic structure of first defining your variables using LENGTH statement is good, but :
Do not both attach permanent INFORMATs with an INFORMAT statement and also specify an informat in the INPUT statement. Unless you really really want to read this file using a different informat than you want attached to the variable.
SAS does NOT need to have either INFORMATs or FORMATS defined for most variables. SAS already knows how to read and write most numbers and character strings. Main exception are date/time/datetime values. You might also need to use COMMA informat if the text file is a report that includes commas and dollar signs in the values of numbers.
The main advantage of using the $CHAR informat (and format) is to preserve leading spaces in the values. Do really WANT to preserve leading spaces in the values of your character variables? If so I am not sure that you can if you are using the DSD option to read.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.