Broke the Internet looking for the answer to this one... or at least my patience.
I'm using INFILE to import a couple thousand text files.
Data Want;
infile "data/tmz/abcd/ef/ghij/klmnop/qrst/pvwxyz/SURR*.txt" DSD DELIMITER='|' eov=eov ;
input var1 $ var2 $ var3 $ var4 $;
run;
Works fine.
However, some text files are empty, just blank. Zero bytes. There is a unique name (naturally)
for each text file, embedded in that name is data I can use, even if the text file contains no data.
I am just a humble SAS programmer - if that. I can't do anything about these files with
no data, but I do have to report that XXX numbers of files were submitted and my program
processed XXX number of files - and those numbers better match, whether there is data in
those files or not. There are many, many, many posts on stackexchange, stackoverflow, and various university sites
on how to SKIP empty csv/txt files, but none on how to PROCESS them (SAS, by default, does not process them).
Below is a portion of the log processing 3,000 or so files where an empty text file was found:
NOTE: The infile "data/tmz/abcd/ef/ghij/klmnop/qrst/pvwxyz/SURR*.txt" is:
Filename=/data/tmz/abcd/ef/ghij/klmnop/qrst/pvwxyz/SURR_XXX_xxx.txt, <-- need the file name only
File List=/data/tmz/abcd/ef/ghij/klmnop/qrst/pvwxyz/SURR_XXX_xxx.txt,
Owner Name=WhoseUrDaddy,Group Name=AA,
Access Permission=-rw-rw-r--,
Last Modified=28Aug2019:09:38:56,
File Size (bytes)=0
This is the Linux version of SAS 9.4, but the same result in Windows 10 9.4.
All I want is to check if the file has data or not, if not, append the filename to the dataset.
If I add the FILENAME= option to the existing INFILE statement, with
length filename $77.; for example,
SAS will add the filename of those txt files that have records
but, again, skips over the empty files.
I am looking for code to:
A) check if the txt file has data - _n_ = 0, if the file has data, run infile var1/var2/var3,etc.
B) If the text file has a file size of zero bytes - extract the filename only - something like below:
rc=filename("FILE","data/tmz/abcd/ef/ghij/klmnop/qrst/pvwxyz/SURR*.txt");
fid=fopen("FILE");
infonum=foptnum(fid);
do i=1 to infonum;
infoname=foptname(fid,i);
infoval=finfo(fid,infoname);
output;
end;
close=fclose(fid);
I just don't know how to incorporate both of these snippets of code into one SAS program.
We're talking about 3,200 files in this process.
There are dozens of examples on the forum of getting file listings into a data set, such as FILENAME PIPE with a directory listing command.
That might get you the information you need, though not a data set with 0 observations. Check that listing against the filenames added to your data set.
Or, with the same directory listing approach, capture the file size and select the files with 0 bytes (or whatever size is reported for an empty file).
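One way to sketch the "select those with 0 bytes" idea on Linux is to let the operating system do the filtering. This assumes SAS is allowed to run shell commands (XCMD enabled) and that a find command supporting -maxdepth is installed; the directory is the one shown in the log above, and the data set name EMPTIES is made up for illustration.

```sas
/* Sketch only: list the zero-byte SURR*.txt files in one directory.   */
/* Assumes XCMD is enabled and find supports -maxdepth (GNU/BSD find). */
data empties;
  infile "find /data/tmz/abcd/ef/ghij/klmnop/qrst/pvwxyz -maxdepth 1 -name 'SURR*.txt' -size 0"
         pipe truncover;
  input filename $256.;   /* full path of each empty file */
run;
```

Dropping the -size 0 test from the same command gives the full list of submitted files, which can then be compared with the filenames of the files that produced records.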
If you change the way you read the files to use INFILE option FILEVAR you can create a list of files as you read them, with an indicator of whether or not they have records.
filename FT15F001 'z1.txt';
parmcards;
a
b
;;;;
filename FT15F001 'z2.txt';
parmcards;
;;;;
filename FT15F001 'z3.txt';
parmcards;
c
d
;;;;
data driver;
cmd = 'dir /b z*.txt';
infile dummy pipe filevar=cmd end=eof;
do while(not eof);
input filename &$128.;
put _infile_;
output;
end;
stop;
run;
proc print;
run;
data z(keep=x) files(keep=filename zerorecs);
set driver;
filevar=filename;
length fname $128;
infile dummy filevar=filevar end=eof filename=fname;
putlog fname= eof=;
filename=fname;
zerorecs=eof;
output files;
do while(not eof);
input x :$1.;
output z;
end;
run;
proc print data=files;
run;
I had tried the filename filelist pipe 'dir /b /s home/my/mother/the/car/SURR*.TXT' previously
but it doesn't work on SAS 9.4 running on Linux.
Result is below.
What about:
DATA ASCIIFILES;
LENGTH FILENAME $55.;
rc=filename("FILENAME","\home\my\mother\the\car\SURR*.TXT");
did=dopen("FILENAME");
if did > 0 then do; /* if did = 0, insert the filename only; don't skip over */
num = dnum(did);
do i = 1 to num;
FILENAME = dread(did,i);
EXT= substr(FILENAME,length(FILENAME)-2,3); /* identify txt files only -
other departments share this directory and put their junk in there */
OUTPUT;
end;
RC=dclose(did);
end;
run;
results of Filename filelist pipe 'dir /b /s /xxxx/xxxx/xxxx/SURR*.txt
DIR is NOT a UNIX command; use the proper UNIX command, ls.
How did you get that output using a command that doesn't exist? Perhaps someone made an alias for you to convert dir into ls?
Try it this way instead. Are you sure, though, that all of the files use a capital SURR at the beginning and a capital TXT as the extension? Unix filenames are case sensitive.
infile 'ls -d /home/my/mother/the/car/SURR*.TXT' pipe truncover;
input filename $256.;
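Putting the ls listing together with the FILEVAR= technique shown earlier in the thread, a complete program might look roughly like the sketch below. The data set names DRIVER, WANT and FILES, the variable FV, the $8 lengths and the TRUNCOVER option are assumptions added for illustration; the path and the four pipe-delimited variables come from the posts above.

```sas
/* Step 1: one row per file, taken from the ls listing */
data driver;
  infile 'ls -d /home/my/mother/the/car/SURR*.TXT' pipe truncover;
  input filename $256.;
run;

/* Step 2: read every file; an empty file still gets a row in FILES */
data want  (keep=filename var1-var4)
     files (keep=filename zerorecs);
  set driver;
  length var1-var4 $8;        /* assumed lengths, adjust to the data  */
  fv = filename;              /* FILEVAR= variable, not written out   */
  infile dummy filevar=fv dsd delimiter='|' truncover end=eof;
  zerorecs = eof;             /* already 1 if the file has no records */
  output files;
  do while (not eof);
    input var1 $ var2 $ var3 $ var4 $;
    output want;
  end;
run;
```

The number of rows in FILES should then equal the number of files submitted, and ZERORECS=1 marks the ones that arrived empty.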
And this cannot work:
rc=filename("FILENAME","\home\my\mother\the\car\SURR*.TXT");
did=dopen("FILENAME");
On UNIX a \ just protects (escapes) the next character, so you asked it to open a file named:
homemymotherthecarSURR*.TXT
as if it was a directory.
Did you try this instead?
rc=filename("FILENAME","/home/my/mother/the/car/");
did=dopen("FILENAME");
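If shell commands are not an option, the DOPEN approach can probably be extended to report the size of each file as well. The following is a rough sketch, not tested against this directory: the fileref DIRREF, the data set FILELIST and the variables FNAME, BYTES and IS_EMPTY are invented names, and the information item 'File Size (bytes)' is taken from the log shown in the original question, so it may be spelled differently on other hosts.

```sas
/* Sketch: list SURR*.txt members of a directory and flag empty files */
data filelist (keep=fname bytes is_empty);
  length fname $256 bytes 8;
  rc  = filename("dirref", "/home/my/mother/the/car/");
  did = dopen("dirref");
  if did > 0 then do;
    do i = 1 to dnum(did);
      fname = dread(did, i);
      /* keep only SURR*.txt members, ignoring case */
      if prxmatch('/^SURR.*\.txt$/i', strip(fname)) then do;
        bytes = .;
        fid = mopen(did, fname, 'i');
        if fid > 0 then do;
          /* item name as it appears in the log on this host */
          bytes = input(finfo(fid, 'File Size (bytes)'), ?? best32.);
          rc = fclose(fid);
        end;
        is_empty = (bytes = 0);
        output;
      end;
    end;
    rc = dclose(did);
  end;
run;
```

A row with BYTES missing means the member could not be opened or its size could not be read, so it is worth checking those separately from the rows where IS_EMPTY is 1.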
Now that worked on the Linux SAS, replacing "dir" with "ls -d".
Thanx.