05-23-2017 04:05 PM
There are 50k files. I would like to check the number of columns in each file . The raw files are text files. I would like to see the the file name, number of columns in a temp dataset before actually reading them into SAS datasets. The delimiter is tab separated.
Any help is appreciated!
05-23-2017 04:16 PM
1. Either process the first line of each file - read it in using filevar and obs=1 and then use COUNTW to count the number of the delimiter
2. Use Powershell/Unix command lines to get this information.
#2 will be more efficient, faster and is most likely the right tool for the job.
SAS can do it, but it's a hammer vs thumbtack.
05-23-2017 04:15 PM
So what will you actually do with the approximately 625 pages of results?
Are all of the text files in the same folder? Mulitple folders? Is there any regular pattern to the names of the files?
05-24-2017 05:53 AM - edited 05-24-2017 05:57 AM
If you can use PIPE from your SAS session (some server installations have this disabled) then use that to get the list of files and then use FILEVAR option to open each one. Here is example using Linux syntax. For DOS/Windows systems change 'ls -d' to 'dir /b'.
%let path=/myfolder; data want ; length filename $100 header_cnt 8 filevar $255 ; infile "cd &path ; ls -d *.txt" pipe truncover ; input filename ; filevar=catx('/',"&path",filename); infile header filevar=filevar end=eof; if eof then header_cnt=0; else do; input ; header_cnt=countw(_infile_,'09'x,'mq'); end; run;
Otherwise you could use a wildcard in a regular INFILE statement with the FILENAME option, but then you will need to read all of the lines of all of the lines. Should work for either Linux or DOS/Windows systems.
%let path=/myfolder; data want ; length filename $100 header_cnt 8 filevar $255 ; infile "&path/*.txt" eov=eov filename=filevar truncover ; input ; if _n_=1 or eov then do; filename=scan(filevar,-1,'/\'); header_cnt=countw(_infile_,'09'x,'mq'); output; end; eov=0; run;
05-24-2017 06:57 AM
@Tom: I would probably use the COUNT function rather than the COUNTW function - when parsing .csv files, two consecutive delimiters means that there is a column, but it is empty in that row. Using COUNTW could give you the number of non-empty columns in the row being parsed, not the total number of columns in the file.
05-24-2017 09:44 AM
The M and Q modifiers on the COUNTW() function handle the missing values. Note that the COUNT() function cannot handle cases where the value can contain the delimiter.
05-24-2017 10:39 AM
this code if it is run once,it is working. if I had to run the second time by adding more files to the directory, I'm getting an error.
" ERROR: File is in use,"
data want ;
length filename $100 header_cnt 8 filevar $255 ;
infile "&path/*.txt" eov=eov filename=filevar truncover ;
if _n_=1 or eov then do;
05-24-2017 11:10 AM
If it worked once but now you are getting 'file in use' error messages then most likely the issue is that you have one or more of the individual files open with some other application.
05-23-2017 04:22 PM
1. Get a list of files - see SAS 9.4 macro appendix on how that can be done automatically or search on here for many various solutions. Pretty sure I posted one last week.
2. Use a datastep to read a single line from a single file and count the number of delimiters (infile, COUNTW). Make sure it works for one file CORRECTLY before moving forward.
3. Look at FILEVAR option that allows you read multiple files based on paths, which you have from step 1
4. Combine 3/4 so that it will process all your files
05-23-2017 04:43 PM
First create a data set that holds the list of files to check, there are different approches to that, like FILENAME statement with the PIPE access method.
Once you have the list you can use a second DATA Step to read the first line of each file, Have a look at the FILEVAR= option of the INFILE statement. Using just and empty INPUT; statement read the first line. Then you can use the COUNTW function to count the number of "words" delimited by "09"x (hexadecimal for TAB char).
05-24-2017 05:11 AM
@SASPhile: I would try something like this:
%let dir=/usr/local/tmp; /* full path of directory with files */ data members; did=dopen('indir'); do _n_=1 by 1; fname=dread(did,_N_); if fname=' ' then leave;
/* you can also put criteria here, instead of in the next datastep,
only LIKE is very nifty and only available with WHERE */ output; end; keep fname; run; data hcount; set members; where upcase(fname) like '%.TXT'; /* or whatever criteria you want */ path=cats("&dir",'/',fname); /* may have to change / to \ for Windows */ infile dummy filevar=path; input; count=count(_infile_,'09'x); keep fname count; run;