BookmarkSubscribeRSS Feed
SASPhile
Quartz | Level 8

There are 50k files. I would like to check the number of columns in each file . The raw files are text files. I would like to see the the file name, number of columns in a temp dataset  before actually reading them into SAS datasets. The delimiter is tab separated.

Any help is appreciated!

15 REPLIES 15
Reeza
Super User

Ok...so what's your question?

SASPhile
Quartz | Level 8

How to get the header count

Reeza
Super User

1. Either process the first line of each file - read it in using filevar and obs=1 and then use COUNTW to count the number of the delimiter

2. Use Powershell/Unix command lines to get this information.

 

#2 will be more efficient, faster and is most likely the right tool for the job. 

 

SAS can do it, but it's a hammer vs thumbtack. 

ballardw
Super User

So what will you actually do with the approximately 625 pages of results?

 

Are all of the text files in the same folder? Mulitple folders? Is there any regular pattern to the names of the files?

SASPhile
Quartz | Level 8

Same folder.

abc.txt

pqr.txt

xyz.txt

 

File_name            header_cnt

abc.txt                   25

pqr.txt                    26

xyz.txt                    30

Tom
Super User Tom
Super User

If you can use PIPE from your SAS session (some server installations have this disabled) then use that to get the list of files and then use FILEVAR option to open each one.  Here is example using Linux syntax. For DOS/Windows systems change 'ls -d' to 'dir /b'.

%let path=/myfolder;
data want ;
   length filename $100 header_cnt 8 filevar $255 ;
   infile "cd &path ; ls -d *.txt" pipe truncover ;
   input filename ;
   filevar=catx('/',"&path",filename);
   infile header filevar=filevar end=eof;
   if eof then header_cnt=0;
  else do;
    input ;
    header_cnt=countw(_infile_,'09'x,'mq');
  end;
run;

Otherwise you could use a wildcard in a regular INFILE statement with the FILENAME option, but then you will need to read all of the lines of all of the lines. Should work for either Linux or DOS/Windows systems.

%let path=/myfolder;
data want ;
   length filename $100 header_cnt 8 filevar $255 ;
   infile "&path/*.txt" eov=eov filename=filevar truncover ;
   input ;
   if _n_=1 or eov then do;
     filename=scan(filevar,-1,'/\');
     header_cnt=countw(_infile_,'09'x,'mq');
     output;
  end;
  eov=0;
run; 

 

s_lassen
Meteorite | Level 14

@Tom: I would probably use the COUNT function rather than the COUNTW function - when parsing .csv files, two consecutive delimiters means that there is a column, but it is empty in that row. Using COUNTW could give you the number of non-empty columns in the row being parsed, not the total number of columns in the file.

Tom
Super User Tom
Super User

The M and Q modifiers on the COUNTW() function handle the missing values. Note that the COUNT() function cannot handle cases where the value can contain the delimiter.

SASPhile
Quartz | Level 8

this code if it is run once,it is working. if I had to run the second time by adding more files to the directory, I'm getting an error.

 " ERROR: File is in use,"

 

%let path=/myfolder;
data want ;
   length filename $100 header_cnt 8 filevar $255 ;
   infile "&path/*.txt" eov=eov filename=filevar truncover ;
   input ;
   if _n_=1 or eov then do;
     filename=scan(filevar,-1,'/\');
     header_cnt=countw(_infile_,'09'x,'mq');
     output;
  end;
  eov=0;
run; 

Tom
Super User Tom
Super User

If it worked once but now you are getting 'file in use' error messages then most likely the issue is that you have one or more of the individual files open with some other application.

Reeza
Super User

1. Get a list of files - see SAS 9.4 macro appendix on how that can be done automatically or search on here for many various solutions. Pretty sure I posted one last week.

2. Use a datastep to read a single line from a single file and count the number of delimiters (infile, COUNTW). Make sure it works for one file CORRECTLY before moving forward.

3. Look at FILEVAR option that allows you read multiple files based on paths, which you have from step 1

4. Combine 3/4 so that it will process all your files 

 

 

BrunoMueller
SAS Super FREQ

Hi

 

First create a data set that holds the list of files to check, there are different approches to that, like FILENAME statement with the PIPE access method.

 

Once you have the list you can use a second DATA Step to read the first line of each file, Have a look at the FILEVAR= option of the INFILE statement. Using just and empty INPUT; statement read the first line. Then you can use the COUNTW function to count the number of "words" delimited by "09"x (hexadecimal for TAB char).

 

Bruno

s_lassen
Meteorite | Level 14

@SASPhile: I would try something like this:

%let dir=/usr/local/tmp; /* full path of directory with files */
data members;
  did=dopen('indir');
  do _n_=1 by 1;
    fname=dread(did,_N_);
    if fname=' ' then leave;
/* you can also put criteria here, instead of in the next datastep,
only LIKE is very nifty and only available with WHERE */ output; end; keep fname; run; data hcount; set members; where upcase(fname) like '%.TXT'; /* or whatever criteria you want */ path=cats("&dir",'/',fname); /* may have to change / to \ for Windows */ infile dummy filevar=path; input; count=count(_infile_,'09'x); keep fname count; run;
SASPhile
Quartz | Level 8

I believe in this statement did=dopen('indir'), it should be 'dir' in place of 'indir'?

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1512 views
  • 1 like
  • 6 in conversation