DATA Step, Macro, Functions and more

Count delimter (typically gives me header count)

Reply
Super Contributor
Posts: 647

Count delimter (typically gives me header count)

There are 50k files. I would like to check the number of columns in each file . The raw files are text files. I would like to see the the file name, number of columns in a temp dataset  before actually reading them into SAS datasets. The delimiter is tab separated.

Any help is appreciated!

Super User
Posts: 17,829

Re: Count delimter (typically gives me header count)

Ok...so what's your question?

Super Contributor
Posts: 647

Re: Count delimter (typically gives me header count)

How to get the header count

Super User
Posts: 17,829

Re: Count delimter (typically gives me header count)

1. Either process the first line of each file - read it in using filevar and obs=1 and then use COUNTW to count the number of the delimiter

2. Use Powershell/Unix command lines to get this information.

 

#2 will be more efficient, faster and is most likely the right tool for the job. 

 

SAS can do it, but it's a hammer vs thumbtack. 

Super User
Posts: 10,500

Re: Count delimter (typically gives me header count)

So what will you actually do with the approximately 625 pages of results?

 

Are all of the text files in the same folder? Mulitple folders? Is there any regular pattern to the names of the files?

Super Contributor
Posts: 647

Re: Count delimter (typically gives me header count)

Same folder.

abc.txt

pqr.txt

xyz.txt

 

File_name            header_cnt

abc.txt                   25

pqr.txt                    26

xyz.txt                    30

Super User
Super User
Posts: 6,500

Re: Count delimter (typically gives me header count)

[ Edited ]

If you can use PIPE from your SAS session (some server installations have this disabled) then use that to get the list of files and then use FILEVAR option to open each one.  Here is example using Linux syntax. For DOS/Windows systems change 'ls -d' to 'dir /b'.

%let path=/myfolder;
data want ;
   length filename $100 header_cnt 8 filevar $255 ;
   infile "cd &path ; ls -d *.txt" pipe truncover ;
   input filename ;
   filevar=catx('/',"&path",filename);
   infile header filevar=filevar end=eof;
   if eof then header_cnt=0;
  else do;
    input ;
    header_cnt=countw(_infile_,'09'x,'mq');
  end;
run;

Otherwise you could use a wildcard in a regular INFILE statement with the FILENAME option, but then you will need to read all of the lines of all of the lines. Should work for either Linux or DOS/Windows systems.

%let path=/myfolder;
data want ;
   length filename $100 header_cnt 8 filevar $255 ;
   infile "&path/*.txt" eov=eov filename=filevar truncover ;
   input ;
   if _n_=1 or eov then do;
     filename=scan(filevar,-1,'/\');
     header_cnt=countw(_infile_,'09'x,'mq');
     output;
  end;
  eov=0;
run; 

 

PROC Star
Posts: 63

Re: Count delimter (typically gives me header count)

@Tom: I would probably use the COUNT function rather than the COUNTW function - when parsing .csv files, two consecutive delimiters means that there is a column, but it is empty in that row. Using COUNTW could give you the number of non-empty columns in the row being parsed, not the total number of columns in the file.

Super User
Super User
Posts: 6,500

Re: Count delimter (typically gives me header count)

The M and Q modifiers on the COUNTW() function handle the missing values. Note that the COUNT() function cannot handle cases where the value can contain the delimiter.

Super Contributor
Posts: 647

Re: Count delimter (typically gives me header count)

this code if it is run once,it is working. if I had to run the second time by adding more files to the directory, I'm getting an error.

 " ERROR: File is in use,"

 

%let path=/myfolder;
data want ;
   length filename $100 header_cnt 8 filevar $255 ;
   infile "&path/*.txt" eov=eov filename=filevar truncover ;
   input ;
   if _n_=1 or eov then do;
     filename=scan(filevar,-1,'/\');
     header_cnt=countw(_infile_,'09'x,'mq');
     output;
  end;
  eov=0;
run; 

Super User
Super User
Posts: 6,500

Re: Count delimter (typically gives me header count)

If it worked once but now you are getting 'file in use' error messages then most likely the issue is that you have one or more of the individual files open with some other application.

Super User
Posts: 17,829

Re: Count delimter (typically gives me header count)

1. Get a list of files - see SAS 9.4 macro appendix on how that can be done automatically or search on here for many various solutions. Pretty sure I posted one last week.

2. Use a datastep to read a single line from a single file and count the number of delimiters (infile, COUNTW). Make sure it works for one file CORRECTLY before moving forward.

3. Look at FILEVAR option that allows you read multiple files based on paths, which you have from step 1

4. Combine 3/4 so that it will process all your files 

 

 

SAS Super FREQ
Posts: 683

Re: Count delimter (typically gives me header count)

Hi

 

First create a data set that holds the list of files to check, there are different approches to that, like FILENAME statement with the PIPE access method.

 

Once you have the list you can use a second DATA Step to read the first line of each file, Have a look at the FILEVAR= option of the INFILE statement. Using just and empty INPUT; statement read the first line. Then you can use the COUNTW function to count the number of "words" delimited by "09"x (hexadecimal for TAB char).

 

Bruno

PROC Star
Posts: 63

Re: Count delimter (typically gives me header count)

@SASPhile: I would try something like this:

%let dir=/usr/local/tmp; /* full path of directory with files */
data members;
  did=dopen('indir');
  do _n_=1 by 1;
    fname=dread(did,_N_);
    if fname=' ' then leave;
/* you can also put criteria here, instead of in the next datastep,
only LIKE is very nifty and only available with WHERE */ output; end; keep fname; run; data hcount; set members; where upcase(fname) like '%.TXT'; /* or whatever criteria you want */ path=cats("&dir",'/',fname); /* may have to change / to \ for Windows */ infile dummy filevar=path; input; count=count(_infile_,'09'x); keep fname count; run;
Super Contributor
Posts: 647

Re: Count delimter (typically gives me header count)

I believe in this statement did=dopen('indir'), it should be 'dir' in place of 'indir'?

Ask a Question
Discussion stats
  • 15 replies
  • 192 views
  • 1 like
  • 6 in conversation