- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello, In order to detect problems with csv input files I count the number of columns in them (and compare them with the expected number), using the code:
data _null;
infile "&dir.test.csv";
input; call symputx("ncol_file1", countw(_infile_,","));
stop; run;
This works fine when all columns have data in row 1 (the data set is without column names). When a column does not have data in row 1 the resulting number of columns is without that column. Is there a way to count a column even when the first row (or several first rows) are without data?
Thank you!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @Taliah,
Use the "m" modifier of the COUNTW function to include blank or zero-length words in the count:
countw(_infile_, ",", "m")
To avoid a result of 1 in case of a blank line (no delimiters) you could subtract 1 in that case:
call symputx("ncol_file1", countw(_infile_, ",", "m")-(_infile_=" "));
Edit: If any of the values may contain the delimiter (comma) in quotation marks (as in "Cary, NC"), include the "q" modifier as well:
countw(_infile_, ",", "mq")
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @Taliah,
Use the "m" modifier of the COUNTW function to include blank or zero-length words in the count:
countw(_infile_, ",", "m")
To avoid a result of 1 in case of a blank line (no delimiters) you could subtract 1 in that case:
call symputx("ncol_file1", countw(_infile_, ",", "m")-(_infile_=" "));
Edit: If any of the values may contain the delimiter (comma) in quotation marks (as in "Cary, NC"), include the "q" modifier as well:
countw(_infile_, ",", "mq")
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you! the m modifier works great.
You wrote " To avoid a result of 1 in case of a blank line (no delimiters)" -
you mean if row 1 has no modofiers, and no data?
When I add the addition you gave and the file has modifiers and data in row 1, it subtracts 1 from the count (as you wrote), so I can't add that to the code (I am trying to understand what you are suggesting), Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Taliah wrote:
You wrote " To avoid a result of 1 in case of a blank line (no delimiters)" -
you mean if row 1 has no modofiers, and no data?
I mean no delimiters and no data except blanks.
When I add the addition you gave and the file has modifiers and data in row 1, it subtracts 1 from the count (as you wrote), ...
No, it doesn't. It subtracts 1 "in that case," i.e., in case of a blank line. The Boolean value (_infile_=" "), which is subtracted, equals 1 only if the equality _infile_=" " is true (this is the case of a blank line, regardless of the number of blanks it contains, zero or more) and 0 otherwise. That is, in almost all cases nothing is subtracted.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I tried it again and it worked (not sure what happened the first time I tried it), Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Would be interesting to see the code following the data step.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The next step is comparing the number of columns counted with teh expected number, and if they are different the code sends a warning email and stops running (abort cancel)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That code is just trying to find the number of fields on the FIRST line of the file. That could be done more simply with:
data _null;
call symputx('ncol',ncol);
infile "&dir.test.csv" obs=1;
ncol + countw(_infile_,',','mq');
run;
Which will return 0 for a file with no lines.
I thought you wanted to check if the file consistently had the same number of fields on every line. To do that you need to read the whole file.
data _null;
if eof then do;
call symputx('nobs',_n_-1);
call symputx('ncol',ncol);
call symputx('nbad',nbad);
end;
infile "&dir.test.csv" end=eof;
if _n_=1 then ncol = countw(_infile_,',','mq');
else nbad + (ncol ne countw(_infile_,',','mq'));
retain ncol nbad 0;
run;