Hello, In order to detect problems with csv input files I count the number of columns in them (and compare them with the expected number), using the code:
data _null;
infile "&dir.test.csv";
input; call symputx("ncol_file1", countw(_infile_,","));
stop; run;
This works fine when all columns have data in row 1 (the data set is without column names). When a column does not have data in row 1 the resulting number of columns is without that column. Is there a way to count a column even when the first row (or several first rows) are without data?
Thank you!
Hello @Taliah,
Use the "m" modifier of the COUNTW function to include blank or zero-length words in the count:
countw(_infile_, ",", "m")
To avoid a result of 1 in case of a blank line (no delimiters) you could subtract 1 in that case:
call symputx("ncol_file1", countw(_infile_, ",", "m")-(_infile_=" "));
Edit: If any of the values may contain the delimiter (comma) in quotation marks (as in "Cary, NC"), include the "q" modifier as well:
countw(_infile_, ",", "mq")
Hello @Taliah,
Use the "m" modifier of the COUNTW function to include blank or zero-length words in the count:
countw(_infile_, ",", "m")
To avoid a result of 1 in case of a blank line (no delimiters) you could subtract 1 in that case:
call symputx("ncol_file1", countw(_infile_, ",", "m")-(_infile_=" "));
Edit: If any of the values may contain the delimiter (comma) in quotation marks (as in "Cary, NC"), include the "q" modifier as well:
countw(_infile_, ",", "mq")
Thank you! the m modifier works great.
You wrote " To avoid a result of 1 in case of a blank line (no delimiters)" -
you mean if row 1 has no modofiers, and no data?
When I add the addition you gave and the file has modifiers and data in row 1, it subtracts 1 from the count (as you wrote), so I can't add that to the code (I am trying to understand what you are suggesting), Thank you!
@Taliah wrote:
You wrote " To avoid a result of 1 in case of a blank line (no delimiters)" -
you mean if row 1 has no modofiers, and no data?
I mean no delimiters and no data except blanks.
When I add the addition you gave and the file has modifiers and data in row 1, it subtracts 1 from the count (as you wrote), ...
No, it doesn't. It subtracts 1 "in that case," i.e., in case of a blank line. The Boolean value (_infile_=" "), which is subtracted, equals 1 only if the equality _infile_=" " is true (this is the case of a blank line, regardless of the number of blanks it contains, zero or more) and 0 otherwise. That is, in almost all cases nothing is subtracted.
I tried it again and it worked (not sure what happened the first time I tried it), Thank you!
Would be interesting to see the code following the data step.
The next step is comparing the number of columns counted with teh expected number, and if they are different the code sends a warning email and stops running (abort cancel)
That code is just trying to find the number of fields on the FIRST line of the file. That could be done more simply with:
data _null;
call symputx('ncol',ncol);
infile "&dir.test.csv" obs=1;
ncol + countw(_infile_,',','mq');
run;
Which will return 0 for a file with no lines.
I thought you wanted to check if the file consistently had the same number of fields on every line. To do that you need to read the whole file.
data _null;
if eof then do;
call symputx('nobs',_n_-1);
call symputx('ncol',ncol);
call symputx('nbad',nbad);
end;
infile "&dir.test.csv" end=eof;
if _n_=1 then ncol = countw(_infile_,',','mq');
else nbad + (ncol ne countw(_infile_,',','mq'));
retain ncol nbad 0;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.