BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Taliah
Quartz | Level 8

Hello, In order to detect problems with csv input files I count the number of columns in them (and compare them with the expected number), using the code:

data _null;

infile "&dir.test.csv";

input; call symputx("ncol_file1", countw(_infile_,","));

stop; run;

This works fine when all columns have data in row 1 (the data set is without column names). When a column does not have data in row 1 the resulting number of columns is without that column. Is there a way to count a column even when the first row (or several first rows) are without data?

Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Taliah,

 

Use the "m" modifier of the COUNTW function to include blank or zero-length words in the count:

countw(_infile_, ",", "m")

 To avoid a result of 1 in case of a blank line (no delimiters) you could subtract 1 in that case:

call symputx("ncol_file1", countw(_infile_, ",", "m")-(_infile_=" "));

 

Edit: If any of the values may contain the delimiter (comma) in quotation marks (as in "Cary, NC"), include the "q" modifier as well:

countw(_infile_, ",", "mq")

 

View solution in original post

7 REPLIES 7
FreelanceReinh
Jade | Level 19

Hello @Taliah,

 

Use the "m" modifier of the COUNTW function to include blank or zero-length words in the count:

countw(_infile_, ",", "m")

 To avoid a result of 1 in case of a blank line (no delimiters) you could subtract 1 in that case:

call symputx("ncol_file1", countw(_infile_, ",", "m")-(_infile_=" "));

 

Edit: If any of the values may contain the delimiter (comma) in quotation marks (as in "Cary, NC"), include the "q" modifier as well:

countw(_infile_, ",", "mq")

 

Taliah
Quartz | Level 8

Thank you! the m modifier works great.

You wrote " To avoid a result of 1 in case of a blank line (no delimiters)" - 

you mean if row 1 has no modofiers, and no data?

When I add the addition you gave and the file has modifiers and data in row 1,  it subtracts 1 from the count (as you wrote), so I can't add that to the code (I am trying to understand what you are suggesting), Thank you!

FreelanceReinh
Jade | Level 19

@Taliah wrote:

You wrote " To avoid a result of 1 in case of a blank line (no delimiters)" - 

you mean if row 1 has no modofiers, and no data?


I mean no delimiters and no data except blanks.

 


When I add the addition you gave and the file has modifiers and data in row 1,  it subtracts 1 from the count (as you wrote), ...

No, it doesn't. It subtracts 1 "in that case," i.e., in case of a blank line. The Boolean value (_infile_=" "), which is subtracted, equals 1 only if the equality _infile_=" " is true  (this is the case of a blank line, regardless of the number of blanks it contains, zero or more) and 0 otherwise. That is, in almost all cases nothing is subtracted.

Taliah
Quartz | Level 8

I tried it again and it worked (not sure what happened the first time I tried it), Thank you!

andreas_lds
Jade | Level 19

Would be interesting to see the code following the data step.

 

Taliah
Quartz | Level 8

The next step is comparing the number of columns counted with teh expected number, and if they are different the code sends a warning email and stops running (abort cancel)

Tom
Super User Tom
Super User

That code is just trying to find the number of fields on the FIRST line of the file.  That could be done more simply with:

data _null;
  call symputx('ncol',ncol);
  infile "&dir.test.csv" obs=1;
  ncol + countw(_infile_,',','mq');
run;

Which will return 0 for a file with no lines.

 

I thought you wanted to check if the file consistently had the same number of fields on every line.  To do that you need to read the whole file.

data _null;
  if eof then do;
    call symputx('nobs',_n_-1);
    call symputx('ncol',ncol);
    call symputx('nbad',nbad);
  end;
  infile "&dir.test.csv" end=eof;
  if _n_=1 then ncol = countw(_infile_,',','mq');
  else nbad + (ncol ne countw(_infile_,',','mq'));
  retain ncol nbad 0;
run;

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1564 views
  • 0 likes
  • 4 in conversation