BookmarkSubscribeRSS Feed
Walternate
Obsidian | Level 7

Hi,


I know how to check the number of records on a CSV file once it has been read in to SAS, but am wondering if there is a shortcut where I could get at the # of records on the file without having to read it in first.

 

Thanks,
Jenna

5 REPLIES 5
ballardw
Super User

Do you know how many rows of header information may be involved?

Does the end of the CSV file have "empty" rows  consisting of nothing but commas ( a frequent behavior when converting spreadsheets to CSV)? 

Answers to these go to the question of "valid" observations versus lines of text in the file.

 

Can you tell us what you will do differently when you have that information?

mkeintz
PROC Star

SAS treats csv files essentially as a stream of data, with each obs separated by a CR or CRLF, and typically terminated by an EOF (end of file indicator).   But unlike a SAS dataset, a CSV file does not require inclusion of such metadata as the number of obs (rows).  So there is no way for you to know with certainty how many records are in the CSV file without processing it.  

 

There can be exceptions of course.  If you know that all records have the same length - and you know (or can determine) what that length is, you can tell SAS to ask the operating system for the size of the csv file in bytes, then divide by the fixed length, in bytes, of each record to get the record count.  This is an increasingly unlikely scenario. 

 

Of course, if you can even make a good guess at the average record length, you can generate an equally good guess at the number of records using the CSV file size.  As an example:

 

%let csv_filename=c:\temp\export.csv;
%let expected_record_length=40;

data info;
   drop rc fid close;
   rc=filename('abc', "&csv_filename");

   fid=fopen('abc');
   filesize=input(finfo(fid,"File Size (bytes)"),best32.);
   close=fclose(fid);   
   expected_nrecs=filesize/&expected_record_length;
   put filesize= expected_nrecs=;
run;

 

The argument "File Size (bytes)" is apparently valid in both Windows and Unix - I successfully tested it in Windows.

 

And, as @ballardw notes, you can improve the estimate of record counts it you know how many header records are in the CSV file.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

No.

A CSV file is text file with variable length lines.  The only way to know how many lines are in the file is to read the whole file.

 

But you don't have to actually do anything with the lines, other than count them.

data _null_;
  if eof then call symputx('num_lines', _n_-1);
  infile 'myfile.csv' end=eof;
  input;
run;
%put Number of lines in myfile.csv is &num_lines..;

And you don't have to use SAS.  For example if you are running on Unix just use the wc command with the -l option to count the number of lines in the file.  If the file has a header row then the number of lines of actual data is one less than the number of lines.

AhmedAl_Attar
Rhodochrosite | Level 12

Hi @Walternate 

If your SAS setup allows accessing the OS command, then you can use this macro on either Windows/Linux

%macro util_getFileLineCount(p_textFile=, p_rtrnMacVarName=, p_funkyString=\n);
	%LOCAL l_os l_firstObs;

	%let l_os = %substr(&sysscp,1,3);
	%let l_firstObs=1;

	%if (&l_os EQ LIN) %then
	%do;
		FILENAME filesize pipe "wc -l &p_textFile";
	%end;
	%else /*(&l_os EQ WIN)*/
	%do;
		FILENAME filesize pipe "find /V /C ""&p_funkyString"" &p_textFile";
		%let l_firstObs=2;
	%end;

	DATA _NULL_;
		INFILE filesize FIRSTOBS=&l_firstObs END=eof;
		INPUT;
	%if (&l_os EQ LIN) %then
	%do;
		CALL SYMPUTX("&p_rtrnMacVarName",SCAN(_INFILE_,1,' '));
	%end;
	%else
	%do;
		CALL SYMPUTX("&p_rtrnMacVarName",SCAN(_INFILE_,-1,':'));
	%end;
	RUN;
%mend util_getFileLineCount;

%global g_lineCount;
%util_getFileLineCount(p_textFile=%str(<path/filename.type>), p_rtrnMacVarName=g_lineCount, p_funkyString=\n);​/* Change \n to a different value that shouldn't be in the file */
%put &=g_lineCount;

Note: You can save the util_getFileLineCount macro into it's own file util_getfilelinecount.sas for re-use.

 

Hope this helps

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 3347 views
  • 1 like
  • 6 in conversation