I know how to check the number of records on a CSV file once it has been read in to SAS, but am wondering if there is a shortcut where I could get at the # of records on the file without having to read it in first.
Do you know how many rows of header information may be involved?
Does the end of the CSV file have "empty" rows consisting of nothing but commas ( a frequent behavior when converting spreadsheets to CSV)?
Answers to these go to the question of "valid" observations versus lines of text in the file.
Can you tell us what you will do differently when you have that information?
SAS treats csv files essentially as a stream of data, with each obs separated by a CR or CRLF, and typically terminated by an EOF (end of file indicator). But unlike a SAS dataset, a CSV file does not require inclusion of such metadata as the number of obs (rows). So there is no way for you to know with certainty how many records are in the CSV file without processing it.
There can be exceptions of course. If you know that all records have the same length - and you know (or can determine) what that length is, you can tell SAS to ask the operating system for the size of the csv file in bytes, then divide by the fixed length, in bytes, of each record to get the record count. This is an increasingly unlikely scenario.
Of course, if you can even make a good guess at the average record length, you can generate an equally good guess at the number of records using the CSV file size. As an example:
%let csv_filename=c:\temp\export.csv; %let expected_record_length=40; data info; drop rc fid close; rc=filename('abc', "&csv_filename"); fid=fopen('abc'); filesize=input(finfo(fid,"File Size (bytes)"),best32.); close=fclose(fid); expected_nrecs=filesize/&expected_record_length; put filesize= expected_nrecs=; run;
The argument "File Size (bytes)" is apparently valid in both Windows and Unix - I successfully tested it in Windows.
And, as @ballardw notes, you can improve the estimate of record counts it you know how many header records are in the CSV file.
A CSV file is text file with variable length lines. The only way to know how many lines are in the file is to read the whole file.
But you don't have to actually do anything with the lines, other than count them.
data _null_; if eof then call symputx('num_lines', _n_-1); infile 'myfile.csv' end=eof; input; run; %put Number of lines in myfile.csv is &num_lines..;
And you don't have to use SAS. For example if you are running on Unix just use the wc command with the -l option to count the number of lines in the file. If the file has a header row then the number of lines of actual data is one less than the number of lines.
On a UNIX system, use the external command wc -l. This is the fastest way to do it.
If your SAS setup allows accessing the OS command, then you can use this macro on either Windows/Linux
%macro util_getFileLineCount(p_textFile=, p_rtrnMacVarName=, p_funkyString=\n); %LOCAL l_os l_firstObs; %let l_os = %substr(&sysscp,1,3); %let l_firstObs=1; %if (&l_os EQ LIN) %then %do; FILENAME filesize pipe "wc -l &p_textFile"; %end; %else /*(&l_os EQ WIN)*/ %do; FILENAME filesize pipe "find /V /C ""&p_funkyString"" &p_textFile"; %let l_firstObs=2; %end; DATA _NULL_; INFILE filesize FIRSTOBS=&l_firstObs END=eof; INPUT; %if (&l_os EQ LIN) %then %do; CALL SYMPUTX("&p_rtrnMacVarName",SCAN(_INFILE_,1,' ')); %end; %else %do; CALL SYMPUTX("&p_rtrnMacVarName",SCAN(_INFILE_,-1,':')); %end; RUN; %mend util_getFileLineCount; %global g_lineCount; %util_getFileLineCount(p_textFile=%str(<path/filename.type>), p_rtrnMacVarName=g_lineCount, p_funkyString=\n);/* Change \n to a different value that shouldn't be in the file */ %put &=g_lineCount;
Note: You can save the util_getFileLineCount macro into it's own file util_getfilelinecount.sas for re-use.
Hope this helps
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.