09-14-2013 07:30 AM
While reading a very large data file (millions of records with both character and numeric variables) using an INPUT statement, what is the best way to validate that all of the data was read completely and nothing is missing?
Thanks in advance for your suggestions.
09-14-2013 11:07 AM
Read the notes in the log. They report how many records were read and how many observations were written, for example:
NOTE: 19 records were read from the infile TMPFILE1.
The minimum record length was 17.
The maximum record length was 21.
NOTE: The data set WORK.WANT has 19 observations and 5 variables.
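If you would rather have the comparison in those notes checked automatically than read it by eye, here is a minimal sketch. It assumes a hypothetical fileref RAWFILE pointing at your raw file, that the INPUT step created WORK.WANT, and that each raw record produces exactly one observation (that does not hold if you use @@ or read several lines per observation). Note that the first step re-reads the whole file, which costs time on millions of records.

```sas
/* Hypothetical fileref and path: point RAWFILE at your own data file. */
filename rawfile '/path/to/your/datafile.txt';

/* Pass 1: count the physical records without parsing any fields. */
data _null_;
  infile rawfile end=eof;
  input;                        /* read the record, keep nothing */
  nrec + 1;                     /* sum statement: running record count */
  if eof then call symputx('nrec', nrec);
run;

/* Observation count of the data set the INPUT step created (assumed WORK.WANT). */
data _null_;
  if 0 then set work.want nobs=nobs;  /* NOBS= is set at compile time */
  call symputx('nobs', nobs);
  stop;
run;

/* Compare the counts; assumes one raw record per observation. */
data _null_;
  if &nrec ne &nobs then
    put "WARNING: &nrec raw records but &nobs observations.";
  else put "NOTE: all &nrec raw records produced an observation.";
run;
```

This only checks counts. Values that failed to read still show up in the log as "Invalid data" notes, so those are worth scanning as well.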
09-14-2013 03:52 PM
The MEANS procedure can give you counts of missing cells in all numeric columns with brief code like this:
proc means noprint;
var _numeric_ ;
output nmiss= ;   /* no OUT=, so the result goes to the next DATAn table */
run;
%let missDS= &SYSLAST ;
I have assumed you run this code straight after the DATA step with the INPUT statement, so the procedure's default input data set rule applies (it reads the most recently created data set).
With no OUT= option on the OUTPUT statement, the procedure creates a new table named in the DATAn sequence, and my code captures that name from &SYSLAST.
This table has one row, with a counter variable named after each numeric variable in your input data set (plus the automatic _TYPE_ and _FREQ_ variables). Each counter holds the number of missing cells in that column. I guess you hope for a set of zeroes.
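Rather than browsing that one-row table, you could let a DATA step list only the columns whose count is not zero. A minimal sketch, assuming &missDS holds the table name captured by the %LET above; the variable names VARNAME and CNT are just illustrative choices.

```sas
/* Assumes &missDS names the one-row NMISS= table from PROC MEANS. */
data _null_;
  set &missDS;
  length varname $32;           /* character, so not part of _NUMERIC_ */
  array nm{*} _numeric_;        /* counters plus _TYPE_ and _FREQ_ */
  do i = 1 to dim(nm);
    varname = vname(nm{i});
    cnt = nm{i};
    /* skip the automatic variables; report only nonzero missing counts */
    if upcase(varname) not in ('_TYPE_', '_FREQ_') and cnt > 0 then
      put 'WARNING: ' varname 'has ' cnt 'missing value(s).';
  end;
run;
```

A silent log means every numeric column came back with zero missing cells. PROC MEANS only analyzes numeric variables, so the character columns need a separate check (for example, the CMISS function in a DATA step).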