06-22-2016 09:49 AM - edited 06-22-2016 09:50 AM
I want to use a DATA step with an INFILE statement to read a very large raw data file.
First, I want to know how many rows are in the file. If I only want to read the last records, I can use the FIRSTOBS=n and OBS=n options, but I need to know the value of n first.
DATA last_rec;
  INFILE raw1 FIRSTOBS=9999999 OBS=9999999;  /* 9999999 stands in for n, the row number of the last record */
  INPUT a $50.;
RUN;
06-22-2016 09:59 AM
I don't think so. Reading a text file is a linear process: it starts at character 1 and runs to the end of the file. Why can you not process the data once it's read in? Even really big files shouldn't take that long. Why would you want only a few observations from the end?
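A minimal sketch of that suggestion, assuming the fileref raw1 from the question is already assigned and that the record fits in $50. as in the original INPUT statement: read the whole file in one linear pass and use the END= option so that only the final record is written to the output data set.

```sas
/* Assumes fileref RAW1 is already assigned, as in the question. */
data last_row;
  infile raw1 end=last truncover;  /* LAST is set to 1 while the final record is being read */
  input a $50.;                    /* TRUNCOVER keeps short records from spilling onto the next line */
  if last then output;             /* write only the last record */
run;
```

This still reads every line, but it needs no row count in advance and only one observation is kept.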
06-22-2016 10:30 AM
If you are working under Linux/Unix this can be done. But the exact statements are beyond my Unix knowledge. Here's the idea.
Unix has a "tail" command that lists the end of a file. Instead of displaying it, the output can be piped to a file.
Now combine all of this with an INFILE statement. The INFILE statement contains the "tail" command, piping its results as part of the INFILE statement definition (rather than to a file). The combination lets the INFILE statement retrieve the tail end of the data source.
If you indicate that this would be useful for you, I'm sure someone on the board can give you more specific code.
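A rough sketch of the idea, assuming a Unix/Linux SAS host where the XCMD system option is enabled (pipes are often disabled in managed environments). The fileref lastrows, the data set name last10, the row count 10, and the reuse of the file path from the later reply are all illustrative choices, not part of the original post. The FILENAME statement with the PIPE device type runs the command and lets INFILE read its output as if it were a file.

```sas
/* Requires a Unix/Linux host and the XCMD option.            */
/* LASTROWS, LAST10, the path, and "-n 10" are placeholders.  */
filename lastrows pipe 'tail -n 10 /folders/myfolders/all_jd.csv';

data last10;
  infile lastrows truncover;  /* reads only the lines that tail writes */
  input a $50.;
run;

filename lastrows clear;      /* release the fileref */
```

Changing the number after -n controls how many trailing rows are read, and no row count is needed in SAS.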
06-22-2016 10:57 PM
DATA _null_;
  INFILE '/folders/myfolders/all_jd.csv' end=last;
  INPUT;   /* read each record without creating variables */
  n+1;     /* sum statement: count records */
  if last then putlog 'NOTE: The file has ' n ' rows.' / 'The last row is: ' _infile_;
RUN;
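Building on the row count above, a two-pass sketch that answers the original question: the first step stores the count in a macro variable, and the second step uses it for FIRSTOBS= and OBS=. The macro variable nrows, the data set last_rec, and the $50. input format are illustrative assumptions, and the file is assumed to be non-empty (otherwise nrows is never created).

```sas
/* Pass 1: count the rows and store the count in macro variable NROWS. */
data _null_;
  infile '/folders/myfolders/all_jd.csv' end=last;
  input;
  n + 1;
  if last then call symputx('nrows', n);
run;

/* Pass 2: read only the final record, using the count from pass 1. */
data last_rec;
  infile '/folders/myfolders/all_jd.csv' firstobs=&nrows obs=&nrows truncover;
  input a $50.;
run;
```

To read the last k rows instead, FIRSTOBS= could be set to %eval(&nrows - k + 1) while keeping OBS=&nrows, provided the file has at least k rows. Note that this approach reads the file twice.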
07-08-2016 09:23 AM