
How to ignore (identify) last line (observation) from each RAW input file in a list of multiple input files?

New Contributor
Posts: 2


Hi All.

First, a brief intro/FYI: this is my first post here. I did the 'SAS Programming 1 Essentials' course and would now like to expand a bit on that, so initially my questions will be very basic/fundamental (i.e. noob).

Now, to the question setup:

Suppose I have raw data text input files, one per day, e.g. my_raw_data_20150423.dat.

The LAST LINE in EACH file is a control record. It has a totally different data structure from all the data above it, and I do not want it (for now).

When I load one file only I can drop the last observation, which I have implemented as follows:

%let theInputFiles="/some_path/my_raw_data_20150423.dat";

data my_new_sas_data_20150423;

infile &theInputFiles dlm='|' DSD missover end=lastline;

input <all the data fields want etc>;

if lastline then delete;

output my_new_sas_data_20150423;

run;

This all works ok, suppose I have an input file with 10 lines where line 10 is the last line (with the ctrl data), then my new sas data has 9 records; so far all good.

Now the issue/my question proper:

When I try to load all files for a single month, then this drop last line "trick" only appears to work for the first (input) file but not the rest. See this:

%let theInputFiles="/some_path/my_raw_data_201504*.dat";

data my_new_sas_data_201504;

infile &theInputFiles dlm='|' DSD missover end=lastline;

input <all the data fields want etc>;

if lastline then delete;

output my_new_sas_data_201504;

run;

This time again for the first input file I only get the data I want, and the last line (the ctrl record) is dropped. So far so good.

BUT for all the following input files, the ctrl record (last line!) is also loaded into my new sas data.

So if I have a total of 3 input files, each with 10 lines (9 data plus line 10 for the ctrl), then my new sas data set should have 27 observations, but I get 29.

I get 9 from the first file and all 10 (each) from the remaining two files read in.

Q: Why? What is happening here?

NB: removing the last record from raw input file first (via a unix shell script say) is outside the scope of my question; I am interested in SAS specific file/data handling solution.

PS: later I may want to put the last record into a different file (eg: data A B; ... output A and if last line output B but there are other complications for another question q.v.).

Many Thanks for reading this far,

Dirk

Respected Advisor
Posts: 3,124

Re: How to ignore (identify) last line (observation) from each RAW input file in a list of multiple input files?

What you have done is first concatenate all of the external files and then read them as one input stream. Therefore your 'lastline' is the last line of the concatenation, not of each individual file. You need to read the files separately, which you can do with the FILEVAR= option.

The following code is obviously not tested for your applications, but I believe it is on the right track.

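filename indata pipe 'ls -1 /some_path/my_raw_data_201504*';

data want;

     length fil2read $100;

     /* one file name per DATA step iteration, taken from the ls listing */

     infile indata truncover;

     /* ls is given a full path pattern, so it already returns full paths */

     input fil2read $100.;

     /* FILEVAR= opens that file; END= is now set at the end of each file */

     infile dummy filevar=fil2read dlm='|' dsd truncover end=lastline;

     do while(not lastline);

           input <all the data fields want etc>;

           /* the last line of each file is the control record: skip it */

           if not lastline then

                output;

     end;

run;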

New Contributor
Posts: 2

Re: How to ignore (identify) last line (observation) from each RAW input file in a list of multiple input files?

G'Day Hai (.Kuo?),

Firstly, Many Thanks for taking the trouble to try and help.

At first I too thought that end=lastline was identified only for the last file. However, in the SAS log I can see that for the first file processed I get no error/warning at all, but for all the remaining files I get a warning (or was it an error, I cannot recall right now) about the last line not matching up with the data fields (recall the last line has way different data types/observation to the rest of my input file), whereupon SAS set some values to null etc. In short, it seems to identify end=lastline correctly for the first file processed but somehow not do what I want for the remaining files.

NB: You made me think maybe SAS traversed the file list backwards, e.g. instead of processing file01 file02 file03 ... file29 file30 it may have done file30 first, then file29 etc. But again from the log I can see it was in ascending order (file01 file02 etc).

As for using pipe I ran into some problems with access rights which appear to be due to myself using (free) SAS University Edition (I use the VMware SAS machine provided and it seems locked down quite a bit).

Now to your while loop: it is something I thought of using, but I am interested in finding the fastest way to read in the raw data files. Suppose my files are all 100-120MB with 400,000-500,000 observations each and a record length of 700+, and then I want to batch process 5-10 years of files!

It seems an overhead to do the while loop (not needed?), plus I think the 'if not lastline then output' is more overhead than the 'if lastline delete'?

However - now that I re-read your code sample I notice the file iteration (loop) appears to now be outside the infile loop?

In any case when I try I have to first of all replace the single quote with double quote:

from this: filename indata pipe ’ls -1 /some_path/my_raw_data_201504*’;

to this: filename indata pipe "ls -1 /some_path/my_raw_data_201504*";


No big deal, but when run I now get this:

ERROR: Insufficient authorization to access PIPE.

ERROR: Error in the FILENAME statement.

Maybe I could create a variable (array?) myindata manually for testing purposes, will look into it and try your approach some more.
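One way to try the FILEVAR= approach without PIPE is to type the file list in by hand, as suggested above. The following is only a minimal sketch under assumptions: the three file names are made up to match the naming pattern in this thread, and `id` and `amount` are hypothetical stand-ins for the real field list.

```sas
/* Sketch only: the file names and the fields id/amount are hypothetical. */
data want;
    length fil2read $200;
    infile datalines truncover;          /* the file list comes from DATALINES */
    input fil2read $200.;
    /* FILEVAR= opens the named file, so END= is set at the end of each file */
    infile dummy filevar=fil2read dlm='|' dsd truncover end=lastline;
    do while (not lastline);
        input id amount;                 /* hypothetical field list */
        if not lastline then output;     /* skip the trailing control record */
    end;
    datalines;
/some_path/my_raw_data_20150401.dat
/some_path/my_raw_data_20150402.dat
/some_path/my_raw_data_20150403.dat
;
run;
```

Each iteration of the DATA step reads one file name and then loops over that file, so with three 10-line files this should write 27 observations rather than 29.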

Anyway, thanks for your feedback so far, got me thinking...

Thanks,

Dirk

Respected Advisor
Posts: 3,124

Re: How to ignore (identify) last line (observation) from each RAW input file in a list of multiple input files?

While echoing the other replies, here are some short answers to some of your short questions:

Q:It seems an overhead to do the while loop (not needed?)

A: No, the do-loop adds no overhead here. Its purpose is to keep the different external files apart. What would you suggest instead?

Q: plus I think the 'if not lastline then output' is more overhead than the 'if lastline delete'?

A: In this case they are 100% equivalent in terms of the actual processing. The only difference is that with 'if lastline then delete' an additional "output" statement is needed. I wrote it my way purely to save some typing.

Haikuo

Update: Lack of permission at the OS level is a big bummer. At the moment, I can't think of anything that would qualify as a dynamic and efficient solution for your problem.
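One possible workaround when PIPE is not allowed is to list the directory with the DATA step functions DOPEN, DNUM and DREAD, and then feed the names to FILEVAR=. This is only a sketch under assumptions: the directory and name pattern are taken from the question, `id` and `amount` are hypothetical fields, and whether directory access is permitted in a locked-down installation would need to be checked.

```sas
/* Sketch only: directory, name pattern and the fields id/amount are assumptions. */
data file_list;
    length fref $8 fname $256 fil2read $512;
    rc  = filename(fref, '/some_path');      /* blank fref: SAS generates a fileref */
    did = dopen(fref);                       /* open the directory */
    if did > 0 then do;
        do i = 1 to dnum(did);               /* loop over the directory members */
            fname = dread(did, i);
            if prxmatch('/^my_raw_data_201504\d\d\.dat$/', strip(fname)) then do;
                fil2read = cats('/some_path/', fname);
                output;
            end;
        end;
        rc = dclose(did);
    end;
    rc = filename(fref);                     /* release the fileref */
    keep fil2read;
run;

data want;
    set file_list;                           /* one file name per iteration */
    infile dummy filevar=fil2read dlm='|' dsd truncover end=lastline;
    do while (not lastline);
        input id amount;                     /* hypothetical field list */
        if not lastline then output;         /* skip the trailing control record */
    end;
run;
```

The first step only builds the list of matching files; the second step is the same per-file loop as in the FILEVAR= code earlier in the thread.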

Super User
Posts: 6,971

Re: How to ignore (identify) last line (observation) from each RAW input file in a list of multiple input files?

Since you stated that these control lines have a completely different structure, I would use that as an indicator for processing each input line.

Look for a place in the control line where there is text, while in the data lines there are always numbers.

Look for text that appears only in the control lines.

Something like that.
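As a rough illustration of that idea, the sketch below assumes (hypothetically) that the first field of every data line is a number and that the control line starts with text. The fields `id` and `amount` are likewise made up for the example.

```sas
/* Sketch only: assumes the first field of a data line is always numeric. */
data want;
    infile "/some_path/my_raw_data_201504*.dat" dlm='|' dsd truncover;
    input @;                                  /* load the line and hold it */
    /* ?? suppresses log notes when the first field is not a number;   */
    /* note that a data line with an empty first field is dropped too  */
    if missing(input(scan(_infile_, 1, '|', 'm'), ?? best32.)) then delete;
    input id amount;                          /* hypothetical field list */
run;
```

Because the test looks only at the content of each line, it works the same for a single file and for a wildcard list.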

Super User
Posts: 6,502

Re: How to ignore (identify) last line (observation) from each RAW input file in a list of multiple input files?

It would be easiest if you could test the content of the line and determine if it is a control record.

For example if each file has a record that says N= and then some number you could just do this.

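data want ;

  infile '*.dat' ;

  /* load the line into the buffer and hold it for the next INPUT */

  input @; 

  /* =: compares only the leading characters, so this drops lines starting with N= */

  if _infile_ =: 'N=' then delete;

  input ..... ;

run;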

Valued Guide
Posts: 3,208

Re: How to ignore (identify) last line (observation) from each RAW input file in a list of multiple input files?

Identifying the last record of a text file is a more difficult question. The difficulty is that several lines (records) in a text file can build up one logical record (table record).

It can also be the other way round: one text record can result in many SAS records. Be aware that the "last record" is therefore not consistently defined in a logical sense.

The first record of a file does have a clear meaning. You can find it with the EOV= option of the INFILE statement when the files are concatenated or wildcarded; see SAS(R) 9.4 Statements: Reference, Third Edition.

The best approach is to read a record and check its content. You could use the automatic _infile_ variable for that.

That makes no difference compared with using the END= variable: both are set only after the read has been done.
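To make the EOV= remark concrete, here is a minimal sketch that only reports where each new file starts in a wildcard read. The path is the one from the question, and `newfile` is a name chosen here for illustration.

```sas
/* Sketch only: reports the first record of each file after the first one. */
data _null_;
    infile "/some_path/my_raw_data_201504*.dat" eov=newfile end=lastline;
    input;                                    /* read one line into _infile_ */
    if newfile then do;
        put 'A new file starts at input line ' _n_;
        newfile = 0;                          /* EOV= is not reset automatically */
    end;
    if lastline then put 'Last line of the last file is line ' _n_;
run;
```

EOV= is set when the first record of the next file is read, so it marks file starts rather than file ends; END= still fires only once, at the very end of the whole list.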

---->-- ja karman --<-----