Last row of text file

Accepted Solution Solved
Reply
Super User
Super User
Posts: 7,413
Accepted Solution

Last row of text file

Hi,

Can I check what an easy way to read a text file except the last row is?  I have a CSV which looks like:

VAR1,VAR2,VAR3

"123","YYY","ZZZ"

EOF

Have tried various methods but I either end up with Invalid row at last record, or if reading and writing the file back out it messes up the splitting (so data maybe on various row for instance):

And read/write:

filename clindata "xyz.csv";

filename cdata "vsd.csv";

data _null_;

  length buffer $4000.;

  infile clindata;

  file cdata;

  input buffer;

  if strip(buffer) ne "EOF" then put buffer;

run;

Am sure I am just missing an options on the put, but can't think.  So I want the line of text exactly per the original file (could be quite long), only ignoring the last row of data.


Accepted Solutions
Solution
‎08-28-2014 10:45 AM
Super User
Super User
Posts: 7,413

Re: Last row of text file

Hi,

Just to finalise this post.  Support sent me some code posted below, which is working for me as an intermediary setup.

data _null_;

     infile "s:\temp\rob\x.csv" recfm=n;

     file "s:\temp\rob\y.csv" recfm=n;

     retain Flag 0;

     input a $char1.;

     if a = '"' then

         if Flag = 0 then

             Flag = 1;

         else

             Flag = 0;

     if Flag = 1 then

         do;

         if a = '0D'x then

             do;

             goto exit;

             end;

         if a = '0A'x then

             do;

             goto exit;

             end;

         end;

     put a $char1.;

EXIT:

run;

View solution in original post


All Replies
Super User
Posts: 5,260

Re: Last row of text file

Assuming that you mean EOF is not plain text, rather a control char: have you tried the END= INFILE option?

Data never sleeps
Super User
Super User
Posts: 7,413

Re: Last row of text file

Hi,

No sorry, on the code I posted that was an attempt to read the whole file and put out all text except the final row of text which just has "EOF" without the quotes in.  The actual import program I have - not mine - has this code:

data xyz;

     attrib ...;

  infile "S:\Temp\Rob\Rave\ds_progs_gen\v_bup1506_ae.csv" delimiter = ',' MISSOVER DSD firstobs=2 end=eof lrecl=32767;

   input @;
   if eof then do;
      if _infile_ ^= 'EOF' then do;
         put "ERROR: EOF marker not found in metadata transmission from Rave Web Services. Transmission must have been interrupted. Now aborting.";
         abort abend;
      end;
      stop;
   end;
   else do;

     input ...;

     end;

run;

However this gives: NOTE: Invalid data for userid in line 37 1-1.

For the line with just EOF.  I have tried to get it working so this note goes away, i.e. not reading the last row, but couldn't get it.  Then switched over to trying to read in the whole file and output all the rows without the last line, per the code in the first post.  Unfortunately this split up all the lines randomly and so killed the structure of the file.

So I am looking for either a way to read the file as is without the EOF or note, or read the whole file/write it back out exactly as it is without the last row.

Respected Advisor
Posts: 3,777

Re: Last row of text file

Check that you file does not have a blank line after the line with EOF that will prevent the program show from working.  To check that use LIST;  See line 4 below with length 0.

37         filename FT15F001 temp;

38         data _null_;

39            infile FT15F001;

40            input;

41            list;

42            parmcards;

47         ;;;;

NOTE: The infile FT15F001 is:

      (system-specific pathname),

      (system-specific file attributes)

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                    

1         VAR1,VAR2,VAR3 14

2         "123","YYY","ZZZ" 17

3         EOF 3

4          0

NOTE: 4 records were read from the infile (system-specific pathname).

      The minimum record length was 0.

      The maximum record length was 17.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

Super User
Super User
Posts: 7,413

Re: Last row of text file

Unfrotunately not, the last row is EOF 3.

Respected Advisor
Posts: 3,777

Re: Last row of text file

OK lets see the program that is not working.  I only see bits that you say don't work and I'm not sure what's what.

Respected Advisor
Posts: 3,777

Re: Last row of text file

To COPY with INFILE/FILE you use INPUT;/PUT _INFILE_; It is likely you will need LRECL option on both FILEREFs.

filename FT15F001 temp
filename FT16F001 temp;
data _null_;
  
infile FT15F001 end=eof;
   file FT16F001;
   input;
  
if _infile_ eq: 'EOF' then stop;
  
put _infile_;
  
parmcards;
VAR1,VAR2,VAR3
"123","YYY","ZZZ"
EOF

;;;;
   run;

data _null_;
  
infile FT16F001;
   input;
  
list;
  
run;
Respected Advisor
Posts: 3,777

Re: Last row of text file

This note shows that the column(s) read for USERID are 1-1

NOTE: Invalid data for userid in line 37 1-1


That implies (to me) that is may not be associated with the line you think it is.  Did you also a dump of the LINE associate with the note?

Super User
Super User
Posts: 7,413

Re: Last row of text file

Yes, you are quite correct data _null_; in rows 36 of the data, once I had a look through a hex editor, it shows one column as: ..."","..","","",""... and the codes for the two dots = 0D 0A.  0D is a carriage return.  Should these not import correctly however as they are in within double quotes?  What it is doing is at this point splitting the line and hence I am getting invalid data.

filename cdata "d1.csv";

filename outdata d2.csv";

data _null_;

  infile cdata lrecl=32767;

  file outdata lrecl=32767;

  input;

  if _infile_ eq: 'EOF' then stop;

  if index(_infile_,'0d'x)>0 or index(_infile_,'0a'x)>0 then do;

    _infile_=compress(_infile_,'0d'x);

    _infile_=compress(_infile_,'0a'x);

  end;

  put _infile_;

run;

This still splits that line.

Respected Advisor
Posts: 3,777

Re: Last row of text file

I do not know if double quotes are suppose to "mask" a term string or not.  This example implies NO but I'm on UNIX I think you are using Windows so there there may be a difference and my example may be flawed.  Is the quoted 0D0A common or was this introduced by saving the file from an editor that wrapped the line.

25         data _null_;
26            file FT16F001 termstr=crlf;
27            put 'VAR1,VAR2,VAR3';
28            put '"123","YYY","ZZZ"';
29            put '"123","' '0D0A'x '","ZZZ"';
30            put 'EOF';
31            run;

NOTE:
The file FT16F001 is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE:
4 records were written to the file (system-specific pathname).
      The minimum record length was
3.
      The maximum record length was
17.
NOTE: DATA statement used (Total process time):
      real time          
0.00 seconds
      cpu time           
0.00 seconds
     

32        
33         data _null_;
34            infile FT16F001 dsd dlm=',' termstr=crlf;
35            input;
36            list;
37            run;

NOTE:
The infile FT16F001 is:
      (system-specific pathname),
      (system-specific file attributes)

RULE:     ----+----
1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                     
1         VAR1,VAR2,VAR3 14
2         "123","YYY","ZZZ" 17
3         "123"," 7
4         "
,"ZZZ" 7
5         EOF 3
NOTE:
5 records were read from the infile (system-specific pathname).
      The minimum record length was
3.
      The maximum record length was
17.
NOTE: DATA statement used (Total process time):
      real time          
0.00 seconds
      cpu time           
0.00 seconds
Super User
Super User
Posts: 7,413

Re: Last row of text file

Well, the data is actually via proc http from a database, no processing involved.  Yes, am Windows.  I think I may need to pass this back to the DB people to sort out as this 0d is causing empty rows to appear. 

Super User
Super User
Posts: 6,502

Re: Last row of text file

If the data just has an embedded carriage return ('0D'x) and the real end of lines are marked by CR+LF (as is normal on Windows) or LF (as is normal on Unix) then you should be able to read it as is by using the proper value for the TERMSTR= option on the INFILE statement.

Respected Advisor
Posts: 3,777

Re: Last row of text file

Yes but if I understand what said %superq(once I had a look through a hex editor, it shows one column as: ..."","..","","",""... and the codes for the two dots = 0D 0A. ) it is the full TERMSTR that is embedded not a single '0D'x.

Super User
Super User
Posts: 7,413

Re: Last row of text file

Hi.

Yes, data_null_; is correct, this is embedded in the data.  Have tried the termstr to no avail.  The really annoying fact is that Excel (of all things) does read the file correctly.  I will pop a service desk ticket in for this and reply if I find anything out.

Super User
Super User
Posts: 6,502

Re: Last row of text file

This topic has come up on the list before. For example here is one thread. https://communities.sas.com/message/121271#121271

If Excel is reading it properly then one of the tricks that counts quotes should be able to find and fix the extra line breaks.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 1739 views
  • 6 likes
  • 4 in conversation