BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Can I check what an easy way to read a text file except the last row is?  I have a CSV which looks like:

VAR1,VAR2,VAR3

"123","YYY","ZZZ"

EOF

Have tried various methods but I either end up with Invalid row at last record, or if reading and writing the file back out it messes up the splitting (so data maybe on various row for instance):

And read/write:

filename clindata "xyz.csv";

filename cdata "vsd.csv";

data _null_;

  length buffer $4000.;

  infile clindata;

  file cdata;

  input buffer;

  if strip(buffer) ne "EOF" then put buffer;

run;

Am sure I am just missing an options on the put, but can't think.  So I want the line of text exactly per the original file (could be quite long), only ignoring the last row of data.

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Just to finalise this post.  Support sent me some code posted below, which is working for me as an intermediary setup.

data _null_;

     infile "s:\temp\rob\x.csv" recfm=n;

     file "s:\temp\rob\y.csv" recfm=n;

     retain Flag 0;

     input a $char1.;

     if a = '"' then

         if Flag = 0 then

             Flag = 1;

         else

             Flag = 0;

     if Flag = 1 then

         do;

         if a = '0D'x then

             do;

             goto exit;

             end;

         if a = '0A'x then

             do;

             goto exit;

             end;

         end;

     put a $char1.;

EXIT:

run;

View solution in original post

16 REPLIES 16
LinusH
Tourmaline | Level 20

Assuming that you mean EOF is not plain text, rather a control char: have you tried the END= INFILE option?

Data never sleeps
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

No sorry, on the code I posted that was an attempt to read the whole file and put out all text except the final row of text which just has "EOF" without the quotes in.  The actual import program I have - not mine - has this code:

data xyz;

     attrib ...;

  infile "S:\Temp\Rob\Rave\ds_progs_gen\v_bup1506_ae.csv" delimiter = ',' MISSOVER DSD firstobs=2 end=eof lrecl=32767;

   input @;
   if eof then do;
      if _infile_ ^= 'EOF' then do;
         put "ERROR: EOF marker not found in metadata transmission from Rave Web Services. Transmission must have been interrupted. Now aborting.";
         abort abend;
      end;
      stop;
   end;
   else do;

     input ...;

     end;

run;

However this gives: NOTE: Invalid data for userid in line 37 1-1.

For the line with just EOF.  I have tried to get it working so this note goes away, i.e. not reading the last row, but couldn't get it.  Then switched over to trying to read in the whole file and output all the rows without the last line, per the code in the first post.  Unfortunately this split up all the lines randomly and so killed the structure of the file.

So I am looking for either a way to read the file as is without the EOF or note, or read the whole file/write it back out exactly as it is without the last row.

data_null__
Jade | Level 19

Check that you file does not have a blank line after the line with EOF that will prevent the program show from working.  To check that use LIST;  See line 4 below with length 0.

37         filename FT15F001 temp;

38         data _null_;

39            infile FT15F001;

40            input;

41            list;

42            parmcards;

47         ;;;;

NOTE: The infile FT15F001 is:

      (system-specific pathname),

      (system-specific file attributes)

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                    

1         VAR1,VAR2,VAR3 14

2         "123","YYY","ZZZ" 17

3         EOF 3

4          0

NOTE: 4 records were read from the infile (system-specific pathname).

      The minimum record length was 0.

      The maximum record length was 17.

NOTE: DATA statement used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Unfrotunately not, the last row is EOF 3.

data_null__
Jade | Level 19

OK lets see the program that is not working.  I only see bits that you say don't work and I'm not sure what's what.

data_null__
Jade | Level 19

To COPY with INFILE/FILE you use INPUT;/PUT _INFILE_; It is likely you will need LRECL option on both FILEREFs.

filename FT15F001 temp
filename FT16F001 temp;
data _null_;
  
infile FT15F001 end=eof;
   file FT16F001;
   input;
  
if _infile_ eq: 'EOF' then stop;
  
put _infile_;
  
parmcards;
VAR1,VAR2,VAR3
"123","YYY","ZZZ"
EOF

;;;;
   run;

data _null_;
  
infile FT16F001;
   input;
  
list;
  
run;
data_null__
Jade | Level 19

This note shows that the column(s) read for USERID are 1-1

NOTE: Invalid data for userid in line 37 1-1


That implies (to me) that is may not be associated with the line you think it is.  Did you also a dump of the LINE associate with the note?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, you are quite correct data _null_; in rows 36 of the data, once I had a look through a hex editor, it shows one column as: ..."","..","","",""... and the codes for the two dots = 0D 0A.  0D is a carriage return.  Should these not import correctly however as they are in within double quotes?  What it is doing is at this point splitting the line and hence I am getting invalid data.

filename cdata "d1.csv";

filename outdata d2.csv";

data _null_;

  infile cdata lrecl=32767;

  file outdata lrecl=32767;

  input;

  if _infile_ eq: 'EOF' then stop;

  if index(_infile_,'0d'x)>0 or index(_infile_,'0a'x)>0 then do;

    _infile_=compress(_infile_,'0d'x);

    _infile_=compress(_infile_,'0a'x);

  end;

  put _infile_;

run;

This still splits that line.

data_null__
Jade | Level 19

I do not know if double quotes are suppose to "mask" a term string or not.  This example implies NO but I'm on UNIX I think you are using Windows so there there may be a difference and my example may be flawed.  Is the quoted 0D0A common or was this introduced by saving the file from an editor that wrapped the line.

25         data _null_;
26            file FT16F001 termstr=crlf;
27            put 'VAR1,VAR2,VAR3';
28            put '"123","YYY","ZZZ"';
29            put '"123","' '0D0A'x '","ZZZ"';
30            put 'EOF';
31            run;

NOTE:
The file FT16F001 is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE:
4 records were written to the file (system-specific pathname).
      The minimum record length was
3.
      The maximum record length was
17.
NOTE: DATA statement used (Total process time):
      real time          
0.00 seconds
      cpu time           
0.00 seconds
     

32        
33         data _null_;
34            infile FT16F001 dsd dlm=',' termstr=crlf;
35            input;
36            list;
37            run;

NOTE:
The infile FT16F001 is:
      (system-specific pathname),
      (system-specific file attributes)

RULE:     ----+----
1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                     
1         VAR1,VAR2,VAR3 14
2         "123","YYY","ZZZ" 17
3         "123"," 7
4         "
,"ZZZ" 7
5         EOF 3
NOTE:
5 records were read from the infile (system-specific pathname).
      The minimum record length was
3.
      The maximum record length was
17.
NOTE: DATA statement used (Total process time):
      real time          
0.00 seconds
      cpu time           
0.00 seconds
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, the data is actually via proc http from a database, no processing involved.  Yes, am Windows.  I think I may need to pass this back to the DB people to sort out as this 0d is causing empty rows to appear. 

Tom
Super User Tom
Super User

If the data just has an embedded carriage return ('0D'x) and the real end of lines are marked by CR+LF (as is normal on Windows) or LF (as is normal on Unix) then you should be able to read it as is by using the proper value for the TERMSTR= option on the INFILE statement.

data_null__
Jade | Level 19

Yes but if I understand what said %superq(once I had a look through a hex editor, it shows one column as: ..."","..","","",""... and the codes for the two dots = 0D 0A. ) it is the full TERMSTR that is embedded not a single '0D'x.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi.

Yes, data_null_; is correct, this is embedded in the data.  Have tried the termstr to no avail.  The really annoying fact is that Excel (of all things) does read the file correctly.  I will pop a service desk ticket in for this and reply if I find anything out.

Tom
Super User Tom
Super User

This topic has come up on the list before. For example here is one thread. https://communities.sas.com/message/121271#121271

If Excel is reading it properly then one of the tricks that counts quotes should be able to find and fix the extra line breaks.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 7210 views
  • 6 likes
  • 4 in conversation