08-15-2014 05:17 AM
Can I check what an easy way to read a text file except the last row is? I have a CSV which looks like:
Have tried various methods but I either end up with Invalid row at last record, or if reading and writing the file back out it messes up the splitting (so data maybe on various row for instance):
filename clindata "xyz.csv";
filename cdata "vsd.csv";
length buffer $4000.;
if strip(buffer) ne "EOF" then put buffer;
Am sure I am just missing an options on the put, but can't think. So I want the line of text exactly per the original file (could be quite long), only ignoring the last row of data.
08-28-2014 10:45 AM
Just to finalise this post. Support sent me some code posted below, which is working for me as an intermediary setup.
infile "s:\temp\rob\x.csv" recfm=n;
file "s:\temp\rob\y.csv" recfm=n;
retain Flag 0;
input a $char1.;
if a = '"' then
if Flag = 0 then
Flag = 1;
Flag = 0;
if Flag = 1 then
if a = '0D'x then
if a = '0A'x then
put a $char1.;
08-15-2014 06:01 AM
No sorry, on the code I posted that was an attempt to read the whole file and put out all text except the final row of text which just has "EOF" without the quotes in. The actual import program I have - not mine - has this code:
infile "S:\Temp\Rob\Rave\ds_progs_gen\v_bup1506_ae.csv" delimiter = ',' MISSOVER DSD firstobs=2 end=eof lrecl=32767;
if eof then do;
if _infile_ ^= 'EOF' then do;
put "ERROR: EOF marker not found in metadata transmission from Rave Web Services. Transmission must have been interrupted. Now aborting.";
However this gives: NOTE: Invalid data for userid in line 37 1-1.
For the line with just EOF. I have tried to get it working so this note goes away, i.e. not reading the last row, but couldn't get it. Then switched over to trying to read in the whole file and output all the rows without the last line, per the code in the first post. Unfortunately this split up all the lines randomly and so killed the structure of the file.
So I am looking for either a way to read the file as is without the EOF or note, or read the whole file/write it back out exactly as it is without the last row.
08-15-2014 06:44 AM
Check that you file does not have a blank line after the line with EOF that will prevent the program show from working. To check that use LIST; See line 4 below with length 0.
37 filename FT15F001 temp;
38 data _null_;
39 infile FT15F001;
NOTE: The infile FT15F001 is:
(system-specific file attributes)
1 VAR1,VAR2,VAR3 14
2 "123","YYY","ZZZ" 17
3 EOF 3
NOTE: 4 records were read from the infile (system-specific pathname).
The minimum record length was 0.
The maximum record length was 17.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
08-15-2014 06:54 AM
To COPY with INFILE/FILE you use INPUT;/PUT _INFILE_; It is likely you will need LRECL option on both FILEREFs.
08-15-2014 07:22 AM
This note shows that the column(s) read for USERID are 1-1
NOTE: Invalid data for userid in line 37 1-1
That implies (to me) that is may not be associated with the line you think it is. Did you also a dump of the LINE associate with the note?
08-15-2014 08:31 AM
Yes, you are quite correct data _null_; in rows 36 of the data, once I had a look through a hex editor, it shows one column as: ..."","..","","",""... and the codes for the two dots = 0D 0A. 0D is a carriage return. Should these not import correctly however as they are in within double quotes? What it is doing is at this point splitting the line and hence I am getting invalid data.
filename cdata "d1.csv";
filename outdata d2.csv";
infile cdata lrecl=32767;
file outdata lrecl=32767;
if _infile_ eq: 'EOF' then stop;
if index(_infile_,'0d'x)>0 or index(_infile_,'0a'x)>0 then do;
This still splits that line.
08-15-2014 09:02 AM
I do not know if double quotes are suppose to "mask" a term string or not. This example implies NO but I'm on UNIX I think you are using Windows so there there may be a difference and my example may be flawed. Is the quoted 0D0A common or was this introduced by saving the file from an editor that wrapped the line.
08-15-2014 09:53 AM
Well, the data is actually via proc http from a database, no processing involved. Yes, am Windows. I think I may need to pass this back to the DB people to sort out as this 0d is causing empty rows to appear.
08-15-2014 10:37 AM
If the data just has an embedded carriage return ('0D'x) and the real end of lines are marked by CR+LF (as is normal on Windows) or LF (as is normal on Unix) then you should be able to read it as is by using the proper value for the TERMSTR= option on the INFILE statement.
08-15-2014 01:28 PM
08-18-2014 04:43 AM
Yes, data_null_; is correct, this is embedded in the data. Have tried the termstr to no avail. The really annoying fact is that Excel (of all things) does read the file correctly. I will pop a service desk ticket in for this and reply if I find anything out.
08-18-2014 01:08 PM
This topic has come up on the list before. For example here is one thread. https://communities.sas.com/message/121271#121271
If Excel is reading it properly then one of the tricks that counts quotes should be able to find and fix the extra line breaks.
Need further help from the community? Please ask a new question.