Hi,
Can I check what an easy way to read a text file except the last row is? I have a CSV which looks like:
VAR1,VAR2,VAR3
"123","YYY","ZZZ"
EOF
Have tried various methods but I either end up with Invalid row at last record, or if reading and writing the file back out it messes up the splitting (so data maybe on various row for instance):
And read/write:
filename clindata "xyz.csv";
filename cdata "vsd.csv";
data _null_;
length buffer $4000.;
infile clindata;
file cdata;
input buffer;
if strip(buffer) ne "EOF" then put buffer;
run;
Am sure I am just missing an options on the put, but can't think. So I want the line of text exactly per the original file (could be quite long), only ignoring the last row of data.
Hi,
Just to finalise this post. Support sent me some code posted below, which is working for me as an intermediary setup.
data _null_;
infile "s:\temp\rob\x.csv" recfm=n;
file "s:\temp\rob\y.csv" recfm=n;
retain Flag 0;
input a $char1.;
if a = '"' then
if Flag = 0 then
Flag = 1;
else
Flag = 0;
if Flag = 1 then
do;
if a = '0D'x then
do;
goto exit;
end;
if a = '0A'x then
do;
goto exit;
end;
end;
put a $char1.;
EXIT:
run;
Assuming that you mean EOF is not plain text, rather a control char: have you tried the END= INFILE option?
Hi,
No sorry, on the code I posted that was an attempt to read the whole file and put out all text except the final row of text which just has "EOF" without the quotes in. The actual import program I have - not mine - has this code:
data xyz;
attrib ...;
infile "S:\Temp\Rob\Rave\ds_progs_gen\v_bup1506_ae.csv" delimiter = ',' MISSOVER DSD firstobs=2 end=eof lrecl=32767;
input @;
if eof then do;
if _infile_ ^= 'EOF' then do;
put "ERROR: EOF marker not found in metadata transmission from Rave Web Services. Transmission must have been interrupted. Now aborting.";
abort abend;
end;
stop;
end;
else do;
input ...;
end;
run;
However this gives: NOTE: Invalid data for userid in line 37 1-1.
For the line with just EOF. I have tried to get it working so this note goes away, i.e. not reading the last row, but couldn't get it. Then switched over to trying to read in the whole file and output all the rows without the last line, per the code in the first post. Unfortunately this split up all the lines randomly and so killed the structure of the file.
So I am looking for either a way to read the file as is without the EOF or note, or read the whole file/write it back out exactly as it is without the last row.
Check that you file does not have a blank line after the line with EOF that will prevent the program show from working. To check that use LIST; See line 4 below with length 0.
37 filename FT15F001 temp;
38 data _null_;
39 infile FT15F001;
40 input;
41 list;
42 parmcards;
47 ;;;;
NOTE: The infile FT15F001 is:
(system-specific pathname),
(system-specific file attributes)
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 VAR1,VAR2,VAR3 14
2 "123","YYY","ZZZ" 17
3 EOF 3
4 0
NOTE: 4 records were read from the infile (system-specific pathname).
The minimum record length was 0.
The maximum record length was 17.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Unfrotunately not, the last row is EOF 3.
OK lets see the program that is not working. I only see bits that you say don't work and I'm not sure what's what.
To COPY with INFILE/FILE you use INPUT;/PUT _INFILE_; It is likely you will need LRECL option on both FILEREFs.
This note shows that the column(s) read for USERID are 1-1
NOTE: Invalid data for userid in line 37 1-1
That implies (to me) that is may not be associated with the line you think it is. Did you also a dump of the LINE associate with the note?
Yes, you are quite correct data _null_; in rows 36 of the data, once I had a look through a hex editor, it shows one column as: ..."","..","","",""... and the codes for the two dots = 0D 0A. 0D is a carriage return. Should these not import correctly however as they are in within double quotes? What it is doing is at this point splitting the line and hence I am getting invalid data.
filename cdata "d1.csv";
filename outdata d2.csv";
data _null_;
infile cdata lrecl=32767;
file outdata lrecl=32767;
input;
if _infile_ eq: 'EOF' then stop;
if index(_infile_,'0d'x)>0 or index(_infile_,'0a'x)>0 then do;
_infile_=compress(_infile_,'0d'x);
_infile_=compress(_infile_,'0a'x);
end;
put _infile_;
run;
This still splits that line.
I do not know if double quotes are suppose to "mask" a term string or not. This example implies NO but I'm on UNIX I think you are using Windows so there there may be a difference and my example may be flawed. Is the quoted 0D0A common or was this introduced by saving the file from an editor that wrapped the line.
Well, the data is actually via proc http from a database, no processing involved. Yes, am Windows. I think I may need to pass this back to the DB people to sort out as this 0d is causing empty rows to appear.
If the data just has an embedded carriage return ('0D'x) and the real end of lines are marked by CR+LF (as is normal on Windows) or LF (as is normal on Unix) then you should be able to read it as is by using the proper value for the TERMSTR= option on the INFILE statement.
Hi.
Yes, data_null_; is correct, this is embedded in the data. Have tried the termstr to no avail. The really annoying fact is that Excel (of all things) does read the file correctly. I will pop a service desk ticket in for this and reply if I find anything out.
This topic has come up on the list before. For example here is one thread. https://communities.sas.com/message/121271#121271
If Excel is reading it properly then one of the tricks that counts quotes should be able to find and fix the extra line breaks.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.