Importing several delimited text files (by pipe: |). actual lengths of fields can vary.
Several character strings contain tabs and carriage return type characters, which is throwing out the INFILE. The records themselves appear on a single row in the file. Character strings are not surrounded by double quotes.
Affected data is free form text fields, from copy-paste (emails and similar).
Proc import was originally tried but even with guessingrows max it isn't processing the file correctly. a basic SAS dataset with INFILE DLM and the like also processes the file incorrectly.
Eventually used SAS Enterprise Guide's inbuilt "Import" function. This does read the correct number of records. But also caused fields to be read in with odd positioning: field A becomes field B and so on.
Any thoughts?
TAB characters should not cause any trouble when reading the file. (They will only cause trouble latter.)
PIPE characters WILL cause trouble if the fields that contain them are NOT quoted. Do any PIPE characters exist in the value of any field?
As for CR they are only a problem if the end of lines are marked by CR only, which I doubt you have (unless you accidentally let EXCEL on a Mac default to using CR as the end of line character).
If the end of the lines are marked by the Windows default of CR-LF combination and the values inside the string are only bare LF or bare CR then just adding the TERMSTR=CRLF option to the INFILE statement (or to the FILENAME statement if using PROC IMPORT) should solve the problem.
More likely the issue is either embedded LF characters when the end of the lines are marked only be LF or embedded CR-LF combinations when the end of lines are marked with CR-LF.
In that base you might be able to fix the file by reading it character by character and removing any LF characters that appear before you have seen the expected number of PIPE characters.
Let's make an example file with an embedded line break and try reading it with PROC IMPORT.
filename csv temp termstr=crlf;
options parmcards=csv;
parmcards;
col1|col2|col3
value1|value2|value3
value4a
value4b|value5|value6
;
proc import dbms=csv file=csv out=bad replace;
delimiter='|';
run;
Result:
Now lets convert the embedded LF's into spaces and then try again.
filename fixed temp;
data _null_;
infile csv lrecl=1 recfm=f ;
file fixed recfm=n;
retain np 0;
input ch $char1. ;
if ch='|' then np+1;
if ch='0A'x and lag(ch)='0D'x and np>= 2 then np=0;
else if ch='0A'x then ch=' ';
put ch $char1.;
run;
proc import dbms=csv file=fixed out=fixed replace;
delimiter='|';
run;
Result
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.