BookmarkSubscribeRSS Feed
rogcmoore
Calcite | Level 5

Importing several delimited text files (by pipe: |). actual lengths of fields can vary.

 

Several character strings contain tabs and carriage return type characters, which is throwing out the INFILE. The records themselves appear on a single row in the file. Character strings are not surrounded by double quotes.

 

Affected data is free form text fields, from copy-paste (emails and similar).

 

Proc import was originally tried but even with guessingrows max it isn't processing the file correctly. a basic SAS dataset with INFILE DLM and the like also processes the file incorrectly.

 

Eventually used SAS Enterprise Guide's inbuilt "Import" function. This does read the correct number of records. But also caused fields to be read in with odd positioning: field A becomes field B and so on.

 

Any thoughts?

3 REPLIES 3
Ksharp
Super User
That would be helpful to solve your problem if you post some REAL data and desired output you are looking for .
Tom
Super User Tom
Super User

TAB characters should not cause any trouble when reading the file. (They will only cause trouble latter.)

PIPE characters WILL cause trouble if the fields that contain them are NOT quoted.  Do any PIPE characters exist in the value of any field?

 

As for CR they are only a problem if the end of lines are marked by CR only, which I doubt you have (unless you accidentally let EXCEL on a Mac default to using CR as the end of line character).

 

If the end of the lines are marked by the Windows default of CR-LF combination and the values inside the string are only bare LF or bare CR then just adding the TERMSTR=CRLF option to the INFILE statement (or to the FILENAME statement if using PROC IMPORT) should solve the problem.

 

More likely the issue is either embedded LF characters when the end of the lines are marked only be LF or embedded CR-LF combinations when the end of lines are marked with CR-LF.

 

In that base you might be able to fix the file by reading it character by character and removing any LF characters that appear before you have seen the expected number of PIPE characters.

Tom
Super User Tom
Super User

Let's make an example file with an embedded line break and try reading it with PROC IMPORT.

filename csv temp termstr=crlf;
options parmcards=csv;
parmcards;
col1|col2|col3
value1|value2|value3
value4a
 value4b|value5|value6
;

proc import dbms=csv file=csv out=bad replace;
  delimiter='|';
run;

Result:

Tom_0-1715133466523.png

Now lets convert the embedded LF's into spaces and then try again.

filename fixed temp;
data _null_;
  infile csv lrecl=1 recfm=f ;
  file fixed recfm=n;
  retain np 0;
  input ch $char1. ;
  if ch='|' then np+1;
  if ch='0A'x and lag(ch)='0D'x and np>= 2 then np=0;
  else if ch='0A'x then ch=' ';
  put ch $char1.;
run;

proc import dbms=csv file=fixed out=fixed replace;
  delimiter='|';
run;

Result

Tom_1-1715133558073.png

 

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 213 views
  • 0 likes
  • 3 in conversation