BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
alphamutau
Calcite | Level 5

Hello, I have a pipe-delimited text (.txt) file with carriage returns embedded in one of the variables.

How can I get rid of the carriage returns?

 

Ideally the text file should look like this

 

| phone contact | alpha 123; gamma 456; omega 789 | John Doe |

 

Instead I have this.

 

| phone contact | alpha 123;

gamma 456;

omega 789 | John Doe|

 

When I tried reading it into SAS, SAS interpreted the carriage returns as another delimiter just like the pipes.

 

Any suggestions would be appreciated.

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @alphamutau 

 

A good solution is to preprocess the file in a separate step. Here is a piece of code that reads the file, brings the splitted lines in order and writes the result to another file, that can be used as input.

 

The function depends on the first line in the file having all fields. This will often be the case, because it contains column headers.

 

Infile:

 

| aaa | bbb | ccc |
| phone contact | alpha 123; gamma 456; omega 789 | John Doe |
| phone contact | alpha 123;
gamma 456;
omega 789 | John Doe|
| phone contact | alpha 123; gamma 456; omega 789 | John Doe |

Code:

 

filename in "c:\temp\have.txt";
filename out "c:\temp\want.txt";

data _null_ ;
	infile in truncover obs=max firstobs=1 lrecl=32765 end=eof;
	file out lrecl=32765 ;
	length OutLine $32765 ;
	retain CoLCount OutLine ;
	input;
	if _N_ = 1 then do;
		CoLCount = count(_infile_,'|') ;
		put _infile_;
	end;
	else do ;
		if substrn(_infile_,1,1) = '|' and count(OutLine,'|') >= CoLCount then do ;
		if OutLine ^= '' then put OutLine ; 
		OutLine = trim(_infile_) ;
	end; 
	else OutLine = catx(' ',OutLine,trim(_infile_)) ;
	if eof then put OutLine ;
	end;  
run;

Outfile

 

| aaa | bbb | ccc |
| phone contact | alpha 123; gamma 456; omega 789 | John Doe |
| phone contact | alpha 123; gamma 456; omega 789 | John Doe|
| phone contact | alpha 123; gamma 456; omega 789 | John Doe |

View solution in original post

9 REPLIES 9
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @alphamutau 

 

A good solution is to preprocess the file in a separate step. Here is a piece of code that reads the file, brings the splitted lines in order and writes the result to another file, that can be used as input.

 

The function depends on the first line in the file having all fields. This will often be the case, because it contains column headers.

 

Infile:

 

| aaa | bbb | ccc |
| phone contact | alpha 123; gamma 456; omega 789 | John Doe |
| phone contact | alpha 123;
gamma 456;
omega 789 | John Doe|
| phone contact | alpha 123; gamma 456; omega 789 | John Doe |

Code:

 

filename in "c:\temp\have.txt";
filename out "c:\temp\want.txt";

data _null_ ;
	infile in truncover obs=max firstobs=1 lrecl=32765 end=eof;
	file out lrecl=32765 ;
	length OutLine $32765 ;
	retain CoLCount OutLine ;
	input;
	if _N_ = 1 then do;
		CoLCount = count(_infile_,'|') ;
		put _infile_;
	end;
	else do ;
		if substrn(_infile_,1,1) = '|' and count(OutLine,'|') >= CoLCount then do ;
		if OutLine ^= '' then put OutLine ; 
		OutLine = trim(_infile_) ;
	end; 
	else OutLine = catx(' ',OutLine,trim(_infile_)) ;
	if eof then put OutLine ;
	end;  
run;

Outfile

 

| aaa | bbb | ccc |
| phone contact | alpha 123; gamma 456; omega 789 | John Doe |
| phone contact | alpha 123; gamma 456; omega 789 | John Doe|
| phone contact | alpha 123; gamma 456; omega 789 | John Doe |
alphamutau
Calcite | Level 5
Many thanks for your SAS code.



It really got me out of a jam!


ballardw
Super User

@alphamutau wrote:

Hello, I have a pipe-delimited text (.txt) file with carriage returns embedded in one of the variables.

How can I get rid of the carriage returns?

 

Ideally the text file should look like this

 

| phone contact | alpha 123; gamma 456; omega 789 | John Doe |

 

Instead I have this.

 

| phone contact | alpha 123;

gamma 456;

omega 789 | John Doe|

 

When I tried reading it into SAS, SAS interpreted the carriage returns as another delimiter just like the pipes.

 

Any suggestions would be appreciated.

Thanks.


If the character is exactly a carriage return you may be able to read the file (if Windows) by using the TERMSTR=CRLF on the infile. That option tells SAS that it takes a carriage return/ line feed pair to be the end of a line and should read right through.

Ksharp
Super User
data have;
input;
length want $ 2000;
retain want;
want=cats(want,_infile_);
if countw(want,'|')=4 then do;output;call missing(want);end;
cards4;
| phone contact | alpha 123;
gamma 456;
omega 789 | John Doe|
;;;;
run;

proc print;run;
alphamutau
Calcite | Level 5
Thank you Ksharp.



Your solution worked and got rid of the embedded carriage returns.



But I ran into another problem.



I modified the code to read in a text file with 10,405 records. About 1% percent of the records have embedded carriage returns. The remaining 99% are fine.



The program inputted all 10,405 records.



But it only outputted 80 records. I was expecting around 10,300.



The SAS log didn't indicate any errors so I'm a little puzzled.



The modified code is below.


filename xlog "c:\test\contactlog.txt";

data have;
infile xlog;
input;
length want $ 2000;
retain want;
want=cats(want,_infile_);
if countw(want,'|')=11 then do;output;call missing(want);end;
;;;;
run;



contactlog.txt has 10 variables but the first variable doesn't start with a pipe.

The layout looks like this



var1 | var2 | var3 | var4 | var5 | var 6 | var7 | var8 | var9 | var10 |



var8 is the problem variable that sometimes has embedded carriage returns.



Thanks




ErikLund_Jensen
Rhodochrosite | Level 12

Hi @alphamutau 

 

One possible reason could be that want is not defined long enough to hold the record. Then the input will be truncated, so the pipe count will be < 11, and the record is not outputted, but concatenated to want.

 

But because there is no room for it in want, it just disappears, and want will remain the same as before. And this will be repeated until end-of-file. So nothing is written to the file after that.

 

Try setting the lenght to 32765 and see what happens.

Ksharp
Super User

Try LRECL= option.And since you only have 10 variables ,countw() should equal 10 .

And better if you will ,share some sample data would be better to test code .

 

 

 

ilename xlog "c:\test\contactlog.txt";

data have;
infile xlog lrecl=1000;
input;
length want $ 2000;
retain want;
want=cats(want,_infile_);
if countw(want,'|')=10 then do;output;call missing(want);end;
;;;;
run;

Ksharp
Super User

Or try

 

filename xlog "c:\test\contactlog.txt" termstr=crlf;
filename xlog "c:\test\contactlog.txt" termstr=lf;
data have;
infile xlog lrecl=1000  length=len  truncover;
input x $varying1000. len;
length want $ 2000;
retain want;
want=cats(want,x);
if countw(want,'|')=10 then do;output;call missing(want);end;
run;

Tom
Super User Tom
Super User

Here is a method that should work as long as a couple of conditions are met.

1) The extra line breaks don't appear after the last delimiter of a line.  Your example seems to imply that your lines have an extra delimiter at the end past the last value so that should never happen.

2) The actual data does not have any delimiters.  Normally in a delimited file values with delimiters are enclosed in double quotes so that the reader can know not to treat it as indicating the end of the field.  This program does have any logic to handle that situation.

 

Basically this program keeps track of how many pipe characters it has output. When that number is less than number per line then it leaves the output line open. Otherwise it terminates the output line and resets the counter.  It replaces the line break with a space.  You could use some other character or even nothing, but it looked like your actual data already had a semi-colon so space is probably fine for your case.

 

First let's make a sample data file.

filename sample temp;
data _null_;
  file sample;
  put 'var1|var2|var3|var4|var5|var6|var7|var8|var9|var10';
  put '1|2|3|4|5|6|7|8'/'has'/'multiple'/'lines|9|10';
  put 'a|b|c|d|e|f|g|h|i|j';
run;

Now  let's run the program to convert that into a file with one line per observation. 

filename want temp;
data _null_;
  infile sample;
  file want;
  input ;
  put _infile_ @;
  npipe+countc(_infile_,'|');
  if npipe < 9 then put ' ' @;
  else do; 
    put;
    npipe=0;
  end;
run;

Results:

var1|var2|var3|var4|var5|var6|var7|var8|var9|var10
1|2|3|4|5|6|7|8 has multiple lines|9|10
a|b|c|d|e|f|g|h|i|j

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2159 views
  • 1 like
  • 5 in conversation