Hello, I have a pipe-delimited text (.txt) file with carriage returns embedded in one of the variables.
How can I get rid of the carriage returns?
Ideally the text file should look like this
| phone contact | alpha 123; gamma 456; omega 789 | John Doe |
Instead I have this.
| phone contact | alpha 123;
gamma 456;
omega 789 | John Doe|
When I tried reading it into SAS, SAS interpreted the carriage returns as another delimiter just like the pipes.
Any suggestions would be appreciated.
Thanks.
Hi @alphamutau
A good solution is to preprocess the file in a separate step. Here is a piece of code that reads the file, brings the splitted lines in order and writes the result to another file, that can be used as input.
The function depends on the first line in the file having all fields. This will often be the case, because it contains column headers.
Infile:
| aaa | bbb | ccc | | phone contact | alpha 123; gamma 456; omega 789 | John Doe | | phone contact | alpha 123; gamma 456; omega 789 | John Doe| | phone contact | alpha 123; gamma 456; omega 789 | John Doe |
Code:
filename in "c:\temp\have.txt";
filename out "c:\temp\want.txt";
data _null_ ;
	infile in truncover obs=max firstobs=1 lrecl=32765 end=eof;
	file out lrecl=32765 ;
	length OutLine $32765 ;
	retain CoLCount OutLine ;
	input;
	if _N_ = 1 then do;
		CoLCount = count(_infile_,'|') ;
		put _infile_;
	end;
	else do ;
		if substrn(_infile_,1,1) = '|' and count(OutLine,'|') >= CoLCount then do ;
		if OutLine ^= '' then put OutLine ; 
		OutLine = trim(_infile_) ;
	end; 
	else OutLine = catx(' ',OutLine,trim(_infile_)) ;
	if eof then put OutLine ;
	end;  
run;
Outfile
| aaa | bbb | ccc | | phone contact | alpha 123; gamma 456; omega 789 | John Doe | | phone contact | alpha 123; gamma 456; omega 789 | John Doe| | phone contact | alpha 123; gamma 456; omega 789 | John Doe |
Hi @alphamutau
A good solution is to preprocess the file in a separate step. Here is a piece of code that reads the file, brings the splitted lines in order and writes the result to another file, that can be used as input.
The function depends on the first line in the file having all fields. This will often be the case, because it contains column headers.
Infile:
| aaa | bbb | ccc | | phone contact | alpha 123; gamma 456; omega 789 | John Doe | | phone contact | alpha 123; gamma 456; omega 789 | John Doe| | phone contact | alpha 123; gamma 456; omega 789 | John Doe |
Code:
filename in "c:\temp\have.txt";
filename out "c:\temp\want.txt";
data _null_ ;
	infile in truncover obs=max firstobs=1 lrecl=32765 end=eof;
	file out lrecl=32765 ;
	length OutLine $32765 ;
	retain CoLCount OutLine ;
	input;
	if _N_ = 1 then do;
		CoLCount = count(_infile_,'|') ;
		put _infile_;
	end;
	else do ;
		if substrn(_infile_,1,1) = '|' and count(OutLine,'|') >= CoLCount then do ;
		if OutLine ^= '' then put OutLine ; 
		OutLine = trim(_infile_) ;
	end; 
	else OutLine = catx(' ',OutLine,trim(_infile_)) ;
	if eof then put OutLine ;
	end;  
run;
Outfile
| aaa | bbb | ccc | | phone contact | alpha 123; gamma 456; omega 789 | John Doe | | phone contact | alpha 123; gamma 456; omega 789 | John Doe| | phone contact | alpha 123; gamma 456; omega 789 | John Doe |
@alphamutau wrote:
Hello, I have a pipe-delimited text (.txt) file with carriage returns embedded in one of the variables.
How can I get rid of the carriage returns?
Ideally the text file should look like this
| phone contact | alpha 123; gamma 456; omega 789 | John Doe |
Instead I have this.
| phone contact | alpha 123;
gamma 456;
omega 789 | John Doe|
When I tried reading it into SAS, SAS interpreted the carriage returns as another delimiter just like the pipes.
Any suggestions would be appreciated.
Thanks.
If the character is exactly a carriage return you may be able to read the file (if Windows) by using the TERMSTR=CRLF on the infile. That option tells SAS that it takes a carriage return/ line feed pair to be the end of a line and should read right through.
data have;
input;
length want $ 2000;
retain want;
want=cats(want,_infile_);
if countw(want,'|')=4 then do;output;call missing(want);end;
cards4;
| phone contact | alpha 123;
gamma 456;
omega 789 | John Doe|
;;;;
run;
proc print;run;Hi @alphamutau
One possible reason could be that want is not defined long enough to hold the record. Then the input will be truncated, so the pipe count will be < 11, and the record is not outputted, but concatenated to want.
But because there is no room for it in want, it just disappears, and want will remain the same as before. And this will be repeated until end-of-file. So nothing is written to the file after that.
Try setting the lenght to 32765 and see what happens.
Try LRECL= option.And since you only have 10 variables ,countw() should equal 10 .
And better if you will ,share some sample data would be better to test code .
ilename xlog "c:\test\contactlog.txt";
data have;
infile xlog lrecl=1000;
input;
length want $ 2000;
retain want;
want=cats(want,_infile_);
if countw(want,'|')=10 then do;output;call missing(want);end;
;;;;
run;
Or try
filename xlog "c:\test\contactlog.txt" termstr=crlf;
filename xlog "c:\test\contactlog.txt" termstr=lf;
data have;
infile xlog lrecl=1000  length=len  truncover;
input x $varying1000. len;
length want $ 2000;
retain want;
want=cats(want,x);
if countw(want,'|')=10 then do;output;call missing(want);end;
run;
Here is a method that should work as long as a couple of conditions are met.
1) The extra line breaks don't appear after the last delimiter of a line. Your example seems to imply that your lines have an extra delimiter at the end past the last value so that should never happen.
2) The actual data does not have any delimiters. Normally in a delimited file values with delimiters are enclosed in double quotes so that the reader can know not to treat it as indicating the end of the field. This program does have any logic to handle that situation.
Basically this program keeps track of how many pipe characters it has output. When that number is less than number per line then it leaves the output line open. Otherwise it terminates the output line and resets the counter. It replaces the line break with a space. You could use some other character or even nothing, but it looked like your actual data already had a semi-colon so space is probably fine for your case.
First let's make a sample data file.
filename sample temp;
data _null_;
  file sample;
  put 'var1|var2|var3|var4|var5|var6|var7|var8|var9|var10';
  put '1|2|3|4|5|6|7|8'/'has'/'multiple'/'lines|9|10';
  put 'a|b|c|d|e|f|g|h|i|j';
run;Now let's run the program to convert that into a file with one line per observation.
filename want temp;
data _null_;
  infile sample;
  file want;
  input ;
  put _infile_ @;
  npipe+countc(_infile_,'|');
  if npipe < 9 then put ' ' @;
  else do; 
    put;
    npipe=0;
  end;
run;Results:
var1|var2|var3|var4|var5|var6|var7|var8|var9|var10 1|2|3|4|5|6|7|8 has multiple lines|9|10 a|b|c|d|e|f|g|h|i|j
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
