Need help to solve txt file
txt file
Name|sex|car|track|circuit
Michael|M|ferr|se:2:laps|mugello
Loren|F|Yama|se:3:laps|Brno
Rosi|M|Honda
|se:15:Laps|Assen
Marc|M|Isuzu
|se:20:Laps|Sentul
John Due|M|Mclaren|se:4:Laps|Mandalika
script
data WORK.test1 ;
infile 'H:\DATA\Auto\test1.txt' delimiter = '|' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat Name $8. ;
informat sex $7. ;
informat car $7. ;
informat track $6. ;
informat circuit $9. ;
format Name $8. ;
format sex $7. ;
format car $7. ;
format track $6. ;
format circuit $9. ;
input
Name $
sex $
car $
track $
circuit $
;
run;
how to fix line break from text file
Is your problem that you have end of line characters embedded into the value of one or more of the fields?
The first thing to check is whether the inserted characters are the exact same as the characters that are actually being used to mark the ends of the lines. If you are lucky the lines are ending with CR+LF (like a text file on Windows) and the inserted line breaks are using only CR or only LF. As a quick test read the file using different values of TERMSTR= infile option and see if the number of lines changes.
data _null_;
infile 'H:\DATA\Auto\test1.txt' termstr=crlf;
input;
run;
data _null_;
infile 'H:\DATA\Auto\test1.txt' termstr=lf;
input;
run;
So it using TERMSTR=CRLF sees a few less lines than using TERMSTR=LF then the lines are ending with CRLF and the extra breaks are just bare LF characters. So then use TERMSTR=CRLF on the INFILE statement and your code should work (although you should use TRUNCOVER instead of MISSOVER).
infile 'H:\DATA\Auto\test1.txt' dsd dlm='|' truncover termstr=crlf firstobs=2 ;
If that doesn't help then check if the offending values have been quoted. Something like:
Name|sex|car|track|circuit Michael|M|ferr|se:2:laps|mugello Loren|F|Yama|se:3:laps|Brno Rosi|M|"Honda accord"|se:15:Laps|Assen
In which case you can just check to for matching quotes to determine if the the line break needs to be removed.
(See this macro %replace_crlf() for logic to copy the file to corrected file and let you replace the embedded LF or CR characters with something else)
Otherwise you will need to count the number of delimiters seen. Which might have trouble if the extra line breaks are in the last field on the line.
data _null_;
infile 'H:\DATA\Auto\test1.txt';
file 'H:\DATA\Auto\test1_fixed.txt';
input ;
put _infile_ @;
pipes+countc(_infile_,'|');
if pipes >= 4 then do;
put ;
pipes=0;
end;
else put ' ' @;
run;
Try to remove the MISSOVER and DSD options; the INPUT should then skip to the next line.
If that does not fix it, some trickery involving _INFILE_ will be needed.
@Kurt_Bremser thanks, your solution Accepted. but in my real case can't solve it.
because there is missing value and some numbers and date. my column was shifting
how to solve it with _infile_ is there any reference on it. Thanks a lot
Try to provide sample data that are representative for your real data - ideally sample data that cover all the possible "patterns".
nah, wait. I need to mask confidential data. Maybe I will post another topic. Thanks.
Is your problem that you have end of line characters embedded into the value of one or more of the fields?
The first thing to check is whether the inserted characters are the exact same as the characters that are actually being used to mark the ends of the lines. If you are lucky the lines are ending with CR+LF (like a text file on Windows) and the inserted line breaks are using only CR or only LF. As a quick test read the file using different values of TERMSTR= infile option and see if the number of lines changes.
data _null_;
infile 'H:\DATA\Auto\test1.txt' termstr=crlf;
input;
run;
data _null_;
infile 'H:\DATA\Auto\test1.txt' termstr=lf;
input;
run;
So it using TERMSTR=CRLF sees a few less lines than using TERMSTR=LF then the lines are ending with CRLF and the extra breaks are just bare LF characters. So then use TERMSTR=CRLF on the INFILE statement and your code should work (although you should use TRUNCOVER instead of MISSOVER).
infile 'H:\DATA\Auto\test1.txt' dsd dlm='|' truncover termstr=crlf firstobs=2 ;
If that doesn't help then check if the offending values have been quoted. Something like:
Name|sex|car|track|circuit Michael|M|ferr|se:2:laps|mugello Loren|F|Yama|se:3:laps|Brno Rosi|M|"Honda accord"|se:15:Laps|Assen
In which case you can just check to for matching quotes to determine if the the line break needs to be removed.
(See this macro %replace_crlf() for logic to copy the file to corrected file and let you replace the embedded LF or CR characters with something else)
Otherwise you will need to count the number of delimiters seen. Which might have trouble if the extra line breaks are in the last field on the line.
data _null_;
infile 'H:\DATA\Auto\test1.txt';
file 'H:\DATA\Auto\test1_fixed.txt';
input ;
put _infile_ @;
pipes+countc(_infile_,'|');
if pipes >= 4 then do;
put ;
pipes=0;
end;
else put ' ' @;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.