BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
yadiacho1
Fluorite | Level 6

Need help to solve txt file

txt file

Name|sex|car|track|circuit
Michael|M|ferr|se:2:laps|mugello
Loren|F|Yama|se:3:laps|Brno
Rosi|M|Honda
|se:15:Laps|Assen
Marc|M|Isuzu
|se:20:Laps|Sentul
John Due|M|Mclaren|se:4:Laps|Mandalika

 

script

data WORK.test1 ;
infile 'H:\DATA\Auto\test1.txt' delimiter = '|' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat Name $8. ;
informat sex $7. ;
informat car $7. ;
informat track $6. ;
informat circuit $9. ;
format Name $8. ;
format sex $7. ;
format car $7. ;
format track $6. ;
format circuit $9. ;
input
Name $
sex $
car $
track $
circuit $
;
run;

 

how to fix line break from text file

yadiacho1_0-1671781858395.png

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Is your problem that you have end of line characters embedded into the value of one or more of the fields?

The first thing to check is whether the inserted characters are the exact same as the characters that are actually being used to mark the ends of the lines.  If you are lucky the lines are ending with CR+LF (like a text file on Windows) and the inserted line breaks are using only CR or only LF.   As a quick test read the file using different values of TERMSTR= infile option and see if the number of lines changes.

data _null_;
  infile 'H:\DATA\Auto\test1.txt' termstr=crlf;
  input;
run;
data _null_;
  infile 'H:\DATA\Auto\test1.txt' termstr=lf;
  input;
run;

So it using TERMSTR=CRLF sees a few less lines than using TERMSTR=LF then the lines are ending with CRLF and the extra breaks are just bare LF characters.  So then use TERMSTR=CRLF on the INFILE statement and your code should work (although you should use TRUNCOVER instead of MISSOVER).

infile 'H:\DATA\Auto\test1.txt' dsd dlm='|' truncover termstr=crlf firstobs=2 ;

If that doesn't help then check if the offending values have been quoted.  Something like:

Name|sex|car|track|circuit
Michael|M|ferr|se:2:laps|mugello
Loren|F|Yama|se:3:laps|Brno
Rosi|M|"Honda
accord"|se:15:Laps|Assen

In which case you can just check to for matching quotes to determine if the the line break needs to be removed.

(See this macro %replace_crlf() for logic to copy the file to corrected file and let you replace the embedded LF or CR characters with something else)

 

Otherwise you will need to count the number of delimiters seen.  Which might have trouble if the extra line breaks are in the last field on the line.

data _null_;
  infile 'H:\DATA\Auto\test1.txt';
  file 'H:\DATA\Auto\test1_fixed.txt';
  input ;
  put _infile_ @;
  pipes+countc(_infile_,'|');
  if pipes >= 4 then do;
    put ;
    pipes=0;
  end;
  else put ' ' @;
run;

View solution in original post

6 REPLIES 6
Kurt_Bremser
Super User

Try to remove the MISSOVER and DSD options; the INPUT should then skip to the next line.

If that does not fix it, some trickery involving _INFILE_ will be needed.

 

yadiacho1
Fluorite | Level 6

@Kurt_Bremser thanks, your solution Accepted. but in my real case can't solve it.

because there is missing value and some numbers and date. my column was shifting

 

how to solve it with _infile_ is there any reference on it. Thanks a lot

Patrick
Opal | Level 21

Try to provide sample data that are representative for your real data - ideally sample data that cover all the possible "patterns".

yadiacho1
Fluorite | Level 6

nah, wait. I need to mask confidential data. Maybe I will post another topic. Thanks.

Tom
Super User Tom
Super User

Is your problem that you have end of line characters embedded into the value of one or more of the fields?

The first thing to check is whether the inserted characters are the exact same as the characters that are actually being used to mark the ends of the lines.  If you are lucky the lines are ending with CR+LF (like a text file on Windows) and the inserted line breaks are using only CR or only LF.   As a quick test read the file using different values of TERMSTR= infile option and see if the number of lines changes.

data _null_;
  infile 'H:\DATA\Auto\test1.txt' termstr=crlf;
  input;
run;
data _null_;
  infile 'H:\DATA\Auto\test1.txt' termstr=lf;
  input;
run;

So it using TERMSTR=CRLF sees a few less lines than using TERMSTR=LF then the lines are ending with CRLF and the extra breaks are just bare LF characters.  So then use TERMSTR=CRLF on the INFILE statement and your code should work (although you should use TRUNCOVER instead of MISSOVER).

infile 'H:\DATA\Auto\test1.txt' dsd dlm='|' truncover termstr=crlf firstobs=2 ;

If that doesn't help then check if the offending values have been quoted.  Something like:

Name|sex|car|track|circuit
Michael|M|ferr|se:2:laps|mugello
Loren|F|Yama|se:3:laps|Brno
Rosi|M|"Honda
accord"|se:15:Laps|Assen

In which case you can just check to for matching quotes to determine if the the line break needs to be removed.

(See this macro %replace_crlf() for logic to copy the file to corrected file and let you replace the embedded LF or CR characters with something else)

 

Otherwise you will need to count the number of delimiters seen.  Which might have trouble if the extra line breaks are in the last field on the line.

data _null_;
  infile 'H:\DATA\Auto\test1.txt';
  file 'H:\DATA\Auto\test1_fixed.txt';
  input ;
  put _infile_ @;
  pipes+countc(_infile_,'|');
  if pipes >= 4 then do;
    put ;
    pipes=0;
  end;
  else put ' ' @;
run;
yadiacho1
Fluorite | Level 6
Thanks Guys, Its helpfull. I need to fix text and count the pipes.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 665 views
  • 1 like
  • 4 in conversation