BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Emma8
Quartz | Level 8

Hello, I have a text file that some observations have line breaks (Date of last seen, attached).

When I read into SAS how can I fix this line breaks? Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Emma8 

To check if there are any date strings either not in the right format or the date string doesn't represent a valid date, code as below should do.

Date strings not in the right format could be caused by your source data also being broken up onto more than one line in a different way than what you've shown us so far. You need to investigate if there are such cases and eventually provide sample data which reflect all cases you have in your real data (if there is more).

proc format;
  value $case
    '01'='date string missing or incomplete'
    '02'='date string in the correct pattern but date invalid'
  ;
run;
data dq;
  format case $case.;
  set want;
  if not prxmatch('/\d\d\/\d\d\/\d{4}/oi',date_string) then 
    do;
      case='01';
      output;
    end;
  else
  if missing(date) then
    do;
      case='02';
      output;
    end;
run;
proc print;
run;  

 

View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

Is this the data you dumped from WORD?  Can you just change the margins/font on the word file to prevent line wrapping before converting it to text?

 

You could probably fix this by reading in the file and writing it back out without the extra line breaks.  You could even make so each record is all on one line.

data _null_;
 infile original;
 file fix;
 input;
 if _n_>1 and left(_infile_)=:'Master File:' then put;
 len=lengthn(_infile_);
 put _infile_ $varying200. len @;
run;

Then reading the fixed file will be easier since all of the values for one line.

 

ballardw
Super User

Have you done anything at all to that file since receipt? Or possibly someone else editing it in some fashion such as replacing tab characters?

 

Typically I would expect the line lengths where the values are cutoff to be at the same position but that is not the case here.

Emma8
Quartz | Level 8

Great! Thank you so much!

Patrick
Opal | Level 21

@Emma8 

I've only realized that I've missed what you're dealing with when I read @Tom's answer. The resolution I've provided in your other post is therefore not correct (and I've removed it from there).

Here the code again with an amendment that will read such date strings broken over two lines. It will only work for the exact pattern of a broken up date string as provided in your sample data. I've also introduced an additional variable Date_String which contains the source string before we try to convert it into a SAS date.  This to keep invalid date strings in your target data - like 09/00/1990 from your sample - so you can later on add some additional logic as you need it to eventually fix such dates.

 

/*filename src <path/filename.txt> lrecl=256;*/

data want(drop=_:);
  length
    Source $11
    Id $8
    Name $80
    Visit $15
    ;
  format Birthday date9. Date date9.;
  infile src scanover truncover col=_col;
  input @;

  if find(_infile_,'Master File:')>0 then
    do;
      source='Master File';
      input @1 @'Master File:' +1 ID $8. @;
    end;
  else 
  if find(_infile_,'DDDI File:')>0 then
    do;
      source='DDDI File';
      input @1 @'DDDI File:' +1 ID $8. @;
    end;
  else delete;

  /* name */
  input @_col @'Name:' +(-1) _dummy $1.@;
  _start=_col;
  input @_col @'VIS:' +(-5) _dummy $1. @;
  _stop=_col;
  _fwidth=_stop - _start;
  input @_start Name $varying80. _fwidth @;
  name=compbl(name);

  /* visit */
  _start=_col+5;
  input @_col @'Birthdate:' +(-11) _dummy $1. @;
  _stop=_col;
  _fwidth=_stop - _start;
  input @_start visit $varying15. _fwidth @;

  /* Birthdate*/
  input @'Birthdate:' Birthday :mmddyy10. @;

  /* Date - also addresses case of date string broken over two source lines */
  length Date_String $10;
  input @':' Date_String:$10.;
  if lengthn(Date_String)=10 then Date=input(Date_String,?? mmddyy10.);
  else
    do;
      input @@;
      if lengthn(_infile_)=2 then
        do;
          Date_String=cats(Date_String,_infile_);
          Date=input(Date_String,?? mmddyy10.);
          input;
        end;
    end;

run;

proc print;
run;

Capture.JPG 

Emma8
Quartz | Level 8
Thank you so much
Patrick
Opal | Level 21

@Emma8 

To check if there are any date strings either not in the right format or the date string doesn't represent a valid date, code as below should do.

Date strings not in the right format could be caused by your source data also being broken up onto more than one line in a different way than what you've shown us so far. You need to investigate if there are such cases and eventually provide sample data which reflect all cases you have in your real data (if there is more).

proc format;
  value $case
    '01'='date string missing or incomplete'
    '02'='date string in the correct pattern but date invalid'
  ;
run;
data dq;
  format case $case.;
  set want;
  if not prxmatch('/\d\d\/\d\d\/\d{4}/oi',date_string) then 
    do;
      case='01';
      output;
    end;
  else
  if missing(date) then
    do;
      case='02';
      output;
    end;
run;
proc print;
run;  

 

Emma8
Quartz | Level 8
Very helpful. Thank you so much!
Have a pleasant weekend and holiday

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1384 views
  • 1 like
  • 4 in conversation