BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
CHUNantes
Obsidian | Level 7

Hello.

 

We try to load a file into a SAS table so we use INFILE filename DLM='\' DSD

 

FILENAME ficin "path_to_file.txt";

DATA work.users;
   INFILE ficin DLM='\' DSD;
   INFORMAT Name $20. City $20. Birthdate DDMMYY10.;
   FORMAT Name $20. City $20. Birthdate DDMMYY10.;
   INPUT Name $ City $ Birthdate;
RUN;

But sometimes we can receive a file with a Carriage Return in the middle of the line :

Mikael\NANTES\01/01/2000
Guillaume\
NANTES\01/02/2000
Anne\\01/03/2000

And of course, we have to manage missing values so we need to keep the DSD option.

 

How can we load this file ?

 

We have try to remove the CR character using sed on our Linux server before reading the file without succes.

 

Regards

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data _null_;
 infile "c:\temp\donnees.txt" dsd termstr=crlf;
 file   "c:\temp\want.txt";
 input ;
 length temp $ 2000;
 retain temp;
 temp=cats(temp,_infile_); 
 if countw(temp,'\','mq')=7 then do;put temp;call missing(temp);end;
run;

data want;
 infile   "c:\temp\want.txt" DLM='\' dsd termstr=crlf;
 length  N_Code_NoIPP 8
           C_Code_NoIPPExterne $12.
            C_Nom $50.
            C_Prenom $50.
            C_Telephone $20.
            C_Var1 $1.
            C_Var2 $1.
            ;
    input   N_Code_NoIPP
            C_Code_NoIPPExterne $
            C_Nom $
            C_Prenom $
            C_Telephone $
            C_Var1 $
            C_Var2 $
            ;
RUN;

View solution in original post

11 REPLIES 11
Patrick
Opal | Level 21

What is the end of line indicator for your data? LF , CRLF ?

 

In Unix the default SAS uses is LF so a CR in the data shouldn't mess-up mapping of data into SAS variables. The CR would get stored in a variable though which is not something you normally want.

 

1. Figure out the EoL (I'm normally using Notepad++ for that just opening the file while showing all characters).

2. If EoL anything other than LF then use infile option TERMSTR to define it - i.e. INFILE.... TERMSTR=CRLF;

3. To remove CR and other control characters use translate() or compress() or whatever is more suitable. 

data want;
infile ..... termstr=CRLF truncover dsd dlm='\';
input @;
_infile_=compress(_infile_,,'c');
input <your vars>;
run;

 

CHUNantes
Obsidian | Level 7

Thanks for your replie.

 

The file use CRLF as end of line

 

If I use Notepad++ with Hexa editor, here is a valid line ending :

Correct_line.png

 

 

And here is a line with the CRLF in the middle of the line :

Invalide_line.png

 

As the special character is a0, I have try to replace a00d0a with space or another character using sed command but I didn't succeed.

 

With TERMSTR=CRLF, only the first line is read.

 

As all lines are ending with \+CRLF (so 5c 0d 0a), other occurence of 0d 0a should be removed.

Ksharp
Super User

It is complicated question. Better post your real data to demonstrate your Q.

 

data _null_;
 infile "c:\temp\path_to_file.txt" dsd termstr=crlf;
 file   "c:\temp\want.txt";
 input ;
 length temp $ 2000;
 retain temp;
 temp=cats(temp,_infile_); 
 if countw(temp,'\','mq')=3 then do;put temp;call missing(temp);end;
run;

data want;
 infile   "c:\temp\want.txt" DLM='\' dsd termstr=crlf;
 INFORMAT Name $20. City $20. Birthdate DDMMYY10.;
   FORMAT Name $20. City $20. Birthdate DDMMYY10.;
   INPUT Name $ City $ Birthdate;
RUN;
Ksharp
Super User

It is complicated question. Better post your real data to demonstrate your Q.

 

data _null_;
 infile "c:\temp\path_to_file.txt" dsd termstr=crlf;
 file   "c:\temp\want.txt";
 input ;
 length temp $ 2000;
 retain temp;
 temp=cats(temp,_infile_); 
 if countw(temp,'\','mq')=3 then do;put temp;call missing(temp);end;
run;

data want;
 infile   "c:\temp\want.txt" DLM='\' dsd termstr=crlf;
 INFORMAT Name $20. City $20. Birthdate DDMMYY10.;
   FORMAT Name $20. City $20. Birthdate DDMMYY10.;
   INPUT Name $ City $ Birthdate;
RUN;
CHUNantes
Obsidian | Level 7

I will post a small part of the file but I have to anonymise it before 😉

CHUNantes
Obsidian | Level 7

Here is a sample of the file and the SAS programm.

 

filename ficin "C:\temp\donnees.txt" LRECL=1024;

data donnees;
    infile ficin missover dsd dlm='\';
    length  C_Code_NoIPPExterne $12.
            C_Nom $50.
            C_Prenom $50.
            C_Telephone $20.
            C_Var1 $1.
            C_Var2 $1.
            ;
    input   N_Code_NoIPP
            C_Code_NoIPPExterne $
            C_Nom $
            C_Prenom $
            C_Telephone $
            C_Var1 $
            C_Var2 $
            ;
run;
Ksharp
Super User
data _null_;
 infile "c:\temp\donnees.txt" dsd termstr=crlf;
 file   "c:\temp\want.txt";
 input ;
 length temp $ 2000;
 retain temp;
 temp=cats(temp,_infile_); 
 if countw(temp,'\','mq')=7 then do;put temp;call missing(temp);end;
run;

data want;
 infile   "c:\temp\want.txt" DLM='\' dsd termstr=crlf;
 length  N_Code_NoIPP 8
           C_Code_NoIPPExterne $12.
            C_Nom $50.
            C_Prenom $50.
            C_Telephone $20.
            C_Var1 $1.
            C_Var2 $1.
            ;
    input   N_Code_NoIPP
            C_Code_NoIPPExterne $
            C_Nom $
            C_Prenom $
            C_Telephone $
            C_Var1 $
            C_Var2 $
            ;
RUN;
CHUNantes
Obsidian | Level 7

Thank you for your code.

 

My original file have more variables by lines and I have to deal with ' used by users in some fields but now I can load the file 😉

 

Here is the programm I use :

 

data temp;
 infile "/ECHANGE/PRQ/in/archive/CLI_20210501/202105010051DFD_MVTS_PATIENTS.NDW" dsd missover;
 file "/ECHANGE/PRQ/in/archive/CLI_20210501/FD_MVTS_PATIENTS.NDW";
 input ;
 length temp $ 2000;
 retain temp;
 temp = cats(temp,_infile_);
 temp = TRANWRD(temp,"'","¤");
 nb_var = countw(temp,'\','mq');
 if nb_var=42 then do;
  temp = TRANWRD(temp,"¤","'");
  put temp;
  call missing(temp);
 end;
run;
Ksharp
Super User

But I prefer to Perl Regular Expression.

 

data temp;
 infile "c:\temp\donnees.txt" dsd termstr=crlf length=len;
 input x $varying400. len;
 if prxmatch('/^\d{7}\\/',_infile_) then group+1;
run;

data temp1;
 length temp $ 2000;
do until(last.group);
 set temp;
 by group;
  temp=cats(temp,x); 
end;
drop x;
run;


data want;
 set temp1;
N_Code_NoIPP=scan(temp,1,'\','mq');
            C_Code_NoIPPExterne =scan(temp,2,'\','mq');
            C_Nom =scan(temp,3,'\','mq');
            C_Prenom =scan(temp,4,'\','mq');
            C_Telephone =scan(temp,5,'\','mq');
            C_Var1 =scan(temp,6,'\','mq');
            C_Var2 =scan(temp,7,'\','mq');
            ;
RUN;
Tom
Super User Tom
Super User

So looking at that example file we have some answers.

End of line is CR+LF.

Embedded line breaks are also CR+LF.

No values are quoted.

There are no embedded delimiters.

The number of delimiters per (intended) record is fixed at 6.  In addition it looks like there is an extra delimiter after the last value (which actually helps).

 

So you should be able to just COUNT the number of delimiters to determine whether you have need to remove a line break.

 

In this case the expected number of \ characters per line is 6.

So here is a program to convert the existing text file to a new file where the extra line breaks are removed.  And then read the new file to confirm it is working.

filename csv "c:\downloads\donnees.txt";
filename fixed temp;

data _null_;
  infile csv end=eof;
  file fixed ;
  do until(nslash>=6 or eof) ;
    input ;
    nslash=sum(nslash,countc(_infile_,'\'));
    put _infile_ @;
  end;
  put;
run;

data want;
  infile fixed dlm='\' truncover ;
  input (var1-var5) (:$30.);
run;

proc print;
run;

Results:

Obs     var1        var2       var3       var4           var5

 1     1234567    123456789    Name1    Surname1     01 02 03 04 05
 2     1234567    123456789    Name2    Surname2    01.02.03.04.05.
Tom
Super User Tom
Super User

To determine how hard it will be to fix the file so that SAS can parse it you need to answer some questions.

1) Are the fields that contain the end-of-line characters enclosed in quotes?

2) Are there a constant number of fields per line?

3) Does the delimiter character ever appear in the value of a field?

 

To look at the file you can use SAS data step and the LIST statement.

For example to look at the first 10 lines

data _null_;
  infile 'myfile' obs=10;
  input;
  list;
run;

To really see the end of line characters use RECFM=F on the infile statement.

So to look at the first 1,000 bytes of the file you could use:

data _null_;
  infile 'myfile' obs=10 recfm=f lrecl=100;
  input;
  list;
run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 3189 views
  • 4 likes
  • 4 in conversation