Hello.
We try to load a file into a SAS table so we use INFILE filename DLM='\' DSD
FILENAME ficin "path_to_file.txt";
DATA work.users;
INFILE ficin DLM='\' DSD;
INFORMAT Name $20. City $20. Birthdate DDMMYY10.;
FORMAT Name $20. City $20. Birthdate DDMMYY10.;
INPUT Name $ City $ Birthdate;
RUN;
But sometimes we can receive a file with a Carriage Return in the middle of the line :
Mikael\NANTES\01/01/2000 Guillaume\ NANTES\01/02/2000
Anne\\01/03/2000
And of course, we have to manage missing values so we need to keep the DSD option.
How can we load this file ?
We have try to remove the CR character using sed on our Linux server before reading the file without succes.
Regards
data _null_;
infile "c:\temp\donnees.txt" dsd termstr=crlf;
file "c:\temp\want.txt";
input ;
length temp $ 2000;
retain temp;
temp=cats(temp,_infile_);
if countw(temp,'\','mq')=7 then do;put temp;call missing(temp);end;
run;
data want;
infile "c:\temp\want.txt" DLM='\' dsd termstr=crlf;
length N_Code_NoIPP 8
C_Code_NoIPPExterne $12.
C_Nom $50.
C_Prenom $50.
C_Telephone $20.
C_Var1 $1.
C_Var2 $1.
;
input N_Code_NoIPP
C_Code_NoIPPExterne $
C_Nom $
C_Prenom $
C_Telephone $
C_Var1 $
C_Var2 $
;
RUN;
What is the end of line indicator for your data? LF , CRLF ?
In Unix the default SAS uses is LF so a CR in the data shouldn't mess-up mapping of data into SAS variables. The CR would get stored in a variable though which is not something you normally want.
1. Figure out the EoL (I'm normally using Notepad++ for that just opening the file while showing all characters).
2. If EoL anything other than LF then use infile option TERMSTR to define it - i.e. INFILE.... TERMSTR=CRLF;
3. To remove CR and other control characters use translate() or compress() or whatever is more suitable.
data want;
infile ..... termstr=CRLF truncover dsd dlm='\';
input @;
_infile_=compress(_infile_,,'c');
input <your vars>;
run;
Thanks for your replie.
The file use CRLF as end of line
If I use Notepad++ with Hexa editor, here is a valid line ending :
And here is a line with the CRLF in the middle of the line :
As the special character is a0, I have try to replace a00d0a with space or another character using sed command but I didn't succeed.
With TERMSTR=CRLF, only the first line is read.
As all lines are ending with \+CRLF (so 5c 0d 0a), other occurence of 0d 0a should be removed.
It is complicated question. Better post your real data to demonstrate your Q.
data _null_;
infile "c:\temp\path_to_file.txt" dsd termstr=crlf;
file "c:\temp\want.txt";
input ;
length temp $ 2000;
retain temp;
temp=cats(temp,_infile_);
if countw(temp,'\','mq')=3 then do;put temp;call missing(temp);end;
run;
data want;
infile "c:\temp\want.txt" DLM='\' dsd termstr=crlf;
INFORMAT Name $20. City $20. Birthdate DDMMYY10.;
FORMAT Name $20. City $20. Birthdate DDMMYY10.;
INPUT Name $ City $ Birthdate;
RUN;
It is complicated question. Better post your real data to demonstrate your Q.
data _null_;
infile "c:\temp\path_to_file.txt" dsd termstr=crlf;
file "c:\temp\want.txt";
input ;
length temp $ 2000;
retain temp;
temp=cats(temp,_infile_);
if countw(temp,'\','mq')=3 then do;put temp;call missing(temp);end;
run;
data want;
infile "c:\temp\want.txt" DLM='\' dsd termstr=crlf;
INFORMAT Name $20. City $20. Birthdate DDMMYY10.;
FORMAT Name $20. City $20. Birthdate DDMMYY10.;
INPUT Name $ City $ Birthdate;
RUN;
I will post a small part of the file but I have to anonymise it before 😉
Here is a sample of the file and the SAS programm.
filename ficin "C:\temp\donnees.txt" LRECL=1024;
data donnees;
infile ficin missover dsd dlm='\';
length C_Code_NoIPPExterne $12.
C_Nom $50.
C_Prenom $50.
C_Telephone $20.
C_Var1 $1.
C_Var2 $1.
;
input N_Code_NoIPP
C_Code_NoIPPExterne $
C_Nom $
C_Prenom $
C_Telephone $
C_Var1 $
C_Var2 $
;
run;
data _null_;
infile "c:\temp\donnees.txt" dsd termstr=crlf;
file "c:\temp\want.txt";
input ;
length temp $ 2000;
retain temp;
temp=cats(temp,_infile_);
if countw(temp,'\','mq')=7 then do;put temp;call missing(temp);end;
run;
data want;
infile "c:\temp\want.txt" DLM='\' dsd termstr=crlf;
length N_Code_NoIPP 8
C_Code_NoIPPExterne $12.
C_Nom $50.
C_Prenom $50.
C_Telephone $20.
C_Var1 $1.
C_Var2 $1.
;
input N_Code_NoIPP
C_Code_NoIPPExterne $
C_Nom $
C_Prenom $
C_Telephone $
C_Var1 $
C_Var2 $
;
RUN;
Thank you for your code.
My original file have more variables by lines and I have to deal with ' used by users in some fields but now I can load the file 😉
Here is the programm I use :
data temp;
infile "/ECHANGE/PRQ/in/archive/CLI_20210501/202105010051DFD_MVTS_PATIENTS.NDW" dsd missover;
file "/ECHANGE/PRQ/in/archive/CLI_20210501/FD_MVTS_PATIENTS.NDW";
input ;
length temp $ 2000;
retain temp;
temp = cats(temp,_infile_);
temp = TRANWRD(temp,"'","¤");
nb_var = countw(temp,'\','mq');
if nb_var=42 then do;
temp = TRANWRD(temp,"¤","'");
put temp;
call missing(temp);
end;
run;
But I prefer to Perl Regular Expression.
data temp; infile "c:\temp\donnees.txt" dsd termstr=crlf length=len; input x $varying400. len; if prxmatch('/^\d{7}\\/',_infile_) then group+1; run; data temp1; length temp $ 2000; do until(last.group); set temp; by group; temp=cats(temp,x); end; drop x; run; data want; set temp1; N_Code_NoIPP=scan(temp,1,'\','mq'); C_Code_NoIPPExterne =scan(temp,2,'\','mq'); C_Nom =scan(temp,3,'\','mq'); C_Prenom =scan(temp,4,'\','mq'); C_Telephone =scan(temp,5,'\','mq'); C_Var1 =scan(temp,6,'\','mq'); C_Var2 =scan(temp,7,'\','mq'); ; RUN;
So looking at that example file we have some answers.
End of line is CR+LF.
Embedded line breaks are also CR+LF.
No values are quoted.
There are no embedded delimiters.
The number of delimiters per (intended) record is fixed at 6. In addition it looks like there is an extra delimiter after the last value (which actually helps).
So you should be able to just COUNT the number of delimiters to determine whether you have need to remove a line break.
In this case the expected number of \ characters per line is 6.
So here is a program to convert the existing text file to a new file where the extra line breaks are removed. And then read the new file to confirm it is working.
filename csv "c:\downloads\donnees.txt";
filename fixed temp;
data _null_;
infile csv end=eof;
file fixed ;
do until(nslash>=6 or eof) ;
input ;
nslash=sum(nslash,countc(_infile_,'\'));
put _infile_ @;
end;
put;
run;
data want;
infile fixed dlm='\' truncover ;
input (var1-var5) (:$30.);
run;
proc print;
run;
Results:
Obs var1 var2 var3 var4 var5 1 1234567 123456789 Name1 Surname1 01 02 03 04 05 2 1234567 123456789 Name2 Surname2 01.02.03.04.05.
To determine how hard it will be to fix the file so that SAS can parse it you need to answer some questions.
1) Are the fields that contain the end-of-line characters enclosed in quotes?
2) Are there a constant number of fields per line?
3) Does the delimiter character ever appear in the value of a field?
To look at the file you can use SAS data step and the LIST statement.
For example to look at the first 10 lines
data _null_;
infile 'myfile' obs=10;
input;
list;
run;
To really see the end of line characters use RECFM=F on the infile statement.
So to look at the first 1,000 bytes of the file you could use:
data _null_;
infile 'myfile' obs=10 recfm=f lrecl=100;
input;
list;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.