I have to read a csv file and produce sas dataset
When I am trying to read it ,single quote is behaving similar to double quote and all the data in other fields after single quote is read as one value. How can I fix it?
Please dont ask me why data in a field has only single quote. It is some encrypted field. I cannot alter source data in any way
Sample code and data:
data test;
infile inf delimiter = '|' missover dsd lrecl=32767 firstobs=2;
followed by informat format and input statements
;
Input csv file has data like this |'|2023-12-12|30|40||||||DENG|
Now output dataset is taking all the above values in single field "crys_id_local" instead of multiple fields. sample output
crys_id_local
============
|2023-12-12|30|40||||||DENG|
Think you need to specify som input fields.
data my_data;
infile inf delimiter='|' missover dsd;
input field1 : $10.
date : yymmdd10.
value1 : 8.
value2 : 8.
field5 : $10.
field6 : $10.
field7 : $10.
field8 : $10.
field9 : $10.
field10: $10.
field11: $10.;
format date yymmdd10.;
run;
filed names are provided with each informat format and format statements
I see no issue here:
data want;
infile datalines dsd dlm="|" truncover;
input
f1 :$10.
f2 :$10.
f3 :$10.
;
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|30
;
proc print data=want noobs;
run;
Result:
f1 f2 f3 a b c d e ' f g h ' i ' 2023-12-12
As you can see, the single quote is read as data and has no special meaning.
If you get different results, post examples of the csv file which causes this behavior, and the complete code you ran against it.
Please add another value with single quote and try ..It adds as ending single quote and everything in between will be considerd as one column value.Please check last record in below example.I have modified it
data want;
infile datalines dsd dlm="|" truncover;
input
f1 :$10.
f2 :$10.
f3 :$10.
;
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;
proc print data=want noobs;
run;
That is know behavior of DSD processing in SAS.
Not for the example you showed, since it only has one single quote on the line. But if you had TWO single quotes. One at the START of a value and another later at the END of a value then the combination will be treated as one long quoted value.
So let's add a second single quote to your example
|'|2023-12-12|30|40||||'||DENG|
Now you have 5 fields instead of 12 fields.
var1=''
var2='|2023-12-12|30|40||||'
var3=''
var4='DENG'
var5=''
You will need to either parse the string yourself.
Or run some preprocessing to convert the single quotes into quoted single quotes. So if the string looked like this
|"'"|2023-12-12|30|40||||"'"||DENG|
Then SAS will read it as 12 fields with the second and ninth as being strings consisting of just a single quote.
So change your code to instead be:
data test;
infile inf dsd dlm= '|' truncover lrecl=32767 firstobs=2;
input @;
_infile_ = tranwrd(_infile_,"|'|",'|"''"|');
...
run;
Note: Most variables will not need to have either formats or informats attached to them. So in general it is better to use a LENGTH statement to define the variables and only add FORMAT and/or INFORMAT statements for the variables (like DATE values) that will need them.
Can you please help me place the _infile_ line. It is not working as expected in below code.
I have tried with csv file as well instead of datalines.It did not work
data want;
infile datalines dsd dlm="|" truncover;
input
f1 :$10.
f2 :$10.
f3 :$10.
;
_infile_ = tranwrd(_infile_,"|'|",'|"''"|');
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;
proc print data=want noobs;
run;
Note that @Tom uses an "empty" INPUT with the "hold" modifier (@) to fill the _INFILE_ automatic variable; the actual INPUT has to come after the conversion:
data want;
infile datalines dsd dlm="|" truncover;
input @;
_infile_ = tranwrd(_infile_,"|'|",'|"''"|');
input
f1 :$10.
f2 :$10.
f3 :$10.
;
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;
Note that, if those encrypted values need to stay as they were, you might have to deal with the consequences of the conversion later in your code.
You need to first read in the next line by using an INPUT statement with a trailing @.
Also if your input lines could have the ' in the first or last position on the line then your replacement code needs to be a little more clever. Either figure out how to use regular expressions, or just prepend and append extra | to the line (which you can later remove).
data want;
infile datalines dsd dlm="|" truncover;
input @ ;
_infile_ = substr(tranwrd(cats('|',_infile_,'|'),"|'|",'|"''"|'),2);
length f1-f4 $10;
input f1-f4 ;
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;
Result
Obs f1 f2 f3 f4 1 a b c 2 3 d e 4 ' f g 5 h ' i 6 ' 2023-12-12 '
You might also want to test what happens if you have multiple of those values in a row. This logic should make sure that they all get replaced.
input @ ;
_infile_=cats('|',_infile_,'|');
do while (index(_infile_,"|'|"));
_infile_ = tranwrd(_infile_,"|'|",'|"''"|');
end;
_infile_=substr(_infile_,2);
Note that there is a limit of 32767 bytes for a character variable in SAS, including the _INFILE_ automatic variable used in this code. So if the lines in the file are longer than that then you will need to find a different method to fix the file.
You need to first read in the next line by using an INPUT statement with a trailing @.
Also if your input lines could have the ' in the first or last position on the line then your replacement code needs to be a little more clever. Either figure out how to use regular expressions, or just prepend and append extra | to the line (which you can later remove).
data want;
infile datalines dsd dlm="|" truncover;
input @ ;
_infile_ = substr(tranwrd(cats('|',_infile_,'|'),"|'|",'|"''"|'),2);
length f1-f4 $10;
input f1-f4 ;
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;
Result
Obs f1 f2 f3 f4 1 a b c 2 3 d e 4 ' f g 5 h ' i 6 ' 2023-12-12 '
You might also want to test what happens if you have multiple of those values in a row.
input @ ;
_infile_=cats('|',_infile_,'|');
do while (index(_infile_,"|'|"));
_infile_ = tranwrd(_infile_,"|'|",'|"''"|');
end;
_infile_=substr(_infile_,2);
As Tom said ,it was supposed to be when using option DSD.
You need get rid of option DSD to avoid this behavior happen.
data want;
infile datalines ;
input;
length f1-f4 $ 40;
f1=scan(_infile_,1,'|','m');
f2=scan(_infile_,2,'|','m');
f3=scan(_infile_,3,'|','m');
f4=scan(_infile_,4,'|','m');
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.