BookmarkSubscribeRSS Feed
Sathya3
Obsidian | Level 7

I have to read a csv file and produce sas dataset

When I am trying to read it ,single quote is behaving similar to double quote and all the data in other fields  after single quote is read as one value. How can I fix it?

Please dont ask me why data in a field  has only single quote. It is some encrypted field. I cannot alter source data in any way

 

Sample code and data:

data test;

infile inf delimiter = '|' missover dsd lrecl=32767 firstobs=2;

followed by informat format and input statements

;

 

Input csv file has data like this |'|2023-12-12|30|40||||||DENG|

Now output dataset is taking all the above values in single field "crys_id_local" instead of multiple fields. sample output

crys_id_local

============

|2023-12-12|30|40||||||DENG|

 

 

 

10 REPLIES 10
rudfaden
Lapis Lazuli | Level 10

Think you need to specify som input fields.

data my_data;
    infile inf delimiter='|' missover dsd;
    input field1 : $10.
          date   : yymmdd10.
          value1 : 8.
          value2 : 8.
          field5 : $10.
          field6 : $10.
          field7 : $10.
          field8 : $10.
          field9 : $10.
          field10: $10.
          field11: $10.;
    format date yymmdd10.;
run;
Sathya3
Obsidian | Level 7

filed names are provided with each informat format and format statements

Kurt_Bremser
Super User

I see no issue here:

data want;
infile datalines dsd dlm="|" truncover;
input
  f1 :$10.
  f2 :$10.
  f3 :$10.
;
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|30
;

proc print data=want noobs;
run;

Result:

f1	f2	f3
a	b	c
 	 	 
 	d	e
'	f	g
h	'	i
 	'	2023-12-12

As you can see, the single quote is read as data and has no special meaning.

 

If you get different results, post examples of the csv file which causes this behavior, and the complete code you ran against it.

Sathya3
Obsidian | Level 7

Please add another value with single quote and try ..It adds as ending single quote and everything in between will be considerd as one column value.Please check last record in below example.I have modified  it 

 

data want;
infile datalines dsd dlm="|" truncover;
input
  f1 :$10.
  f2 :$10.
  f3 :$10.
;
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;

proc print data=want noobs;
run;

 

 

Tom
Super User Tom
Super User

That is know behavior of DSD processing in SAS.

Not for the example you showed, since it only has one single quote on the line.  But if you had TWO single quotes. One at the START of a value and another later at the END of a value then the combination will be treated as one long quoted value. 

So let's add a second single quote to your example

|'|2023-12-12|30|40||||'||DENG|

Now you have 5 fields instead of 12 fields.

var1=''
var2='|2023-12-12|30|40||||'
var3=''
var4='DENG'
var5=''

You will need to either parse the string yourself.

Or run some preprocessing to convert the single quotes into quoted single quotes.  So if the string looked like this

|"'"|2023-12-12|30|40||||"'"||DENG|

Then SAS will read it as 12 fields with the second and ninth as being strings consisting of just a single quote.

So change your code to instead be:

data test;
   infile inf dsd dlm= '|' truncover lrecl=32767 firstobs=2;
   input @;
   _infile_ = tranwrd(_infile_,"|'|",'|"''"|');
...
run;

Note:  Most variables will not need to have either formats or informats attached to them.  So in general it is better to use a LENGTH statement to define the variables and only add FORMAT and/or INFORMAT statements for the variables (like DATE values) that will need them.

 

 

Sathya3
Obsidian | Level 7

Can you please help me place the _infile_ line. It is not working as expected in below code.

I have tried with csv file as well instead of datalines.It did not work

 

data want;
infile datalines dsd dlm="|" truncover;
input
f1 :$10.
f2 :$10.
f3 :$10.
;
_infile_ = tranwrd(_infile_,"|'|",'|"''"|');
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;

proc print data=want noobs;
run;

Kurt_Bremser
Super User

Note that @Tom uses an "empty" INPUT with the "hold" modifier (@) to fill the _INFILE_ automatic variable; the actual INPUT has to come after the conversion:

data want;
infile datalines dsd dlm="|" truncover;
input @;
_infile_ = tranwrd(_infile_,"|'|",'|"''"|');
input
f1 :$10.
f2 :$10.
f3 :$10.
;
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;

Note that, if those encrypted values need to stay as they were, you might have to deal with the consequences of the conversion later in your code.

Tom
Super User Tom
Super User

You need to first read in the next line by using an INPUT statement with a trailing @.

 

Also if your input lines could have the ' in the first or last position on the line then your replacement code needs to be a little more clever.  Either figure out how to use regular expressions, or just prepend and append extra | to the line (which you can later remove).  

data want;
  infile datalines dsd dlm="|" truncover;
  input @ ;
  _infile_ = substr(tranwrd(cats('|',_infile_,'|'),"|'|",'|"''"|'),2);
  length f1-f4 $10;
  input f1-f4 ;
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;

Result

Obs    f1    f2    f3            f4

 1     a     b     c
 2
 3           d     e
 4     '     f     g
 5     h     '     i
 6           '     2023-12-12    '

You might also want to test what happens if you have multiple of those values in a row.  This logic should make sure that they all get replaced.

  input @ ;
  _infile_=cats('|',_infile_,'|');
  do while (index(_infile_,"|'|"));
    _infile_ = tranwrd(_infile_,"|'|",'|"''"|');
  end;
  _infile_=substr(_infile_,2);

 

Note that there is a limit of 32767 bytes for a character variable in SAS, including the _INFILE_ automatic variable used in this code.  So if the lines in the file are longer than that then you will need to find a different method to fix the file.

Tom
Super User Tom
Super User

You need to first read in the next line by using an INPUT statement with a trailing @.

 

Also if your input lines could have the ' in the first or last position on the line then your replacement code needs to be a little more clever.  Either figure out how to use regular expressions, or just prepend and append extra | to the line (which you can later remove).  

data want;
  infile datalines dsd dlm="|" truncover;
  input @ ;
  _infile_ = substr(tranwrd(cats('|',_infile_,'|'),"|'|",'|"''"|'),2);
  length f1-f4 $10;
  input f1-f4 ;
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;

Result

Obs    f1    f2    f3            f4

 1     a     b     c
 2
 3           d     e
 4     '     f     g
 5     h     '     i
 6           '     2023-12-12    '

You might also want to test what happens if you have multiple of those values in a row.

  input @ ;
  _infile_=cats('|',_infile_,'|');
  do while (index(_infile_,"|'|"));
    _infile_ = tranwrd(_infile_,"|'|",'|"''"|');
  end;
  _infile_=substr(_infile_,2);

 

Ksharp
Super User

As Tom said ,it was supposed to be when using option DSD.

You need get rid of option DSD to avoid this behavior happen.

 

data want;
  infile datalines ;
  input;
length f1-f4 $ 40;
f1=scan(_infile_,1,'|','m');
f2=scan(_infile_,2,'|','m');
f3=scan(_infile_,3,'|','m');
f4=scan(_infile_,4,'|','m');
datalines;
a|b|c
|||
|d|e
'|f|g
h|'|i
|'|2023-12-12|'
;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 667 views
  • 4 likes
  • 5 in conversation