I have a comma-delimited data file that includes names with apostrophes (i.e., single quotes). This causes multiple fields to be read incorrectly into a single field. For example, there have been rare instances in which the first name begins with an apostrophe and the last name ends with an apostrophe causing the first, middle, and last names to read in all together as the first name (e.g., 'LARRY,JOE,BIRD'). Is there an option that disallows single quotes to be used to enclose text with embedded commas but still allows double quotes to enclose text with embedded commas? Single quotes are allowed at data entry but double quotes are not.
Thanks!
data TEST_QUOTE_IMPORT;
infile "C:\TEMP\TEST_QUOTE.csv" delimiter =',' MISSOVER DSD lrecl=32767 firstobs=1 ;
informat ID $13. ;
informat ID_FLAG $1. ;
informat ID_ALT $13. ;
informat CHILD_BIRTH_STATE_4 $50. ;
informat SEQUENCE $6. ;
informat NAME_FIRST $60. ;
informat NAME_MIDDLE_7 $60. ;
informat NAME_LAST_8 $60. ;
input
ID $
ID_FLAG $
ID_ALT $
SEQUENCE $
NAME_FIRST_6 $
NAME_MIDDLE_7 $
NAME_LAST_8 $
;
run;
Try modifying the input line before reading it, this way:
data TEST_QUOTE_IMPORT;
infile "C:\TEMP\TEST_QUOTE.csv" delimiter =',' MISSOVER DSD
lrecl=32767 firstobs=n obs=m ;
informat ID $13. ;
informat ID_FLAG $1. ;
informat ID_ALT $13. ;
informat CHILD_BIRTH_STATE_4 $50. ;
informat SEQUENCE $6. ;
informat NAME_FIRST $60. ;
informat NAME_MIDDLE_7 $60. ;
informat NAME_LAST_8 $60. ;
input @;
_infile_ = prxchange("s/,('.+?),/,""\1"",/o", -1, _infile_);
_infile_ = prxchange("s/,(.+?'),/,""\1"",/o", -1, _infile_);
input
ID $
ID_FLAG $
ID_ALT $
SEQUENCE $
NAME_FIRST_6 $
NAME_MIDDLE_7 $
NAME_LAST_8 $
;
run;
where n is a couple of lines before the problem occurs and m is a few lines after, for testing.
Thanks Reeza. For this file, text with embedded commas are enclosed in double quotes, so single quotes can be ignored for this purpose.
I tried updating the DLM to "','" and removed the DSD option, and it solved the problem I described, but it doesn't import fields with embedded commas correctly anymore.
I'm trying to write import code for a community of users so I'm trying to keep things simple. Using the SCAN function or removing the apostrophes before import are options, but I'm hopeful another solution exists. I tried to keep things simple in the example, but the data file includes hundreds of fields.
If you omit the DSD option, single and double quotes will have no special meaning. That may solve the problem you described but may also create other problems, as @Reeza said.
Try modifying the input line before reading it, this way:
data TEST_QUOTE_IMPORT;
infile "C:\TEMP\TEST_QUOTE.csv" delimiter =',' MISSOVER DSD
lrecl=32767 firstobs=n obs=m ;
informat ID $13. ;
informat ID_FLAG $1. ;
informat ID_ALT $13. ;
informat CHILD_BIRTH_STATE_4 $50. ;
informat SEQUENCE $6. ;
informat NAME_FIRST $60. ;
informat NAME_MIDDLE_7 $60. ;
informat NAME_LAST_8 $60. ;
input @;
_infile_ = prxchange("s/,('.+?),/,""\1"",/o", -1, _infile_);
_infile_ = prxchange("s/,(.+?'),/,""\1"",/o", -1, _infile_);
input
ID $
ID_FLAG $
ID_ALT $
SEQUENCE $
NAME_FIRST_6 $
NAME_MIDDLE_7 $
NAME_LAST_8 $
;
run;
where n is a couple of lines before the problem occurs and m is a few lines after, for testing.
Thanks PGStats! That fixed the issue. Much appreciated.
These kinds of instances are rare, but I find it interesting that Excel opens the file correctly by default, but SAS doesn't, and doesn't include an option that would essentially mimic the Excel process.
You are welcome. Look at it this way: SAS is much better at reading Excel data files than Excel is at reading SAS data files.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.