- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a comma-delimited data file that includes names with apostrophes (i.e., single quotes). This causes multiple fields to be read incorrectly into a single field. For example, there have been rare instances in which the first name begins with an apostrophe and the last name ends with an apostrophe causing the first, middle, and last names to read in all together as the first name (e.g., 'LARRY,JOE,BIRD'). Is there an option that disallows single quotes to be used to enclose text with embedded commas but still allows double quotes to enclose text with embedded commas? Single quotes are allowed at data entry but double quotes are not.
Thanks!
data TEST_QUOTE_IMPORT;
infile "C:\TEMP\TEST_QUOTE.csv" delimiter =',' MISSOVER DSD lrecl=32767 firstobs=1 ;
informat ID $13. ;
informat ID_FLAG $1. ;
informat ID_ALT $13. ;
informat CHILD_BIRTH_STATE_4 $50. ;
informat SEQUENCE $6. ;
informat NAME_FIRST $60. ;
informat NAME_MIDDLE_7 $60. ;
informat NAME_LAST_8 $60. ;
input
ID $
ID_FLAG $
ID_ALT $
SEQUENCE $
NAME_FIRST_6 $
NAME_MIDDLE_7 $
NAME_LAST_8 $
;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try modifying the input line before reading it, this way:
data TEST_QUOTE_IMPORT;
infile "C:\TEMP\TEST_QUOTE.csv" delimiter =',' MISSOVER DSD
lrecl=32767 firstobs=n obs=m ;
informat ID $13. ;
informat ID_FLAG $1. ;
informat ID_ALT $13. ;
informat CHILD_BIRTH_STATE_4 $50. ;
informat SEQUENCE $6. ;
informat NAME_FIRST $60. ;
informat NAME_MIDDLE_7 $60. ;
informat NAME_LAST_8 $60. ;
input @;
_infile_ = prxchange("s/,('.+?),/,""\1"",/o", -1, _infile_);
_infile_ = prxchange("s/,(.+?'),/,""\1"",/o", -1, _infile_);
input
ID $
ID_FLAG $
ID_ALT $
SEQUENCE $
NAME_FIRST_6 $
NAME_MIDDLE_7 $
NAME_LAST_8 $
;
run;
where n is a couple of lines before the problem occurs and m is a few lines after, for testing.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
DLM = "',"
Double quotes, with a single quote and comma inside it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Reeza. For this file, text with embedded commas are enclosed in double quotes, so single quotes can be ignored for this purpose.
I tried updating the DLM to "','" and removed the DSD option, and it solved the problem I described, but it doesn't import fields with embedded commas correctly anymore.
I'm trying to write import code for a community of users so I'm trying to keep things simple. Using the SCAN function or removing the apostrophes before import are options, but I'm hopeful another solution exists. I tried to keep things simple in the example, but the data file includes hundreds of fields.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you omit the DSD option, single and double quotes will have no special meaning. That may solve the problem you described but may also create other problems, as @Reeza said.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try modifying the input line before reading it, this way:
data TEST_QUOTE_IMPORT;
infile "C:\TEMP\TEST_QUOTE.csv" delimiter =',' MISSOVER DSD
lrecl=32767 firstobs=n obs=m ;
informat ID $13. ;
informat ID_FLAG $1. ;
informat ID_ALT $13. ;
informat CHILD_BIRTH_STATE_4 $50. ;
informat SEQUENCE $6. ;
informat NAME_FIRST $60. ;
informat NAME_MIDDLE_7 $60. ;
informat NAME_LAST_8 $60. ;
input @;
_infile_ = prxchange("s/,('.+?),/,""\1"",/o", -1, _infile_);
_infile_ = prxchange("s/,(.+?'),/,""\1"",/o", -1, _infile_);
input
ID $
ID_FLAG $
ID_ALT $
SEQUENCE $
NAME_FIRST_6 $
NAME_MIDDLE_7 $
NAME_LAST_8 $
;
run;
where n is a couple of lines before the problem occurs and m is a few lines after, for testing.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks PGStats! That fixed the issue. Much appreciated.
These kinds of instances are rare, but I find it interesting that Excel opens the file correctly by default, but SAS doesn't, and doesn't include an option that would essentially mimic the Excel process.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You are welcome. Look at it this way: SAS is much better at reading Excel data files than Excel is at reading SAS data files.