BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ryanb2
Quartz | Level 8

I have a comma-delimited data file that includes names with apostrophes (i.e., single quotes).  This causes multiple fields to be read incorrectly into a single field.  For example, there have been rare instances in which the first name begins with an apostrophe and the last name ends with an apostrophe causing the first, middle, and last names to read in all together as the first name (e.g., 'LARRY,JOE,BIRD').  Is there an option that disallows single quotes to be used to enclose text with embedded commas but still allows double quotes to enclose text with embedded commas?  Single quotes are allowed at data entry but double quotes are not.

 

Thanks!

 

data TEST_QUOTE_IMPORT;
	infile "C:\TEMP\TEST_QUOTE.csv" delimiter =',' MISSOVER DSD lrecl=32767 firstobs=1 ;
	informat ID $13. ;
	informat ID_FLAG $1. ;
	informat ID_ALT $13. ;
	informat CHILD_BIRTH_STATE_4 $50. ;
	informat SEQUENCE $6. ;
	informat NAME_FIRST $60. ;
	informat NAME_MIDDLE_7 $60. ;
	informat NAME_LAST_8 $60. ;
		input
		ID $
		ID_FLAG $
		ID_ALT $
		SEQUENCE $
		NAME_FIRST_6 $
		NAME_MIDDLE_7 $
		NAME_LAST_8 $
		;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Try modifying the input line before reading it, this way:

 

data TEST_QUOTE_IMPORT;
	infile "C:\TEMP\TEST_QUOTE.csv" delimiter =',' MISSOVER DSD 
        lrecl=32767 firstobs=n obs=m ;
	informat ID $13. ;
	informat ID_FLAG $1. ;
	informat ID_ALT $13. ;
	informat CHILD_BIRTH_STATE_4 $50. ;
	informat SEQUENCE $6. ;
	informat NAME_FIRST $60. ;
	informat NAME_MIDDLE_7 $60. ;
	informat NAME_LAST_8 $60. ;
        input @;
        _infile_ = prxchange("s/,('.+?),/,""\1"",/o", -1, _infile_);
        _infile_ = prxchange("s/,(.+?'),/,""\1"",/o", -1, _infile_);

		input
		ID $
		ID_FLAG $
		ID_ALT $
		SEQUENCE $
		NAME_FIRST_6 $
		NAME_MIDDLE_7 $
		NAME_LAST_8 $
		;
run;

where n is a couple of lines before the problem occurs and m is a few lines after, for testing.

 

PG

View solution in original post

7 REPLIES 7
Reeza
Super User
Are you sure you want that? Usually the quotes are used to mask an embedded comma that SHOULD NOT be read separately? If this consistent, then I would consider reading it in and then parsing it separately with the SCAN() function. Or you can pass multiple characters to the DLM option.

DLM = "',"

Double quotes, with a single quote and comma inside it.
Ryanb2
Quartz | Level 8

Thanks Reeza.  For this file, text with embedded commas are enclosed in double quotes, so single quotes can be ignored for this purpose.  

 

I tried updating the DLM to "','" and removed the DSD option, and it solved the problem I described, but it doesn't import fields with embedded commas correctly anymore.

 

I'm trying to write import code for a community of users so I'm trying to keep things simple.  Using the SCAN function or removing the apostrophes before import are options, but I'm hopeful another solution exists.  I tried to keep things simple in the example, but the data file includes hundreds of fields.

Reeza
Super User
You didn't include any data. It may change things if you post a better sample or provide something that can be used to test.
PGStats
Opal | Level 21

If you omit the DSD option, single and double quotes will have no special meaning. That may solve the problem you described but may also create other problems, as @Reeza said.

PG
PGStats
Opal | Level 21

Try modifying the input line before reading it, this way:

 

data TEST_QUOTE_IMPORT;
	infile "C:\TEMP\TEST_QUOTE.csv" delimiter =',' MISSOVER DSD 
        lrecl=32767 firstobs=n obs=m ;
	informat ID $13. ;
	informat ID_FLAG $1. ;
	informat ID_ALT $13. ;
	informat CHILD_BIRTH_STATE_4 $50. ;
	informat SEQUENCE $6. ;
	informat NAME_FIRST $60. ;
	informat NAME_MIDDLE_7 $60. ;
	informat NAME_LAST_8 $60. ;
        input @;
        _infile_ = prxchange("s/,('.+?),/,""\1"",/o", -1, _infile_);
        _infile_ = prxchange("s/,(.+?'),/,""\1"",/o", -1, _infile_);

		input
		ID $
		ID_FLAG $
		ID_ALT $
		SEQUENCE $
		NAME_FIRST_6 $
		NAME_MIDDLE_7 $
		NAME_LAST_8 $
		;
run;

where n is a couple of lines before the problem occurs and m is a few lines after, for testing.

 

PG
Ryanb2
Quartz | Level 8

Thanks PGStats!  That fixed the issue.  Much appreciated.

 

These kinds of instances are rare, but I find it interesting that Excel opens the file correctly by default, but SAS doesn't, and doesn't include an option that would essentially mimic the Excel process.

PGStats
Opal | Level 21

You are welcome. Look at it this way: SAS is much better at reading Excel data files than Excel is at reading SAS data files. Smiley Happy

PG

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 5620 views
  • 2 likes
  • 3 in conversation