BookmarkSubscribeRSS Feed
christinagting0
Quartz | Level 8

I have noticed that for variables that have free form text in them, commas tend to screw up the entire dataset!

 

I'm curious to know how other SAS users handle commas! What is best practice for this if your dataset has numerical variables and then some free form text.

 

I normally use a CSV to import my data and b/c it's comma delimited when there are commas this tends to screw a lot of things up!

 

Any advice?

10 REPLIES 10
ballardw
Super User

You wil need to provide more information. What type of file is showing these commas? Was it data entered into a data base, spreadsheet, plain text document or something else.  Can you post some example lines (an attachment text file or post into a code box, the main forum windows convert things and lose stuff and introduce issues.

What is the problem that the commas are causing? Can you not read the data into SAS? It messes up reports? Something else?

 

And wait till you get data that people enter carriage returns and/or line feeds or intermingled DBCS text.

Reeza
Super User

Survey data comments. And they can also include quotes. Ugh! 

Or carriage returns, or a host of other issues. 

 

I will actually remove from my dataset and import them in separately from a single file. 

CSV doesn't usually work, so I use Excel in those cases. 

PGStats
Opal | Level 21

If you can export your data in TAB separated values, you will save yourself a lot of trouble. You would then specify the delimiter as DLM='09'x in your INFILE statement.

 

Normally CSV fields including commas or spaces are surrounded with quotes, with internal quotes represented as doubled quotes ("My name is ""PG"" "). The DSD (for Data Sensitive Delimiter) option in the INFILE statement handles those properly.

 

Give us a bad example to work on...

PG
Patrick
Opal | Level 21

When creating a text file then I normally tend to use a character as delimiter which doesn't exist in the data, eg. a pipe character.

 

If the data can contain "anything" then I'd use a string of characters as delimiter, eg. something like @|@

 

You can read such data with SAS using dlmstr='@|@'

Tom
Super User Tom
Super User

A properly created CSV file has no problem with values that contain commas. That is because those values will have quotes around them.

NAME,GENDER,AGE
"Smith, Joe",M,15
"Jones, Sally",F,21

SAS will read that very easily:

data want ;
   infile 'myfile.csv' dsd truncover firstobs=2;
   length name $20 gender $1 age 8 ;
   input name gender age;
run;

What software are you using to create the CSV file that does not know that it must add the quotes?

 

You can use another character as your delimiter (although it would no longer be a Comma Separated Values file).  

Try to pick something that is not likely to appear in your data.  A common one to use is the pipe (|) character. 

 

data want ;
   infile cards dsd dlm='|' truncover firstobs=2;
   length name $20 gender $1 age 8 ;
   input name gender age;
cards;
NAME|GENDER|AGE
Smith, Joe|M|15
Jones, Sally|F|21
;

 

Reeza
Super User

This is common in survey applications with comments fields, such as Survey Monkey, Opinio, or Qualtrics. 

 

And it's not just comma's, in my experience, people can include quotes as well as line breaks somehow. 

Comment fields are killers!

ballardw
Super User

@Reeza wrote:

This is common in survey applications with comments fields, such as Survey Monkey, Opinio, or Qualtrics. 

 

And it's not just comma's, in my experience, people can include quotes as well as line breaks somehow. 

Comment fields are killers!


Agreed!

 

I suspect that the OP will confirm that the issue is commas and quotes within the same field.

 

Depending upon the number of records involved and availability to the source program, I've been known to go back to the original program and edit the data there as the time involved for trying to code around pathalogic data can be much longer than identify the problem records and edit in the original.

Reeza
Super User

Honestly, most of the time, the comments are not analyzed via SAS so excluding them is usually valid.

 

I would hope, and assume, that SAS Text Analytics had better ways of working around this issue.

christinagting0
Quartz | Level 8

@ballardw you would be correct in assuming that my data contains commas and " in the same fields. It's annoying to work with especially as a beginner SAS user.

 

I like the idea of separating out the comment variables, but I'm worries that this will get very messy with a larger dataset.

Patrick
Opal | Level 21

@christinagting0

Do you want/need this free text data for futher analysis or could you just skip it?

 

There are always ways to read such free text fields. What works will depend how your source data really looks like. Post an example if you're after actual code.

 

The easiest case is, if your free text field is at the very end of a record as then the code could be as simple as below.

data test;
  infile datalines4 dlm=',' truncover;
  input (varA varB varC) ($) FreeText $200.;
  datalines4;
aa,bb,cc,free text, can be anything, aa's,|[]OOP!@#$%^&*()_
;;;;
run;

Above works because once VarC is mapped, the input pointer is at the position+1 of the comma after the data for VarC.

 

The format $200. then instructs SAS to just read the next 200 characters - as a Format and not an Informat gets used, the commas are no more treated as field delimiters. 

TRUNCOVER in the infile statement ensures that SAS also maps data against FreeText if the remaining input string is shorter than 200 characters.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 4780 views
  • 6 likes
  • 6 in conversation