How does everyone deal with commas in SAS?

Reply
Frequent Contributor
Posts: 88

How does everyone deal with commas in SAS?

I have noticed that for variables that have free form text in them, commas tend to screw up the entire dataset!

 

I'm curious to know how other SAS users handle commas! What is best practice for this if your dataset has numerical variables and then some free form text.

 

I normally use a CSV to import my data and b/c it's comma delimited when there are commas this tends to screw a lot of things up!

 

Any advice?

Super User
Posts: 11,134

Re: How does everyone deal with commas in SAS?

You wil need to provide more information. What type of file is showing these commas? Was it data entered into a data base, spreadsheet, plain text document or something else.  Can you post some example lines (an attachment text file or post into a code box, the main forum windows convert things and lose stuff and introduce issues.

What is the problem that the commas are causing? Can you not read the data into SAS? It messes up reports? Something else?

 

And wait till you get data that people enter carriage returns and/or line feeds or intermingled DBCS text.

Super User
Posts: 19,157

Re: How does everyone deal with commas in SAS?

Survey data comments. And they can also include quotes. Ugh! 

Or carriage returns, or a host of other issues. 

 

I will actually remove from my dataset and import them in separately from a single file. 

CSV doesn't usually work, so I use Excel in those cases. 

Respected Advisor
Posts: 4,817

Re: How does everyone deal with commas in SAS?

If you can export your data in TAB separated values, you will save yourself a lot of trouble. You would then specify the delimiter as DLM='09'x in your INFILE statement.

 

Normally CSV fields including commas or spaces are surrounded with quotes, with internal quotes represented as doubled quotes ("My name is ""PG"" "). The DSD (for Data Sensitive Delimiter) option in the INFILE statement handles those properly.

 

Give us a bad example to work on...

PG
Respected Advisor
Posts: 4,135

Re: How does everyone deal with commas in SAS?

[ Edited ]

When creating a text file then I normally tend to use a character as delimiter which doesn't exist in the data, eg. a pipe character.

 

If the data can contain "anything" then I'd use a string of characters as delimiter, eg. something like @|@

 

You can read such data with SAS using dlmstr='@|@'

Super User
Super User
Posts: 6,845

Re: How does everyone deal with commas in SAS?

[ Edited ]

A properly created CSV file has no problem with values that contain commas. That is because those values will have quotes around them.

NAME,GENDER,AGE
"Smith, Joe",M,15
"Jones, Sally",F,21

SAS will read that very easily:

data want ;
   infile 'myfile.csv' dsd truncover firstobs=2;
   length name $20 gender $1 age 8 ;
   input name gender age;
run;

What software are you using to create the CSV file that does not know that it must add the quotes?

 

You can use another character as your delimiter (although it would no longer be a Comma Separated Values file).  

Try to pick something that is not likely to appear in your data.  A common one to use is the pipe (|) character. 

 

data want ;
   infile cards dsd dlm='|' truncover firstobs=2;
   length name $20 gender $1 age 8 ;
   input name gender age;
cards;
NAME|GENDER|AGE
Smith, Joe|M|15
Jones, Sally|F|21
;

 

Super User
Posts: 19,157

Re: How does everyone deal with commas in SAS?

This is common in survey applications with comments fields, such as Survey Monkey, Opinio, or Qualtrics. 

 

And it's not just comma's, in my experience, people can include quotes as well as line breaks somehow. 

Comment fields are killers!

Super User
Posts: 11,134

Re: How does everyone deal with commas in SAS?


Reeza wrote:

This is common in survey applications with comments fields, such as Survey Monkey, Opinio, or Qualtrics. 

 

And it's not just comma's, in my experience, people can include quotes as well as line breaks somehow. 

Comment fields are killers!


Agreed!

 

I suspect that the OP will confirm that the issue is commas and quotes within the same field.

 

Depending upon the number of records involved and availability to the source program, I've been known to go back to the original program and edit the data there as the time involved for trying to code around pathalogic data can be much longer than identify the problem records and edit in the original.

Super User
Posts: 19,157

Re: How does everyone deal with commas in SAS?

Honestly, most of the time, the comments are not analyzed via SAS so excluding them is usually valid.

 

I would hope, and assume, that SAS Text Analytics had better ways of working around this issue.

Frequent Contributor
Posts: 88

Re: How does everyone deal with commas in SAS?

@ballardw you would be correct in assuming that my data contains commas and " in the same fields. It's annoying to work with especially as a beginner SAS user.

 

I like the idea of separating out the comment variables, but I'm worries that this will get very messy with a larger dataset.

Respected Advisor
Posts: 4,135

Re: How does everyone deal with commas in SAS?

[ Edited ]

@christinagting0

Do you want/need this free text data for futher analysis or could you just skip it?

 

There are always ways to read such free text fields. What works will depend how your source data really looks like. Post an example if you're after actual code.

 

The easiest case is, if your free text field is at the very end of a record as then the code could be as simple as below.

data test;
  infile datalines4 dlm=',' truncover;
  input (varA varB varC) ($) FreeText $200.;
  datalines4;
aa,bb,cc,free text, can be anything, aa's,|[]OOP!@#$%^&*()_
;;;;
run;

Above works because once VarC is mapped, the input pointer is at the position+1 of the comma after the data for VarC.

 

The format $200. then instructs SAS to just read the next 200 characters - as a Format and not an Informat gets used, the commas are no more treated as field delimiters. 

TRUNCOVER in the infile statement ensures that SAS also maps data against FreeText if the remaining input string is shorter than 200 characters.

 

Ask a Question
Discussion stats
  • 10 replies
  • 387 views
  • 6 likes
  • 6 in conversation