SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
manonlyn
Obsidian | Level 7

I'm trying to read a dataset into SAS using the below code:

 

 

   data REPAIRS.rep_sur2;
   infile 'C:\Name\data.csv' delimiter = "," FIRSTOBS = 2 LRECL=32;
     INPUT
      @1 Survey_Name $11.
      @12 Response_Date $10.
      @22 Response_Timestamp $10.;
Run;

But it doesn't seem to recognize that the commas are delimiters as it reads in as below:

Can anyone help me with why this isn't reading in properly?

 

Survey_Name           Response_Date         Response_Timestamp
p1864475747                ,07/10/201                     9,10 Jul 2
p1864475747                ,07/10/201                     9,10 Jul 2
p1864475747                ,07/10/201                     9,10 Jul 2

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Either use positions, or a delimiter; using both is redundant and causing problems, as you can see.

Use this instead:

data rep_sur2;
infile datalines delimiter = "," firstobs = 2;
input
  Survey_Name :$11.
  Response_Date :mmddyy10.
  Response_Timestamp :mmddyy10.
;
format
  Response_Date :mmddyy10.
  Response_Timestamp :mmddyy10.
;
datalines;
Survey_Name,Response_Date,Response_Timestamp
p1864475747,07/10/2019,07/10/2019
p1864475747,07/10/2019,07/10/2019
p1864475747,07/10/2019,07/10/2019
;

Note that date values must always be handled as such, using proper data type (numeric), and proper informats/formats.

View solution in original post

11 REPLIES 11
Kurt_Bremser
Super User

Either use positions, or a delimiter; using both is redundant and causing problems, as you can see.

Use this instead:

data rep_sur2;
infile datalines delimiter = "," firstobs = 2;
input
  Survey_Name :$11.
  Response_Date :mmddyy10.
  Response_Timestamp :mmddyy10.
;
format
  Response_Date :mmddyy10.
  Response_Timestamp :mmddyy10.
;
datalines;
Survey_Name,Response_Date,Response_Timestamp
p1864475747,07/10/2019,07/10/2019
p1864475747,07/10/2019,07/10/2019
p1864475747,07/10/2019,07/10/2019
;

Note that date values must always be handled as such, using proper data type (numeric), and proper informats/formats.

manonlyn
Obsidian | Level 7

Thanks this seems to have solved the issue. 

 

Although when I applied this to the whole dataset including 100+ other columns I found another issue. 

 

I have the below as part of a cell in a free text field

 

"Very polite engineer and not scared of having a laugh.
I like them to bring a coverup sheet to minimize the dirt when they do the work.
That’s what I used to do when I was healthy i used to work as a handyman.
The risen for this is that people like me disabled person I cannot clean after they live the place.
Overall great ."

 

This is separated into separate columns instead of all being within the Free Text column. 

 

Do you know how to correct this?

Kurt_Bremser
Super User

Is it separated into columns or rows?

A string that contains linefeeds (or CRLF if Windows) is always broken into lines there, and scrambles the data.

 

If, OTOH, you only suffer from having delimiters in your long string, and the long string is encapsulated in quotes (as it should be), then the dsd option in the infile statement will handle this.

manonlyn
Obsidian | Level 7

It's separating each sentence into a new column , but the amount of words from the sentence depends on the column length. 

 

I think this happens when ALT+ENTER has been used within Excel to write in a cell. I want these sentences to stay within one column if possible?

 

I have the DSD within the infile statement and I still have the same issue. 

manonlyn
Obsidian | Level 7

I'm going to open up in Excel and get rid of the line breaks like that.

 

Thanks for all your help!

ballardw
Super User

@manonlyn wrote:

I'm going to open up in Excel and get rid of the line breaks like that.

 

Thanks for all your help!


Be very careful doing such. Quite often Excel will CHANGE values of a CSV and they will not be correct when saved as CSV.

If you want to edit CSV files then a text editor, such as WORDPAD or similar, are much safer.

 

Values like "5-1" in a CSV sometimes get interpreted as DATES by Excel. Other oddities depend on cell content.

Tom
Super User Tom
Super User

Please show an example of the actual line from the CSV file that is causing the issue.

If the value has embedded line breaks then it will create multiple lines in the CSV file and hence multiple observations in the SAS dataset.

 

If you do have embedded line breaks and your are lucky enough that the CSV has actual end-of-lines marked with CRLF pairs and the embedded line breaks are just single LF or single CR then you can try adding the TERMSTR=CRLF option to your infile statement.  Otherwise you will need to preprocess the CSV file to fix those broken lines.  I have posted code to fix that issue many times on this forum.

manonlyn
Obsidian | Level 7

I've posted the actual cell that's caused the issue in the first reply. I think I might have to preprocess the CSV. Thanks for your time looking into this. 

novinosrin
Tourmaline | Level 20

Try reading with modified list input rather than formatted input

 

data REPAIRS.rep_sur2;
   infile 'C:\Name\data.csv' delimiter = "," FIRSTOBS = 2 LRECL=32;
     INPUT
       Survey_Name : $11.
      Response_Date : $10.
 Response_Timestamp : $10.;
Run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 4944 views
  • 1 like
  • 5 in conversation