- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm trying to read a dataset into SAS using the below code:
data REPAIRS.rep_sur2;
infile 'C:\Name\data.csv' delimiter = "," FIRSTOBS = 2 LRECL=32;
INPUT
@1 Survey_Name $11.
@12 Response_Date $10.
@22 Response_Timestamp $10.;
Run;
But it doesn't seem to recognize that the commas are delimiters as it reads in as below:
Can anyone help me with why this isn't reading in properly?
Survey_Name Response_Date Response_Timestamp
p1864475747 ,07/10/201 9,10 Jul 2
p1864475747 ,07/10/201 9,10 Jul 2
p1864475747 ,07/10/201 9,10 Jul 2
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Either use positions, or a delimiter; using both is redundant and causing problems, as you can see.
Use this instead:
data rep_sur2;
infile datalines delimiter = "," firstobs = 2;
input
Survey_Name :$11.
Response_Date :mmddyy10.
Response_Timestamp :mmddyy10.
;
format
Response_Date :mmddyy10.
Response_Timestamp :mmddyy10.
;
datalines;
Survey_Name,Response_Date,Response_Timestamp
p1864475747,07/10/2019,07/10/2019
p1864475747,07/10/2019,07/10/2019
p1864475747,07/10/2019,07/10/2019
;
Note that date values must always be handled as such, using proper data type (numeric), and proper informats/formats.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Either use positions, or a delimiter; using both is redundant and causing problems, as you can see.
Use this instead:
data rep_sur2;
infile datalines delimiter = "," firstobs = 2;
input
Survey_Name :$11.
Response_Date :mmddyy10.
Response_Timestamp :mmddyy10.
;
format
Response_Date :mmddyy10.
Response_Timestamp :mmddyy10.
;
datalines;
Survey_Name,Response_Date,Response_Timestamp
p1864475747,07/10/2019,07/10/2019
p1864475747,07/10/2019,07/10/2019
p1864475747,07/10/2019,07/10/2019
;
Note that date values must always be handled as such, using proper data type (numeric), and proper informats/formats.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks this seems to have solved the issue.
Although when I applied this to the whole dataset including 100+ other columns I found another issue.
I have the below as part of a cell in a free text field
"Very polite engineer and not scared of having a laugh.
I like them to bring a coverup sheet to minimize the dirt when they do the work.
That’s what I used to do when I was healthy i used to work as a handyman.
The risen for this is that people like me disabled person I cannot clean after they live the place.
Overall great ."
This is separated into separate columns instead of all being within the Free Text column.
Do you know how to correct this?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is it separated into columns or rows?
A string that contains linefeeds (or CRLF if Windows) is always broken into lines there, and scrambles the data.
If, OTOH, you only suffer from having delimiters in your long string, and the long string is encapsulated in quotes (as it should be), then the dsd option in the infile statement will handle this.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's separating each sentence into a new column , but the amount of words from the sentence depends on the column length.
I think this happens when ALT+ENTER has been used within Excel to write in a cell. I want these sentences to stay within one column if possible?
I have the DSD within the infile statement and I still have the same issue.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm going to open up in Excel and get rid of the line breaks like that.
Thanks for all your help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@manonlyn wrote:
I'm going to open up in Excel and get rid of the line breaks like that.
Thanks for all your help!
Be very careful doing such. Quite often Excel will CHANGE values of a CSV and they will not be correct when saved as CSV.
If you want to edit CSV files then a text editor, such as WORDPAD or similar, are much safer.
Values like "5-1" in a CSV sometimes get interpreted as DATES by Excel. Other oddities depend on cell content.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please go up vote my enhancement request Ballot item.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please show an example of the actual line from the CSV file that is causing the issue.
If the value has embedded line breaks then it will create multiple lines in the CSV file and hence multiple observations in the SAS dataset.
If you do have embedded line breaks and your are lucky enough that the CSV has actual end-of-lines marked with CRLF pairs and the embedded line breaks are just single LF or single CR then you can try adding the TERMSTR=CRLF option to your infile statement. Otherwise you will need to preprocess the CSV file to fix those broken lines. I have posted code to fix that issue many times on this forum.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I've posted the actual cell that's caused the issue in the first reply. I think I might have to preprocess the CSV. Thanks for your time looking into this.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try reading with modified list input rather than formatted input
data REPAIRS.rep_sur2;
infile 'C:\Name\data.csv' delimiter = "," FIRSTOBS = 2 LRECL=32;
INPUT
Survey_Name : $11.
Response_Date : $10.
Response_Timestamp : $10.;
Run;