I am trying to import a large CSV file into SAS. I tried to use both
proc import datafile = "rawdata.csv" dbms = csv out = mydata replace; getnames = yes; run;
and
data mydata;
infile "rawdata.csv" delimiter = ',' dsd missover lrecl = 32767 firstobs = 2; ...; run;
These two approaches appear to generate the same dataset. However, only the second one produces error messages in the log (posted as a screenshot).
To investigate the issue, I opened the raw file in MS Access and went to row 2322547. The problem seems to be that a character variable contains strange values (screenshot).
In the imported dataset, this value comes through altered (screenshot).
I tried adding encoding='utf-8', but it did not help. How can I import this dataset correctly?
Most likely you have a poorly formed file. Perhaps it has some embedded line breaks in the middle of one of the fields? This can cause the INPUT statement, either the one generated by PROC IMPORT or the one you wrote for your own data step, to get out of alignment with the values on the lines.
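One way to check for this is to count the fields on every physical line and see whether any lines come up short. The step below is only a sketch: the dataset name field_counts and the variables line_no and n_fields are made up for illustration, and it assumes that no line is longer than 32767 bytes and that values containing commas are enclosed in double quotes.

```sas
/* Count the comma-separated fields on each physical line of the file. */
data field_counts;                          /* hypothetical dataset name */
  infile "rawdata.csv" lrecl=32767 truncover;
  input;                                    /* load the line into _INFILE_ without reading variables */
  line_no  = _n_;                           /* physical line number, header included */
  /* 'm' counts empty fields too; 'q' ignores commas inside quoted strings */
  n_fields = countw(_infile_, ',', 'mq');
  keep line_no n_fields;
run;

/* A clean file should show one dominant field count. */
proc freq data=field_counts;
  tables n_fields;
run;
```

If most lines share one field count and a few have fewer, those lines (and the ones right after them) are likely pieces of a record that was split by an embedded line break.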
If that is your problem, it is a common one and has been answered on this forum many times. You might be able to read the file as it is if the line breaks inside the fields are a single carriage return or a single linefeed, while the real end-of-line markers in the file use both a carriage return and a linefeed. If not, you need to pre-process the file to remove the extra line breaks.
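A pre-processing step along these lines could be sketched as follows. It assumes that the fields containing the stray line breaks are enclosed in double quotes, which may not be true of your file. The filerefs old and new, the output name rawdata_fixed.csv, and the variables ch and in_quotes are made up for illustration.

```sas
filename old "rawdata.csv";
filename new "rawdata_fixed.csv";   /* hypothetical output file name */

data _null_;
  infile old recfm=n;               /* read the file as a byte stream, ignoring line ends */
  file new recfm=n;                 /* write a byte stream as well */
  input ch $char1.;                 /* one byte per iteration */
  retain in_quotes 0;
  /* Each double quote toggles the flag; an escaped pair ("") toggles it twice. */
  in_quotes = mod(in_quotes + (ch = '"'), 2);
  /* Inside a quoted value, turn CR and LF into a space. */
  if in_quotes and ch in ('0D'x, '0A'x) then ch = ' ';
  put ch $char1.;
run;
```

The original program can then be pointed at rawdata_fixed.csv. For the simpler case where only the real line ends use carriage return plus linefeed, adding termstr=crlf to the existing INFILE statement may be enough, with no pre-processing. This copy runs byte by byte, so it can be slow on a very large file.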
PS: Why did you paste photographs of text? You can just copy and paste the lines of text from the SAS log.
Thanks, Tom. I found that the issue is due to hidden line breaks in the string.
Shame SAS decided not to address the issue.