Reading UTF-8 Encoded csv File

Frequent Contributor
Posts: 109

[ Edited ]

Hi All,

I am trying to read a UTF-8 encoded CSV file on a UNIX system where SAS 9.4 is installed.

In Notepad++ the file looks fine, but in Notepad the whole file appears as one line.

SAS also reads the whole file as a single record. Because I use FIRSTOBS=2 to skip the header, the final dataset contains 0 records.

I am using the following code:

filename file1 '/u04/dataloader/Satish/App/ZZ/f1.csv' encoding='utf-8';

data xxx.asdf;
infile file1 dsd dlm=',' firstobs=2 encoding='utf-8';
input SiteNumber:$10. LabName:$30. AnalyteName:$16. FromDate:$20. ToDate:$20. 
      LowRange:best12. HighRange:best12. Units:$10. Dictionary:$10. Comments:$10.;
run;

I want to read the data starting from line 2.

Thanks in advance.


All Replies
SAS Employee
Posts: 30

Re: Reading UTF-8 Encoded csv File

Posted in reply to Satish_Parida
It sounds to me like the CSV file you're working with has Unix-based line endings instead of Windows-based line endings, so the records are not being split properly.

I was able to resolve this issue by opening the file in WordPad (instead of Notepad) and saving it under a different name, which automatically corrected the line breaks.

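
Before changing anything, it can help to look at which line-ending bytes the file actually contains. The following is a minimal sketch for a UNIX shell, assuming the standard `od` utility is available; the function name `show_terminators` and the sample data are made up for illustration and are not from the original poster's file. In the dump, `\r` marks a carriage return (CR) and `\n` marks a line feed (LF).

```shell
#!/bin/sh
# show_terminators FILE: dump the first 200 bytes of FILE character by character.
# od -c prints a carriage return as \r and a line feed as \n.
show_terminators() {
    od -c -N 200 "$1"
}

# Demo on a small CR-only sample (hypothetical data, not the poster's file).
sample=$(mktemp)
printf 'SiteNumber,LabName\r001,Central Lab\r' > "$sample"
show_terminators "$sample"
rm -f "$sample"
```

If only `\r` shows up between records, the file uses CR alone; `\r \n` pairs indicate Windows-style endings, and `\n` alone indicates Unix-style endings.
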
Frequent Contributor
Posts: 109

Re: Reading UTF-8 Encoded csv File

[ Edited ]
Posted in reply to GinaRepole

@GinaRepole

Can this be done with a bash command, or with a utility in Base SAS?
I am trying to automate a batch process on a UNIX system where SAS 9.4 is installed.
This is one example file.
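
On the shell side, one possible approach is to rewrite the file with LF-only line endings before SAS reads it. This is a minimal sketch, assuming a POSIX `awk` is installed; the function name `normalize_eol` and the sample data are hypothetical. It also rewrites any line break inside a quoted CSV field, so it only suits files without embedded line breaks.

```shell
#!/bin/sh
# normalize_eol IN OUT: write a copy of IN to OUT with LF-only line endings.
# Accepts CRLF (Windows), lone CR and LF (Unix) input.
normalize_eol() {
    awk '{
        sub(/\r$/, "")      # CRLF input: drop the CR left at the end of the record
        gsub(/\r/, "\n")    # CR-only input: every remaining CR becomes a line break
        print               # print appends a single LF
    }' "$1" > "$2"
}

# Demo with hypothetical sample data (not the poster's file).
src=$(mktemp)
dst=$(mktemp)
printf 'SiteNumber,LabName\r001,Central Lab\r' > "$src"
normalize_eol "$src" "$dst"
wc -l < "$dst"
rm -f "$src" "$dst"
```

A batch job could call `normalize_eol` on each incoming file and point the SAS FILENAME statement at the converted copy.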

Solution
‎02-20-2018 11:26 AM
SAS Employee
Posts: 30

Re: Reading UTF-8 Encoded csv File

[ Edited ]
Posted in reply to Satish_Parida

Oh, I see! If everything about your SAS session is Unix-based, you should keep it that way. Just add the TERMSTR=LF option to the INFILE statement and it should read the existing file correctly (or TERMSTR=CR if the records in the file end with carriage returns only).

Frequent Contributor
Posts: 109

Re: Reading UTF-8 Encoded csv File

Posted in reply to GinaRepole

@GinaRepole

Thank you very much.

I was missing the carriage return option. TERMSTR=CR worked:

filename file1 '/u04/dataloader/Satish/App/ZZ/f1.csv' encoding='utf-8';

data xxx.asdf;
infile file1 dlm=',' dsd firstobs=2 encoding='utf-8' TERMSTR=CR; /* records end with a carriage return (CR) */
input SiteNumber:$10. LabName:$30. AnalyteName:$16. FromDate:$20. ToDate:$20. 
      LowRange:best12. HighRange:best12. Units:$10. Dictionary:$10. Comments:$10.;
run;

SAS Employee
Posts: 30

Re: Reading UTF-8 Encoded csv File

Posted in reply to Satish_Parida
Cheers!

I adjusted my prior post to mention carriage returns as well, in case anyone comes by to reference this in the future and only reads the solution post.

Super User
Posts: 7,932

Re: Reading UTF-8 Encoded csv File

Posted in reply to Satish_Parida

Note: As far as I can tell, only Microsoft Excel on Mac still creates files that use CR alone as the end-of-line marker. You can pick a different file format when you save the file from Excel so that it generates LF or CR+LF line endings instead.

Frequent Contributor
Posts: 109

Re: Reading UTF-8 Encoded csv File

If the files come from different sources and we cannot predict the line breaks, can we use more than one TERMSTR option in one DATA step, such as LF+CR?
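
As far as I know, TERMSTR= takes a single value per INFILE statement, so one workaround might be to detect the line ending in the shell before the DATA step runs. The following is a rough sketch under that assumption; the function name `detect_termstr` is hypothetical, and the check is a simple heuristic that counts CR and LF bytes, so a file with mixed endings would be reported as CRLF.

```shell
#!/bin/sh
# detect_termstr FILE: print CRLF, CR or LF depending on the bytes found in FILE.
detect_termstr() {
    cr=$(( $(tr -cd '\r' < "$1" | wc -c) ))   # number of carriage-return bytes
    lf=$(( $(tr -cd '\n' < "$1" | wc -c) ))   # number of line-feed bytes
    if [ "$cr" -gt 0 ] && [ "$lf" -gt 0 ]; then
        echo CRLF
    elif [ "$cr" -gt 0 ]; then
        echo CR
    else
        echo LF
    fi
}

# Demo with hypothetical sample data (not the poster's file).
sample=$(mktemp)
printf 'SiteNumber,LabName\r001,Central Lab\r' > "$sample"
detect_termstr "$sample"
rm -f "$sample"
```

The printed value matches the keywords that TERMSTR= accepts, so a batch script could hand it to the SAS job, for example through the -sysparm command-line option and the &SYSPARM macro variable.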

☑ This topic is solved.
