Hi all,
I use this macro to import all csv files in a directory:
https://github.com/statgeek/SAS-Tutorials/blob/master/Import_all_files_one_type
(Thanks to statgeek for this macro!)
But now I have a CSV with special characters in it, and my SAS session defaults to wlatin1.
This particular CSV needs to be imported as UTF-8!
Now I want to change this in the macro:
%macro import_file(path, file_name, dataset_name );
  proc import
    datafile="&path.\&file_name."
    dbms=xlsx
    out=&dataset_name replace;
  run;
%mend;
/* I want to change it to: */
%macro import_file(path, file_name, dataset_name );
  filename utf_imp "&path.\&file_name." encoding="utf-8";
  proc import
    datafile=utf_imp
    dbms=csv
    out=&dataset_name replace;
  run;
  filename utf_imp clear;
%mend;
Am I right? Is this the way to do it?
I can't test this right now (I can only test from work), so I'm asking here first. Sorry guys, and thanks in advance!
Don't.
The problem is that SAS running with any single-byte character encoding, such as LATIN1, can only represent 256 characters. A text file using UTF-8 encoding could contain any of thousands of different characters, so there is no reliable way to even read the strings into variables before you start trying to convert them.
So run SAS using UTF-8 to read files that are in UTF-8. You can then evaluate the characters to determine if any of them are not valid LATIN1 characters and decide what to do with them.
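To confirm which encoding a session is actually running with, a quick check like the one below can help. This is only a sketch: how you start a UTF-8 session (for example with the ENCODING option at invocation or a Unicode-specific configuration file) depends on how SAS is installed at your site, and the ENCODING option can only be set at startup, not in the middle of a session.

```sas
/* Write the session encoding to the log (for example wlatin1 or utf-8) */
%put NOTE: Session encoding is &sysencoding.;

/* Show the current value of the ENCODING system option */
proc options option=encoding;
run;
```

If the log shows wlatin1, the session cannot hold arbitrary UTF-8 text no matter how the file is read.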
For example, if you have a dataset with a character variable named STRING that could contain UTF-8 characters that are not valid in LATIN1, you could use code like this to replace those characters with HTML-encoded strings so that the result can be stored in LATIN1 encoding. Make sure the variables are defined long enough for any extra length that might be needed to store the encoded strings.
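data fixed;
  set example;
  do until(loc=0);
    /* position of the first character that is neither 7-bit ASCII nor a LATIN1 character (0 if none) */
    loc=kverify(string,collate(0,127)||kcvt(collate(128,255),'latin1','utf-8'));
    if loc then do;
      length char $4 ;  /* a single UTF-8 character takes at most 4 bytes */
      char=ksubstr(string,loc,1);
      /* replace every occurrence of that character with its HTML-encoded form */
      string=tranwrd(string,trim(char),htmlencode(trim(char),'7bit'));
    end;
  end;
  drop loc char;
run;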
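To try the step above, you need an EXAMPLE dataset to feed it. The sketch below builds one with made-up values; it assumes the session is running in UTF-8 and uses the UNICODE function to create the test characters from escape sequences, so the program file itself stays plain ASCII. The dataset name and the length of 200 are only illustrative.

```sas
/* Hypothetical test data; assumes the session encoding is UTF-8 */
data example;
  /* leave extra room, since the HTML-encoded text is longer than the original */
  length string $200;
  string = 'plain ASCII text';           output;
  string = unicode('caf\u00E9');         output;  /* e-acute exists in LATIN1 */
  string = unicode('\u4E2D\u6587 text'); output;  /* CJK characters do not */
run;
```

After running the FIXED step, a proc print of FIXED should show which values were rewritten; only characters outside LATIN1 are expected to be replaced.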