Solved: Re: Import csv files with a macro in encoded utf-8 format?

SAS_Question · Posted 06-27-2022 09:26 AM

Hi all,

I use this macro to import all csv files in a directory:

https://github.com/statgeek/SAS-Tutorials/blob/master/Import_all_files_one_type

(Thanks to statgeek for this macro!)

But now I have a csv with special characters in it, and my SAS machine defaults to wlatin1.

But this particular csv needs to be imported as utf-8!

Now I want to change this in the macro:

%macro import_file(path, file_name, dataset_name );

	proc import 
		datafile="&path.\&file_name."
		dbms=xlsx
		out=&dataset_name replace;
	run;

%mend;

/* I want to change it to: */

%macro import_file(path, file_name, dataset_name );

	filename utf_imp "&path.\&file_name." encoding="utf-8";
	proc import
		datafile=utf_imp
		dbms=csv
		out=&dataset_name replace;
	run;
	filename utf_imp clear;
%mend;

Am I right? Is this the way to do it?

I can't test this right now (I can only test from work), so I'm asking here first. Sorry guys, and thanks in advance!

Tom · Posted 06-27-2022 09:37 AM

Don't.

The problem is that SAS running with any single-byte character encoding, such as LATIN1, can only represent 256 characters. A text file using UTF-8 encoding could contain any of thousands of different characters, so there is no clear way to even read the strings into variables before you start trying to convert them.

So run SAS using UTF-8 to read files that are in UTF-8. You can then evaluate the characters to determine if any of them are not valid LATIN1 characters and decide what to do with them.
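As a rough sketch of the point above, you could have the import macro check the session encoding first and stop if it is not UTF-8. This assumes the automatic macro variable SYSENCODING is available in your release; the macro name import_file_utf8 is made up for this example, and the PROC IMPORT step is simply the csv version from the question.

```sas
/* Show the current session encoding in the log. */
proc options option=encoding;
run;

/* Sketch only: import_file_utf8 is a hypothetical name. */
%macro import_file_utf8(path, file_name, dataset_name);

	/* SYSENCODING holds the session encoding, for example wlatin1 or utf-8. */
	/* %str() masks the hyphen so %if does not read UTF-8 as a subtraction.  */
	%if %qupcase(&sysencoding) ne %str(UTF-8) %then %do;
		%put ERROR: Session encoding is &sysencoding.. Start SAS with UTF-8 to read &file_name..;
		%return;
	%end;

	proc import
		datafile="&path.\&file_name."
		dbms=csv
		out=&dataset_name replace;
	run;

%mend import_file_utf8;
```

The session encoding is fixed when SAS starts, so it is normally set at invocation (for example with -encoding utf-8 on the command line or in the configuration file) rather than changed inside the program.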

For example, if you have a dataset with a character variable named STRING that could contain UTF-8 characters that are not valid in LATIN1, you could use code like this to replace those characters with HTML-encoded strings so that the result can be stored in LATIN1 encoding. Make sure the variables are defined long enough for any extra length needed to store the encoded strings.

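data fixed;
  set example;
  do until(loc=0);
    /* Position of the first character that is neither 7-bit ASCII */
    /* nor one of the LATIN1 characters 128-255 (as UTF-8).        */
    loc=kverify(string,collate(0,127)||kcvt(collate(128,255),'latin1','utf-8'));
    if loc then do;
      length char $4 ;
      /* Pull out that single character (up to 4 bytes in UTF-8). */
      char=ksubstr(string,loc,1);
      /* Replace every occurrence with its HTML-encoded form. */
      string=tranwrd(string,trim(char),htmlencode(trim(char),'7bit'));
    end;
  end;
  drop loc char;
run;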
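
On the point about making the variables long enough: each replaced character turns into a longer HTML-encoded string, so STRING may need to be widened before the step above runs. A minimal sketch, where the dataset name example_wide and the length 400 are placeholders and not recommendations:

```sas
/* Sketch only: example_wide and $400 are hypothetical; size it for your data. */
data example_wide;
  length string $400;   /* must come before SET so the new length is used */
  set example;
run;
```

You would then read example_wide instead of example in the replacement step.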
