SAS_Question
Quartz | Level 8

Hi all, 

I use this macro to import all csv files in a directory: 

https://github.com/statgeek/SAS-Tutorials/blob/master/Import_all_files_one_type

(Thanks to statgeek for this macro!) 

 

But now I have a CSV file with special characters in it, and my SAS session defaults to wlatin1.

This particular CSV needs to be imported as UTF-8!

 

Now I want to change this in the macro: 

%macro import_file(path, file_name, dataset_name );

	proc import 
		datafile="&path.\&file_name."
		dbms=xlsx
		out=&dataset_name replace;
	run;

%mend;

/* I want to change it to: */

%macro import_file(path, file_name, dataset_name );

filename utf_imp "&path.\&file_name." encoding="utf-8";
	proc import 
		datafile=utf_imp
		dbms=csv
		out=&dataset_name replace;
	run;
filename utf_imp clear;
%mend;

Am I right? Is this the way to do it? 

I can't test this right now (I can only test from work), so I'm asking here first. Sorry guys, and thanks in advance!

 

 

 

1 ACCEPTED SOLUTION

Tom
Super User

Don't.

 

The problem is that SAS running with any single-byte character encoding, such as LATIN1, can represent only 256 characters. A text file using UTF-8 encoding could contain any of thousands of different characters, so there is no clear way to even read the strings into variables before trying to convert them.

 

So run SAS using UTF-8 to read files that are in UTF-8.  You can then evaluate the characters to determine if any of them are not valid LATIN1 characters and decide what to do with them. 
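
Before importing, it may help to confirm which encoding the current session actually uses. A minimal sketch: SYSENCODING is an automatic macro variable and ENCODING is a system option that can only be set when SAS starts, so how you launch a UTF-8 session (command-line option, configuration file, or a separate Unicode server) depends on your installation and is not shown here.

```sas
/* Write the session encoding to the log (for example wlatin1 or utf-8) */
%put NOTE: Session encoding is &sysencoding.;

/* Show the ENCODING system option; it cannot be changed inside a running session */
proc options option=encoding;
run;
```

If the log does not report utf-8, the session needs to be restarted with a UTF-8 configuration before reading the file.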

 

For example, suppose you have a dataset with a character variable named STRING that could contain UTF-8 characters that are not valid in LATIN1. You could use code like this to replace those characters with HTML-encoded strings, so that the result can be stored in LATIN1 encoding. Make sure the variables are defined long enough for any extra length that might be needed to store the encoded strings.

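data fixed;
  set example;
  do until(loc=0);
    /* Position of the first character that is neither 7-bit ASCII nor one of */
    /* the LATIN1 characters 128-255 (transcoded to UTF-8); 0 if none is left */
    loc=kverify(string,collate(0,127)||kcvt(collate(128,255),'latin1','utf-8'));
    if loc then do;
      length char $4 ;  /* one UTF-8 character is at most 4 bytes */
      char=ksubstr(string,loc,1);
      /* Replace every occurrence of that character with its HTML-encoded form */
      string=tranwrd(string,trim(char),htmlencode(trim(char),'7bit'));
    end;
  end;
  drop loc char;
run;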
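
To try this out, a small test dataset can be built in a UTF-8 session. This is only a sketch: the names EXAMPLE and STRING match the code above, but the sample values, the dataset FLAGGED, and the variables BAD_POS and HAS_NON_LATIN1 are made up for illustration. Hex literals are used so the program does not depend on the encoding of the program file itself.

```sas
/* Assumes the SAS session encoding is UTF-8 */
data example;
  length string $40;                /* room for the longer encoded form */
  /* 'C3A9'x is e-acute in UTF-8: it exists in LATIN1 */
  string = 'caf' || 'C3A9'x;
  output;
  /* 'E282AC'x is the euro sign in UTF-8: not in LATIN1 (ISO 8859-1) */
  string = 'price ' || 'E282AC'x || '5';
  output;
run;

/* Flag the rows without changing them, using the same KVERIFY test */
data flagged;
  set example;
  bad_pos = kverify(string,
                    collate(0,127)||kcvt(collate(128,255),'latin1','utf-8'));
  has_non_latin1 = (bad_pos > 0);   /* 1 if any character falls outside LATIN1 */
run;
```

Flagging first lets you see how many values are affected before deciding whether to encode, replace, or drop those characters.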

 

 
