
Transcoding SAS datasets from Unicode utf-8 to non-Unicode encoding

IJU (Occasional Contributor)


Dear SAS Experts!

We receive SAS datasets with UTF-8 encoding from another country.
Our English SAS 9.2 session does not support Unicode. A simple DATA step that reads a UTF-8 dataset writes the following messages to the log:

NOTE: Data file UTF8.HE.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance.
ERROR: Some character data was lost during transcoding in the dataset UTF8.HE. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.

Using the NLS options shown below does not help; the same error is generated.

libname utf8 cvp "..\utf8" access=readonly cvpbytes=2;
libname win "..\win" outencoding=wlatin1;

data win.outds ( encoding='wlatin1');
set utf8.inds ( encoding='utf-8');
run;

Does anybody know how to process UTF-8 datasets in a SAS session that does not support Unicode? Is it possible at all, or must the transcoding from UTF-8 to a non-Unicode encoding be done in a SAS session with Unicode support?

Thank you in advance.
Super Contributor

Re: Transcoding SAS datasets from Unicode utf-8 to non-Unicode encoding

Have you read this SAS support document? It specifically mentions the ENCODING= option.

Processing Multilingual Data with the SAS 9.2 Unicode Server
http://support.sas.com/resources/papers/92unicodesrvr.pdf

Scott Barry
SBBWorks, Inc.
IJU (Occasional Contributor)

Re: Transcoding SAS datasets from Unicode utf-8 to non-Unicode encoding

Hi Scott,

Thank you very much for your response.

The SAS support document covers processing multilingual data and converting from non-UTF-8 to UTF-8 encoding. It does say that UTF-8 data can be modified only in a UTF-8 session. Our data is in English but has UTF-8 encoding. I am looking for a way to convert SAS datasets with UTF-8 encoding to a non-UTF-8 encoding, namely wlatin1. The converted datasets should have the same characteristics as datasets created in a non-UTF-8 session. Apparently this can be done only in a UTF-8 session.
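For reference, a minimal sketch of that UTF-8-session route. It assumes SAS 9.2 was started with a UTF-8 session encoding (the ENCODING= system option can only be set at invocation or in the configuration file, not in code) and reuses the library paths from the question. The read needs no transcoding because data and session are both UTF-8; the OUTENCODING= LIBNAME option makes the new dataset wlatin1. Any character that has no wlatin1 equivalent would still trigger the transcoding error.

```sas
/* Sketch only: assumes this runs in a SAS session whose encoding is UTF-8 */
/* (ENCODING=UTF-8 set at invocation or in the configuration file).        */
libname utf8 "..\utf8" access=readonly;     /* source: UTF-8 datasets       */
libname win  "..\win" outencoding=wlatin1;  /* new datasets get wlatin1     */

/* Data and session are both UTF-8, so reading needs no transcoding;       */
/* writing transcodes each character value from UTF-8 to wlatin1.          */
data win.outds;
  set utf8.inds;
run;

/* PROC CONTENTS lists the Encoding attribute of the new dataset. */
proc contents data=win.outds;
run;
```

Because a wlatin1 character never needs more bytes than its UTF-8 form, the existing variable lengths are large enough and no truncation should occur on the write.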

Thanks,
IJU
Super Contributor

Re: Transcoding SAS datasets from Unicode utf-8 to non-Unicode encoding

Have you reviewed the SAS National Language Support (NLS): Reference Guide? It documents informats for this type of data conversion.
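As an illustration, a minimal sketch of the informat route, assuming the $UTF8Xw. informat described in the NLS Reference Guide (it reads UTF-8 data and converts it to the session encoding) and a wlatin1 session. The input dataset is opened with ENCODING=ANY so that SAS passes the stored UTF-8 bytes through without automatic transcoding, and the informat then converts each character variable. The 200-byte informat width is an assumption; it should be at least as long as the longest character variable. Paths and dataset names are taken from the question.

```sas
/* Sketch only: assumes a non-Unicode (wlatin1) session and the $UTF8Xw.  */
/* informat from the NLS Reference Guide.                                  */

/* 1) Quick check on a literal: 'C3A9'x is e-acute encoded in UTF-8.      */
data _null_;
  latin = input('C3A9'x, $utf8x4.);  /* decode UTF-8 into session encoding */
  put latin= $hex8.;                 /* show the resulting bytes in the log */
run;

/* 2) Convert a whole dataset.                                             */
libname utf8 "..\utf8" access=readonly;  /* base engine, no CVP needed     */
libname win  "..\win";

data win.outds (encoding='wlatin1');
  /* ENCODING=ANY: read the stored bytes as-is, no automatic transcoding */
  set utf8.inds (encoding=any);
  /* Convert every character variable from UTF-8 to the session encoding. */
  /* Assumption: no character variable is longer than 200 bytes.          */
  array chars {*} _character_;
  do _i = 1 to dim(chars);
    chars{_i} = input(chars{_i}, $utf8x200.);
  end;
  drop _i;
run;
```

The first DATA step is a cheap way to confirm that the informat behaves as expected in a given session before converting real data: in a wlatin1 session the decoded value should start with the single byte E9.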

Scott Barry
SBBWorks, Inc.