IJU
Fluorite | Level 6
Dear SAS Experts!!!

We are getting SAS datasets with Unicode UTF-8 encoding from another country.
Our English SAS 9.2 installation does not support Unicode. A simple DATA step that reads a UTF-8 dataset writes the following messages to the log:

NOTE: Data file UTF8.HE.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance.
ERROR: Some character data was lost during transcoding in the dataset UTF8.HE. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.

Using the NLS options shown below does not help; the same error is generated.

libname utf8 cvp "..\utf8" access=readonly cvpbytes=2;
libname win "..\win" outencoding=wlatin1;

data win.outds ( encoding='wlatin1');
set utf8.inds ( encoding='utf-8');
run;

Does anybody know how to process UTF-8 datasets in a SAS session that does not support Unicode? Is it possible at all, or can transcoding from UTF-8 to a non-Unicode encoding be done only in SAS with Unicode support?


Thank you in advance.
4 REPLIES
sbb
Lapis Lazuli | Level 10
Have you read this SAS support website document (the ENCODING= option is mentioned specifically)?

Processing Multilingual Data with the SAS 9.2 Unicode Server
http://support.sas.com/resources/papers/92unicodesrvr.pdf

Scott Barry
SBBWorks, Inc.
IJU
Fluorite | Level 6
Hi Scott,

Thank you very much for your response.

The SAS support document covers processing of multilingual data and conversion from non-UTF-8 to UTF-8 encoding. It does say that UTF-8 data can be modified only in a UTF-8 session. Our data is English with UTF-8 encoding. I am looking for a way to convert SAS datasets with UTF-8 encoding to a non-UTF-8 encoding, i.e. wlatin1. The converted datasets should have the same features as datasets created in a non-UTF-8 session. Apparently this can be done only in a UTF-8 session.

Thanks,
IJU
sbb
Lapis Lazuli | Level 10
Have you reviewed the SAS National Language Support (NLS): Reference Guide? There are INFORMATs documented for this type of data conversion.

Scott Barry
SBBWorks, Inc.
ehbales
SAS Employee

KPROPDATA and KCVT can transcode character data from one encoding to another. Set INENCODING to BINARY on your LIBNAME statement to prevent SAS from transcoding the data set, then convert the character columns using one of these two functions.
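
A minimal sketch of that approach with KCVT, reusing the library paths and dataset names from the original question. It assumes that INENCODING=ANY is the way to request binary (untranscoded) input, which is how the NLS documentation appears to describe ANY, and that every character variable holds UTF-8 text. The loop index _i is a hypothetical name.

```sas
/* Assumption: INENCODING=ANY reads the data as binary, so no     */
/* transcoding (and no transcoding error) happens on input.       */
libname utf8 "..\utf8" access=readonly inencoding=any;
libname win  "..\win";

data win.outds;
  set utf8.inds;
  /* Transcode each character variable from UTF-8 to wlatin1. */
  array chars {*} _character_;
  do _i = 1 to dim(chars);
    chars{_i} = kcvt(chars{_i}, 'utf-8', 'wlatin1');
  end;
  drop _i;
run;
```

Because the output library carries no encoding override, win.outds should be created in the session encoding. Running PROC CONTENTS on it afterwards is a quick way to confirm that.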

 

KPROPDATA gives you a little more help by allowing you to specify how to handle characters that are not supported in your SAS session encoding.
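
A hedged sketch of the KPROPDATA variant, again using the paths and dataset names from the original question. The 'QUESTION' option, which should replace each character that cannot be represented with a question mark, is quoted from memory of the NLS reference, so check the option list there before relying on it. INENCODING=ANY and the index name _i are assumptions as well.

```sas
/* Assumption: INENCODING=ANY suppresses transcoding on input. */
libname utf8 "..\utf8" access=readonly inencoding=any;
libname win  "..\win";

data win.outds;
  set utf8.inds;
  array chars {*} _character_;
  do _i = 1 to dim(chars);
    /* Arguments: string, handling option, input encoding, output encoding. */
    chars{_i} = kpropdata(chars{_i}, 'QUESTION', 'utf-8', 'wlatin1');
  end;
  drop _i;
run;
```

Substituting a visible placeholder keeps the step from failing on an unexpected character and makes any affected values easy to find afterwards.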
