IJU
Fluorite | Level 6
Dear SAS Experts!!!

We are getting SAS datasets with Unicode UTF-8 encoding from another country.
Our English SAS 9.2 does not support Unicode. A simple DATA step that reads a UTF-8 data set generates the following log messages:

NOTE: Data file UTF8.HE.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance.
ERROR: Some character data was lost during transcoding in the dataset UTF8.HE. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
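
One way to confirm the mismatch that the first NOTE describes is to compare the session encoding with the encoding recorded in the data set header. This is a minimal sketch, assuming the libref utf8 is already assigned and that HE is the member named in the log:

```sas
/* Show the encoding of the current SAS session (wlatin1 is typical for English Windows) */
proc options option=encoding;
run;

/* The same value, written directly to the log */
%put Session encoding: %sysfunc(getoption(encoding));

/* The "Encoding" row in the PROC CONTENTS header shows how the file was written */
proc contents data=utf8.he;
run;
```

If the two values differ, SAS transcodes every character value on read, and any character that has no wlatin1 equivalent triggers the ERROR above.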

Using NLS options as shown below does not help; the same error is generated.

libname utf8 cvp "..\utf8" access=readonly cvpbytes=2;
libname win "..\win" outencoding=wlatin1;

data win.outds ( encoding='wlatin1');
set utf8.inds ( encoding='utf-8');
run;

Does anybody know how to process Unicode UTF-8 data sets in a SAS installation that does not support Unicode? Is it possible at all, or must transcoding from UTF-8 to a non-Unicode encoding be done in SAS with Unicode support?


Thank you in advance.
4 REPLIES
sbb
Lapis Lazuli | Level 10
Have you read this SAS support website document (the ENCODING= option is mentioned specifically)?

Processing Multilingual Data with the SAS 9.2 Unicode Server
http://support.sas.com/resources/papers/92unicodesrvr.pdf

Scott Barry
SBBWorks, Inc.
IJU
Fluorite | Level 6
Hi Scott,

Thank you very much for your response.

The SAS support document covers processing of multilingual data and conversion from non-UTF-8 to UTF-8 encoding. It does say that UTF-8 data can be modified only in a UTF-8 session. Our data is English with UTF-8 encoding. I am looking for a way to convert SAS data sets with UTF-8 encoding to a non-UTF-8 encoding, i.e. wlatin1. The converted data sets should have the same features as data sets created in a non-UTF-8 session. Apparently this can be done only in a UTF-8 session.
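
For reference, the UTF-8 session route could look roughly like the sketch below. It assumes SAS was started with ENCODING=UTF-8 (which requires an installation with Unicode support) and reuses the librefs and member names from the original post; the first statement only verifies that assumption.

```sas
/* Assumption: this code runs in a session started with ENCODING=UTF-8 */
%put Session encoding: %sysfunc(getoption(encoding));

libname utf8 "..\utf8" access=readonly;

/* OUTENCODING= requests wlatin1 for data sets created in this library */
libname win "..\win" outencoding=wlatin1;

data win.outds;
   set utf8.inds;   /* read natively as UTF-8, written as wlatin1 */
run;
```

Characters that have no wlatin1 equivalent would still fail to transcode, so this only helps when the data really is limited to characters that wlatin1 can represent.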

Thanks,
IJU
sbb
Lapis Lazuli | Level 10
Have you reviewed the SAS National Language Support (NLS): Reference Guide? There are INFORMATs documented for this type of data conversion.

Scott Barry
SBBWorks, Inc.
ehbales
SAS Employee

KPROPDATA and KCVT can transcode character data from one encoding to another. Set INENCODING to BINARY on your LIBNAME statement to prevent SAS from transcoding the data set, then convert the character columns using one of these functions.

KPROPDATA gives you a little more help by allowing you to specify how to handle characters that are not supported in your SAS session encoding.
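
The approach above might be sketched as follows. This is an illustration under stated assumptions, not a tested solution: the librefs and member names come from the original post, the no-transcoding keyword is written as ANY (which the NLS reference describes as a synonym for binary), and the 'QUESTION' option, which should substitute a question mark for unsupported characters, is taken from my reading of the KPROPDATA documentation and should be checked against your release.

```sas
/* Read the UTF-8 data set as raw bytes; ANY suppresses transcoding on input */
libname utf8 "..\utf8" access=readonly inencoding=any;
libname win "..\win" outencoding=wlatin1;

data win.outds;
   set utf8.inds;

   /* Transcode every character variable from UTF-8 to wlatin1 in place */
   array _chars {*} _character_;
   do _i = 1 to dim(_chars);
      /* 'QUESTION' is an assumed option name: replace unsupported characters with ? */
      _chars{_i} = kpropdata(_chars{_i}, 'QUESTION', 'utf-8', 'wlatin1');
   end;
   drop _i;
run;
```

KCVT(text, 'utf-8', 'wlatin1') could be used in the same loop when no special handling of unsupported characters is needed.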
