We are getting SAS datasets with Unicode UTF-8 encoding from another country.
Our English SAS 9.2 installation does not support Unicode. A simple DATA step that reads a UTF-8 dataset produces the following messages in the log:
NOTE: Data file UTF8.HE.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance.
ERROR: Some character data was lost during transcoding in the dataset UTF8.HE. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
An attempt to use the NLS ENCODING= data set option, as shown below, does not help: the same error is generated.
data win.outds (encoding='wlatin1');
  set utf8.inds (encoding='utf-8');
run;
Does anybody know how to process Unicode UTF-8 datasets in a SAS session that does not support Unicode? Is it possible at all, or can transcoding from UTF-8 to a non-Unicode encoding be done only in SAS with Unicode support?
The SAS support documentation covers processing of multilingual data and conversion from non-UTF-8 to UTF-8 encoding. It does say that UTF-8 data can be modified only in a UTF-8 session. Our data is English, stored with UTF-8 encoding. I am looking for a way to convert SAS datasets with UTF-8 encoding to a non-UTF-8 encoding, i.e. wlatin1. The converted datasets should have the same features as datasets created in a non-UTF-8 session. Apparently this can be done only in a UTF-8 session.
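
To illustrate what the ERROR message means by characters "not representable in the new encoding", here is a minimal sketch outside SAS, in Python. It assumes that wlatin1 corresponds to Windows code page 1252 (`cp1252` in Python), and the sample strings are made up, not taken from our datasets. The helper `unrepresentable` is a hypothetical name used only for this illustration. It suggests that even nominally English text can contain an occasional symbol that has no wlatin1 equivalent, which may be what triggers the error.

```python
def unrepresentable(text, encoding="cp1252"):
    """Return (index, character) pairs that cannot be encoded in `encoding`."""
    bad = []
    for i, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((i, ch))
    return bad


# Hypothetical sample values, not taken from the real datasets.
samples = [
    "plain ASCII text",
    "caf\u00e9",          # e-acute has a code point in cp1252
    "dose \u2265 5 mg",   # GREATER-THAN OR EQUAL TO has none
]

for s in samples:
    utf8_bytes = s.encode("utf-8")  # what a UTF-8 dataset would store
    # Character count, UTF-8 byte count, and any characters cp1252 lacks
    print(len(s), len(utf8_bytes), unrepresentable(s))
```

If something similar is happening in our data, the values reported by such a check would be the ones to look at first in the UTF-8 datasets.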