I have some datasets downloaded from another source (WRDS). When I first work with these datasets, a NOTE appears in the log saying "the dataset is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used ......". For most datasets there was no further error message. However, I get an ERROR message with my most recent dataset: I cannot copy it to another library, sort it, or do anything else with it.
I understand that the dataset uses a different encoding (WLATIN1, to be exact), whereas my session encoding is UTF-8. Wouldn't a UTF-8 session be able to read in the data?
How should I deal with this issue? How can I change the encoding of the dataset once and then work with it later on?
Are the original source data SAS datasets, or did you convert them from something else? How are you importing the data into SAS? The note in the log may be relevant for only a subset of variables, and you might want to figure out which ones they are. In that case you can modify the import (use the code generated by PROC IMPORT that appears in the log). Maybe you have done these things already, though ...
The original dataset is a SAS dataset. I am downloading the CSV version to import manually. But the dataset is about 500 GB...
The DATASET or the CSV file?
Please post (copy/paste) the log of the step with the ERROR.
Hi @somebody
SAS has a system option ENCODING=<value> that you can specify at startup if you want to set the session encoding explicitly when the SAS session is invoked.
The following example shows that the WLATIN2 encoding is set explicitly when SAS is invoked:
sas -encoding wlatin2
This lets a particular session read datasets in that encoding, because the session encoding then matches the file encoding. The option can also be set in the sasv9.cfg file to change the encoding for all SAS sessions.
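Before changing anything, it may help to confirm which encodings are actually involved. The following is a minimal sketch, not a definitive recipe: the libref WRDS, the path, and the dataset name BIGDATA are hypothetical placeholders. Note that ENCODING= is a startup option, so it cannot be changed with an OPTIONS statement once the session is running.

```sas
/* Report the encoding of the current SAS session */
proc options option=encoding;
run;

/* The same value is available in an automatic macro variable */
%put NOTE: Session encoding is &sysencoding;

/* Hypothetical libref and path: point this at the downloaded files */
libname wrds '/path/to/wrds/files';

/* The "Encoding" field in the PROC CONTENTS header shows the
   encoding the dataset was created with (for example wlatin1) */
proc contents data=wrds.bigdata;
run;
```

If the two encodings differ, SAS transcodes character values on the fly when it reads the file, which is what the Cross Environment Data Access note refers to.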
Thanks!
Does this convert the new dataset to the encoding of my system?
I have added CVP to my LIBNAME statement and it runs so far. I still cannot check whether the new dataset is native to my computer because the dataset is very large. The original dataset is 550 GB. I am copying it to another library, hoping that SAS will change the encoding when creating the new dataset. However, the new dataset seems to be bigger than the original one (700 GB so far). Maybe this is because SAS uses UTF-8 encoding, which uses more bytes? Is this true?
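For reference, the CVP approach described above could look roughly like the sketch below. The librefs SRC and TGT, both paths, and the dataset name BIGDATA are assumptions for illustration. The NOCLONE option is included on the assumption that the copy should take the encoding of the current session rather than inherit the encoding of the input file.

```sas
/* Read the WLATIN1 files through the CVP engine, which expands
   character variable lengths so transcoded values are less likely
   to be truncated. Libref and path are hypothetical. */
libname src cvp '/path/to/wrds/files';

/* Ordinary output library for the converted copy (hypothetical path) */
libname tgt '/path/to/local/library';

/* NOCLONE: do not inherit the input file's encoding and data
   representation; create the copy in the session's own format */
proc copy in=src out=tgt noclone;
   select bigdata;
run;

/* Check the "Encoding" field of the new dataset */
proc contents data=tgt.bigdata;
run;
```

Once the copy exists in the session encoding, later steps can read TGT.BIGDATA directly without the CVP engine.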
Also check whether COMPRESS=YES has a noticeable effect when you run it again.
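One way to try the compression suggestion is a DATA step with the COMPRESS= data set option, sketched below. The librefs SRC and TGT and the dataset name BIGDATA are hypothetical. Because the expanded character variables are mostly padded with trailing blanks, compression may recover much of the extra space, but the actual saving depends on the data.

```sas
/* Hypothetical librefs: SRC reads the original files through CVP,
   TGT is the output library */
libname src cvp '/path/to/wrds/files';
libname tgt '/path/to/local/library';

/* COMPRESS=YES stores the observations in compressed form, which
   tends to work well on long runs of blanks in character variables */
data tgt.bigdata (compress=yes);
   set src.bigdata;
run;
```

The log of the DATA step reports by how much compression decreased (or increased) the size of the dataset.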
Adding the CVP option to the LIBNAME statement seems to resolve the issue. One minor problem is that the new dataset becomes quite large, about 1.5 to 2 times the size of the original dataset. This is probably because CVP expands the lengths of character variables (by a factor of 1.5 by default) so that transcoded values are not truncated.
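If the size increase is a concern, the expansion factor can be set explicitly with the CVPMULTIPLIER= option of the CVP engine. This is only a sketch: the libref, the path, and the value 1.2 are assumptions, and a multiplier that is too small for the data can cause character values to be truncated during transcoding, so it is worth checking the log after the copy.

```sas
/* Hypothetical libref and path. CVPMULTIPLIER= sets the factor by
   which character variable lengths are expanded; 1.2 is an
   arbitrary illustrative value, not a recommendation. */
libname src cvp '/path/to/wrds/files' cvpmultiplier=1.2;
```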