Re: Encoding issue with datasets

somebody · Posted 09-12-2019 01:32 AM

I have some datasets downloaded from another source (WRDS). When I first work with these datasets, a NOTE appears saying "the dataset is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used ......". For most datasets, there was no further message. However, my most recent dataset produces an ERROR: it will not copy to another library, and I cannot sort it or do anything else with it.

I understand that the dataset uses a different encoding (WLATIN1, to be exact), whereas my session encoding is UTF-8. Wouldn't a UTF-8 session be able to read the data?

How should I deal with this issue? How can I change the encoding of the dataset once and then work with it later on?

pau13rown · Posted 09-12-2019 02:02 AM

Are the original source data SAS datasets, or do you convert them from something else? How are you importing the data into SAS? The note in the log may be relevant only for a subset of variables, and you might want to figure out which ones they are. In that case you can modify the import (use the code generated by PROC IMPORT that appears in the log). Maybe you have done these things already, though...
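A minimal sketch of such a modified import, assuming the CSV export uses WLATIN1 like the SAS dataset does (that is an assumption; check the download options). The path and the output name are hypothetical. Declaring the encoding on the FILENAME statement lets SAS transcode the text into the session encoding while reading.

```sas
/* Hypothetical path; assumes the CSV export is WLATIN1-encoded */
filename wrdscsv "/data/wrds/export.csv" encoding="wlatin1";

/* PROC IMPORT writes the DATA step it generates to the log;
   copy it from there, adjust lengths/informats, and rerun it */
proc import datafile=wrdscsv
            out=work.wrds_import
            dbms=csv
            replace;
    getnames=yes;
    guessingrows=10000; /* scan more rows before choosing column lengths */
run;
```

For a file of several hundred GB, rerunning the adjusted DATA step from the log is usually preferable to letting PROC IMPORT guess again.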

somebody · Posted 09-12-2019 04:56 AM

The original dataset is a SAS dataset. I am downloading the CSV version to import manually. But the dataset is about 500 GB...

Kurt_Bremser · Posted 09-12-2019 04:58 AM

@somebody wrote:

The original dataset is a SAS dataset. I am downloading the CSV version to import manually. But the dataset is about 500 GB...

The DATASET or the CSV file?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 09-12-2019 02:15 AM

Please post (copy/paste) the log of the step with the ERROR.

AnandVyas · Posted 09-12-2019 02:20 AM

Hi @somebody

SAS has a system option, ENCODING=<value>, which you can specify when the SAS session is invoked if you wish to set the session encoding explicitly.

The following example shows that the WLATIN2 encoding is set explicitly when SAS is invoked:

sas -encoding wlatin2

Specified at invocation, it can help with reading datasets in that encoding during that particular session. It can also be set in the sasv9.cfg file to change the encoding for all SAS sessions.
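Note that ENCODING= is a startup-only option: it can go on the command line or in the config file, but not in an OPTIONS statement in a running session. A small sketch for confirming which encoding a session actually started with:

```sas
/* Equivalent line in sasv9.cfg:  -ENCODING WLATIN2
   (an OPTIONS statement cannot change ENCODING after startup) */

/* Print the current session encoding to the log */
%put NOTE: Session encoding is &sysencoding..;

/* Or show the option value together with its description */
proc options option=encoding;
run;
```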

Thanks!

somebody · Posted 09-12-2019 05:48 AM

Does this convert the new dataset to the encoding of my system?

I added CVP to my LIBNAME statement, and it runs so far. I still cannot check whether the new dataset is native to my computer, because the dataset is very large: the original is 550 GB. I am copying it to another library, hoping that SAS will change the encoding when it creates the new dataset. However, the new dataset seems to be bigger than the original one (700 GB so far). Is this because SAS uses UTF-8 encoding, which needs more bytes per character?
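A sketch of the copy-and-check approach, with hypothetical librefs, paths, and dataset name. A DATA step writes its output in the session encoding and native data representation by default, and PROC CONTENTS reads only the dataset header, so it can report the Encoding and Data Representation without scanning all 550 GB of rows.

```sas
/* Hypothetical locations: the downloaded files and a target library */
libname wrds    cvp "/data/wrds";   /* CVP engine transcodes on read      */
libname wrdsraw     "/data/wrds";   /* same files, default engine (CEDA)  */
libname native      "/data/native"; /* default engine, session encoding   */

/* The new dataset is written in the session encoding (here UTF-8) */
data native.bigtable;
    set wrds.bigtable;
run;

/* Header-only checks: compare the Encoding and Data Representation
   lines, and the character variable lengths, of original and copy */
proc contents data=wrdsraw.bigtable;
run;

proc contents data=native.bigtable;
run;
```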

Kurt_Bremser · Posted 09-12-2019 07:27 AM

Also check whether compress=yes has a noticeable effect when you run it again.
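A sketch of requesting compression for the copy, with hypothetical librefs and paths. COMPRESS=YES is the same as COMPRESS=CHAR (run-length encoding), which handles blank-padded character columns well; COMPRESS=BINARY is aimed at wide rows with many numeric variables.

```sas
/* Hypothetical librefs and paths */
libname wrds   cvp "/data/wrds";
libname native     "/data/native";

/* Compress only this output dataset */
data native.bigtable(compress=char);
    set wrds.bigtable;
run;
/* The log then reports by what percentage compression changed the size */

/* Alternatively, compress every dataset created later in this session */
options compress=yes;
```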

somebody · Posted 09-15-2019 08:45 PM

Adding the CVP option to the LIBNAME statement seems to resolve the issue. One minor problem is that the new dataset becomes quite large, about 1.5 to 2 times the size of the original dataset. This is perhaps because SAS allocates a larger buffer/storage to read in the data.
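Another possible explanation: the CVP engine widens every character variable so that transcoded values have room to grow. If I remember the documentation correctly, the default multiplier is 1.5, which would match this kind of growth even for mostly ASCII data. In WLATIN1 each character is one byte; in UTF-8 accented Latin-1 letters take two bytes, and symbols such as € or curly quotes take three. The LIBNAME options CVPMULTIPLIER= and CVPBYTES= control the widening. The sketch below uses hypothetical paths and a hypothetical column COMNAM, and a smaller multiplier risks truncating values that really do expand.

```sas
/* Hypothetical paths. CVPMULTIPLIER= scales the length of every character
   variable; CVPBYTES= would add a fixed number of bytes instead. */
libname wrds   cvp "/data/wrds" cvpmultiplier=1.2;
libname native     "/data/native";

data native.bigtable;
    set wrds.bigtable;
run;

/* How many bytes does the longest value actually use after transcoding?
   COMNAM is a hypothetical character column; LENGTHN returns the length
   in bytes, excluding trailing blanks. */
proc sql;
    select max(lengthn(comnam)) as max_bytes
    from native.bigtable;
quit;
```

If max_bytes comes out equal or very close to the variable's declared length, some values may have been cut off, and a larger multiplier is the safer choice.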
