somebody
Lapis Lazuli | Level 10

I have some datasets downloaded from another source (WRDS). When I first work with these datasets, a NOTE appears in the log saying "the dataset is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used ...". For most datasets, there was no further error message. However, I get an ERROR with my most recent dataset: I cannot copy it to another library, sort it, or do anything else with it.

I understand that the dataset uses a different encoding (WLATIN1, to be exact) whereas my session encoding is UTF-8. Shouldn't a UTF-8 session be able to read the data?

How should I deal with this issue? How can I change the encoding of the dataset once and then work with it later on?

8 REPLIES
pau13rown
Lapis Lazuli | Level 10

Are the original source data SAS datasets, or did you convert them from something else? How are you importing the data into SAS? The note in the log may be relevant only for a subset of variables, and you might want to figure out which ones they are. In that case, you can modify the import (use the code generated by PROC IMPORT that appears in the log). Maybe you have done these things already, though...

somebody
Lapis Lazuli | Level 10

The original dataset is a SAS dataset. I am downloading the CSV version to import manually. But the dataset is about 500 GB...

Kurt_Bremser
Super User

@somebody wrote:

The original dataset is a SAS dataset. I am downloading the CSV version to import manually. But the dataset is about 500 GB...


The DATASET or the CSV file?

AnandVyas
Ammonite | Level 13

Hi @somebody

SAS has a system option, ENCODING=<value>, which you can specify at invocation in case you wish to explicitly set the encoding when the SAS session is started.

 

The following example shows that the WLATIN2 encoding is set explicitly when SAS is invoked:

sas -encoding wlatin2

Specifying the option at invocation can help when reading datasets in that particular session. It can also be set as a parameter in the sasv9.cfg file in order to change the encoding for all SAS sessions.
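To confirm which encoding a session is actually running with, something like the following could be submitted. This is only a minimal sketch; nothing in it is specific to the WRDS data.

```sas
/* Print the current value of the ENCODING system option to the log */
proc options option=encoding;
run;

/* The automatic macro variable SYSENCODING reports the session encoding as well */
%put NOTE: Session encoding is &sysencoding.;
```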

 

Thanks! 

somebody
Lapis Lazuli | Level 10

Does this convert the new dataset to the encoding of my system? 

I have added CVP to my LIBNAME statement and it runs so far. I still cannot check whether the new dataset is native to my computer because the dataset is very large. The original dataset is 550 GB. I am copying it to another library, hoping that SAS will change the encoding when creating the new dataset. However, the new dataset seems to be bigger than the original one (700 GB so far). Maybe this is because SAS is writing it in UTF-8, which uses more bytes? Is this true?
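One way to check the result without reading through a very large table is PROC CONTENTS, which reports the data set's header metadata, including its encoding and data representation. A minimal sketch, assuming the copy lives in a library called newlib under the name bigdata (both names are placeholders, not taken from the thread):

```sas
/* newlib and bigdata are hypothetical names for the target library and data set */
proc contents data=newlib.bigdata;
run;
```

If the Encoding field in the output shows UTF-8 and no Cross Environment Data Access note appears in the log, the copy should be native to the session.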

somebody
Lapis Lazuli | Level 10

Adding the CVP option to the LIBNAME statement seems to resolve the issue. One minor problem is that the new dataset becomes quite large, about 1.5 to 2 times the size of the original dataset. This is perhaps because SAS allocates more storage for the character variables when it reads in the data.
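For anyone landing here later, the approach described above might look roughly like the sketch below. The paths, librefs, and data set name are placeholders. CVPMULTIPLIER=1.5 is written out only to make visible what is, as far as I know, the default when CVP is given without further options.

```sas
/* Placeholder paths and names; adjust to the actual locations */

/* CVP is a read-only engine that expands character variable lengths
   on input so that transcoded values are not truncated */
libname src cvp '/path/to/wrds/data' cvpmultiplier=1.5;
libname dst '/path/to/local/data';

/* NOCLONE: do not clone the source's encoding and data representation;
   the copy is written with the session's (or target library's) settings */
proc copy in=src out=dst noclone;
   select bigdata;
run;
```

Because character variables are stored at their full declared length, multiplying every length by 1.5 would explain much of the growth regardless of how many non-ASCII characters the data contain. If the size is a concern, the COMPRESS= option may recover some of the space, although that is untested on this dataset.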

