Administration and Deployment

sergie89 · Posted 03-14-2021 04:31 PM

Hello all,

We want to be able to export data from SAS 9.4 (Windows) to Viya with the correct encoding, so we want to convert the encoding of our SAS 9.4 (Windows) environment from LATIN1 to UTF-8. What are the main points we have to check in the existing environment before making the change?

Thanks in advance.

Shmuel · Posted 03-15-2021 02:31 AM

As far as I understand it, the LATIN1 character set is a subset of what UTF-8 can represent.

With UTF-8 you can cover more languages and, at the same time, use it as the native encoding in many server-client SAS environments.

Kurt_Bremser · Posted 03-15-2021 04:59 AM

Question #1: does your data contain characters that lie in the ranges for UTF sequence starters? These need to be transcoded and will need more space (increase the defined length of the variables).
Question #2: do you use characters in the range 128-159 (the first 32 of the "upper" half)? These are part of the Windows-1252 code page (WLATIN1 in SAS), but not of standard LATIN1 (ISO 8859-1).
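
As a rough illustration of question #2 (a Python sketch, not SAS code; the function name `has_c1_range_bytes` and the sample values are made up for this example), the check below flags raw single-byte data that contains bytes 128-159 and shows how the same byte transcodes differently depending on whether it is read as Windows-1252 or as ISO 8859-1:

```python
def has_c1_range_bytes(raw: bytes) -> bool:
    """Return True if any byte falls in 128-159 (0x80-0x9F)."""
    return any(0x80 <= b <= 0x9F for b in raw)


# Hypothetical sample values, as raw single-byte data.
plain = b"caf\xe9"   # 0xE9 is 'e' with acute accent in both encodings
euro = b"\x80 100"   # 0x80 is the euro sign only in Windows-1252

print(has_c1_range_bytes(plain))  # False
print(has_c1_range_bytes(euro))   # True

# The same bytes give different UTF-8 results depending on the source encoding.
as_cp1252 = euro.decode("cp1252").encode("utf-8")   # euro sign becomes 3 bytes
as_latin1 = euro.decode("latin-1").encode("utf-8")  # control char becomes 2 bytes
print(len(as_cp1252), len(as_latin1))  # 7 6
```

If such bytes turn up, it is worth confirming which of the two encodings the data was really written in before transcoding.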

sergie89 · Posted 03-15-2021 07:23 AM

Hello @Kurt_Bremser ,

Thank you for your reply. Could you explain a little more about UTF sequence starters? I've read that all WLATIN1 characters can be transcoded to UTF-8, and that a transcoding error or warning means the character variable is not long enough to hold the UTF-8 representation of those characters.

I found this URL: https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=nlsref&docsetTarget=n15e3...

Kurt_Bremser · Posted 03-15-2021 10:45 AM

I suggest you study the Wikipedia article on UTF-8.

It shows you all the characters not available as single bytes when UTF-8 is used. If no such characters are used in your data, you can leave character lengths as they are.
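
A quick way to see whether any such characters occur, sketched in Python rather than SAS (the helper name `non_ascii_chars` and the sample data are hypothetical): in UTF-8 only the ASCII characters (code points 0-127) stay single bytes, so anything outside that range is what would need extra space.

```python
def non_ascii_chars(values):
    """Collect the distinct characters that will not be single bytes in UTF-8."""
    return sorted({ch for value in values for ch in value if ord(ch) > 127})


# Hypothetical column values, already decoded from LATIN1.
# \u00fc is u with diaeresis, \u00f1 is n with tilde.
names = ["Smith", "M\u00fcller", "Pe\u00f1a", "Jones"]

print(non_ascii_chars(names))               # ['ñ', 'ü']
print(non_ascii_chars(["Smith", "Jones"]))  # [] -> lengths can stay as they are
```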

doug_sas · Posted 03-15-2021 10:49 AM

Not sure if this would help: https://www.pharmasug.org/proceedings/2016/BB/PharmaSUG-2016-BB15.pdf

Essentially, if you go from any single byte encoding to UTF8, your character data may expand since a character in the single byte encoding may turn into a multi-byte character in UTF8.
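
To make the expansion concrete, here is a small Python sketch (not SAS; `utf8_bytes_needed` and the sample values are invented for illustration). It compares the byte length of the same values in a single-byte encoding and in UTF-8, which is the size a character variable would have to accommodate after the change:

```python
def utf8_bytes_needed(values):
    """Return the largest UTF-8 byte length among the given strings."""
    return max((len(v.encode("utf-8")) for v in values), default=0)


# Hypothetical values; every character here is one byte in LATIN1.
values = ["M\u00fcller", "Gr\u00f6\u00dfe", "Smith"]

latin1_len = max(len(v.encode("latin-1")) for v in values)
utf8_len = utf8_bytes_needed(values)

print(latin1_len, utf8_len)  # 6 7
```

For data that really is LATIN1, each non-ASCII character becomes two bytes in UTF-8, so the growth depends on how many accented or special characters a value holds.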
