Hello all,
We want to be able to export data from SAS 9.4 (Windows) to Viya with the correct encoding, so we want to change the encoding of our SAS 9.4 (Windows) environment from LATIN1 to UTF-8. What are the main points we have to check in an existing environment before making the change?
Thanks in advance.
As far as I understand it, the LATIN1 character set is a subset of what UTF-8 can represent.
With UTF-8 you can cover more languages and, at the same time, use the native encoding of many client-server SAS environments.
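A quick way to check the subset claim is a round trip, shown here in Python purely as a neutral stand-in for SAS (Python's `latin-1` codec is ISO 8859-1). This minimal sketch, with a hypothetical helper name, decodes all 256 byte values as LATIN1, re-encodes them as UTF-8, and confirms nothing is lost; it also shows that the same characters take more bytes in UTF-8.

```python
def latin1_is_subset_of_utf8() -> bool:
    """Return True if all 256 LATIN1 (ISO 8859-1) characters survive a UTF-8 round trip."""
    every_byte = bytes(range(256))
    text = every_byte.decode("latin-1")    # each byte maps to U+0000..U+00FF
    as_utf8 = text.encode("utf-8")         # same characters, UTF-8 bytes
    return as_utf8.decode("utf-8").encode("latin-1") == every_byte


if __name__ == "__main__":
    print(latin1_is_subset_of_utf8())  # True
    # The same 256 characters: 256 bytes in LATIN1, 384 bytes in UTF-8
    print(len(bytes(range(256)).decode("latin-1").encode("utf-8")))  # 384
```

Codes 0-127 stay one byte each and codes 128-255 become two bytes each, which is where the 384 comes from.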
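#2: Do you use characters in the range 128-159 (the first 32 positions of the "upper" half)? These are printable characters in the Windows-1252 codepage (which SAS calls WLATIN1), but not in standard ISO 8859-1 (LATIN1), where those positions are control codes.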
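To check whether existing data uses that range, a minimal Python sketch like the one below could scan the raw bytes of an exported file. The function name is hypothetical, and it assumes the bytes are read as-is with no transcoding on the way out; it reports what each byte in 128-159 means under Windows-1252.

```python
def find_cp1252_specials(raw: bytes) -> list[tuple[int, int, str]]:
    """Return (offset, byte value, Windows-1252 character) for each byte in 128-159."""
    hits = []
    for offset, value in enumerate(raw):
        if 0x80 <= value <= 0x9F:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Python's cp1252
            # codec; errors="replace" reports them as U+FFFD instead of failing.
            char = bytes([value]).decode("cp1252", errors="replace")
            hits.append((offset, value, char))
    return hits


if __name__ == "__main__":
    sample = "Price: \u20ac5 \u201cok\u201d".encode("cp1252")
    for offset, value, char in find_cp1252_specials(sample):
        print(offset, value, char)
```

Under ISO 8859-1 the same bytes decode to invisible control characters (for example `b"\x80".decode("latin-1")` is `"\x80"`, not the euro sign), which is why data containing them behaves differently depending on which of the two encodings is assumed.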
Hello @Kurt_Bremser ,
Thank you for your reply. Could you explain UTF sequence starters in a little more detail? I've read that all WLATIN1 characters can be transcoded to UTF-8, and that a transcoding error or warning means the character variable is not long enough to hold the UTF-8 representation of those characters.
I found this url: https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=nlsref&docsetTarget=n15e3...
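The length problem can be illustrated with a hypothetical helper: "José" fits in 4 bytes in Windows-1252 but needs 5 in UTF-8, because "é" becomes a two-byte sequence (a lead byte, the "sequence starter", plus one continuation byte). Cutting the UTF-8 bytes at the old length leaves an incomplete sequence. This is only a sketch of the arithmetic; it does not reproduce how SAS itself truncates values or words its transcoding messages.

```python
def check_fit(value: str, declared_len: int) -> tuple[int, bool, str]:
    """Return (UTF-8 byte length, fits in declared_len?, text surviving a byte cut)."""
    utf8 = value.encode("utf-8")
    fits = len(utf8) <= declared_len
    # Cut at the declared byte length, then drop any incomplete trailing
    # multi-byte sequence left behind by the cut.
    kept = utf8[:declared_len].decode("utf-8", errors="ignore")
    return len(utf8), fits, kept


if __name__ == "__main__":
    print(len("Jos\u00e9".encode("cp1252")))  # 4
    print(check_fit("Jos\u00e9", 4))          # (5, False, 'Jos')
    print(check_fit("Jos\u00e9", 5))          # (5, True, 'José')
```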
I suggest you study the Wikipedia article on UTF-8.
It shows all the characters that are not stored as single bytes in UTF-8 (only the 7-bit ASCII characters, codes 0-127, are). If no such characters are used in your data, you can leave character lengths as they are.
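The byte counts in that article follow directly from code point ranges (as specified in RFC 3629). The sketch below, with hypothetical function names, computes them and lists the characters in a string that will not stay single bytes.

```python
def utf8_width(ch: str) -> int:
    """Bytes UTF-8 uses for one character, by code point range."""
    cp = ord(ch)
    if cp <= 0x7F:
        return 1  # ASCII: unchanged single byte
    if cp <= 0x7FF:
        return 2  # includes all of U+0080..U+00FF (the LATIN1 upper half)
    if cp <= 0xFFFF:
        return 3  # includes Windows-1252 extras such as the euro sign U+20AC
    return 4


def multibyte_chars(text: str) -> dict[str, int]:
    """Map each character in text that needs more than one UTF-8 byte to its width."""
    return {ch: utf8_width(ch) for ch in text if utf8_width(ch) > 1}


if __name__ == "__main__":
    print(multibyte_chars("Caf\u00e9 \u20ac5"))
```

Here `multibyte_chars("Café €5")` returns `{"é": 2, "€": 3}`: accented LATIN1 letters take two bytes, while Windows-1252 extras such as the euro sign take three.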
Not sure if this would help: https://www.pharmasug.org/proceedings/2016/BB/PharmaSUG-2016-BB15.pdf
Essentially, if you go from any single-byte encoding to UTF-8, your character data may expand, since a character that takes one byte in the single-byte encoding may become a multi-byte character in UTF-8.
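To see how much each column would need to grow, one option is to measure the longest UTF-8 value per column and compare it with the current length. The sketch below assumes the data has already been read into Python as a list of dicts (for example from a CSV export); the function names, column names, and sample data are hypothetical.

```python
def required_lengths(rows: list[dict[str, str]],
                     current: dict[str, int]) -> dict[str, tuple[int, int]]:
    """For each column, return (current length, longest UTF-8 value in bytes)."""
    needed = {name: 0 for name in current}
    for row in rows:
        for name, value in row.items():
            if name in needed:
                needed[name] = max(needed[name], len(value.encode("utf-8")))
    return {name: (current[name], needed[name]) for name in current}


def columns_to_widen(rows: list[dict[str, str]],
                     current: dict[str, int]) -> dict[str, int]:
    """Columns whose longest UTF-8 value exceeds the current length, with the new length."""
    return {name: need
            for name, (cur, need) in required_lengths(rows, current).items()
            if need > cur}


if __name__ == "__main__":
    rows = [{"name": "M\u00fcller", "city": "K\u00f6ln"},
            {"name": "Smith", "city": "Z\u00fcrich"}]
    print(columns_to_widen(rows, {"name": 6, "city": 8}))  # {'name': 7}
```

As a rough upper bound, ISO 8859-1 characters need at most 2 bytes in UTF-8 and Windows-1252 characters at most 3, so multiplying lengths by 2 or 3 is always safe but usually wasteful compared with measuring the actual data.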