Encoding issue with SAS 9.4 upgrade

Contributor
Posts: 22

Hi all,

 

We are upgrading from SAS 9.4 M3 to M4. 

 

In M3 our encoding was Latin9 (SAS_en). As part of the upgrade we changed the encoding to UTF-8.

 

Issues:

1. SAS programs containing special characters do not display correctly, whereas the data in the tables (ODBC, pre-assigned library) displays correctly.

 

To work around this, if we change the session encoding of the upgraded (M4) version to Latin9, the problem is reversed:

 

SAS programs containing special characters display correctly, whereas the data in the tables (ODBC, pre-assigned library) does not display correctly.

 

Please advise if anyone has faced this issue.

 

Regards,

Manny

Trusted Advisor
Posts: 1,758

Re: Encoding issue with SAS 9.4 upgrade

Hello @Manny3,

 

I think first you should check the encoding set on your ODBC data provider and the encoding on your SAS datasets.
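As a rough sketch of that first check on the SAS side, the session encoding and the encoding stored in a data set header can be printed to the log. The library and table name `mylib.customers` is a placeholder (an assumption, not something from this thread); the ODBC side still has to be checked in the driver or database configuration itself.

```sas
/* Show the encoding of the current SAS session */
proc options option=encoding;
run;
%put NOTE: Session encoding is &sysencoding;

/* Show the encoding recorded in the header of one data set.
   MYLIB.CUSTOMERS is a hypothetical name; replace it with a real table. */
%let dsid = %sysfunc(open(mylib.customers));
%put NOTE: Data set encoding is %sysfunc(attrc(&dsid, encoding));
%let rc = %sysfunc(close(&dsid));
```

If the two values differ, SAS transcodes the data when reading it, which is where truncated or unrecognized characters usually start to appear.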

 

I also think that if you work with different languages, UTF-8 / UNICODE is a good choice. However, it requires some changes to the previous way of working.

The challenge is that you will need to set a standard in your code for transcoding and be aware of the encoding of your sources and destinations, and also of how you treat your data (for example with the trim() function or special-character handling).
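To illustrate why string handling needs attention, here is a small sketch assuming a UTF-8 session and a program file that is itself saved as UTF-8. Byte-based functions such as LENGTH count bytes, while the K-prefixed functions (KLENGTH, KSUBSTR) count characters; the variable names and the sample value are made up for the example.

```sas
data _null_;
  length name $ 20;
  name  = 'Åsa';                /* sample value; assumes the source file is UTF-8 */
  bytes = length(name);         /* byte-based length */
  chars = klength(name);        /* character-based length */
  first = ksubstr(name, 1, 1);  /* first character, not first byte */
  put bytes= chars= first=;
run;
```

Because "Å" occupies more than one byte in UTF-8, the two counts differ, and a byte-based SUBSTR could cut the character in half. Character variable lengths that were just large enough under Latin9 can be too short for the same reason.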

 

I strongly recommend that you start by reading this paper; it has helped me and many customers a lot.

  • The Impact of Change from wlatin1 to UTF-8 Encoding in SAS Environment
    Hui Song, PRA Health Sciences, Blue Bell, PA, USA
    Anja Koster, PRA Health Sciences, Zuidlaren, The Netherlands

 

https://www.pharmasug.org/proceedings/2016/BB/PharmaSUG-2016-BB15.pdf

 

 

Contributor
Posts: 22

Re: Encoding issue with SAS 9.4 upgrade

Posted in reply to JuanS_OCS

Thanks Juan, 

I will go through the paper; hopefully it will give me some ways to resolve the issue.

 

I will post an update if I am able to resolve this.

 

 

Regards,

Manny

SAS Employee
Posts: 296

Re: Encoding issue with SAS 9.4 upgrade

Thanks @JuanS_OCS

 

@Manny3, just out of curiosity, why was the encoding changed with the M3 upgrade? A maintenance pack upgrade and encoding are two totally unrelated things.

Contributor
Posts: 22

Re: Encoding issue with SAS 9.4 upgrade

Hi,

 

We want to follow the universal standard, so we changed the encoding.

 

Also, even if we change the encoding back to Latin9, we get unrecognized characters in the ODBC data sets, as I explained in my last post.

 

 

Regards,

Manny

 

Trusted Advisor
Posts: 1,758

Re: Encoding issue with SAS 9.4 upgrade

Hello @Manny3,

 

As I said, from now on, and until things are properly sorted out, you will need to be very careful about the encoding and formats used in your SAS process (your sasv9 configuration) and in your source and destination data sources.

 

  • As long as you don't change the encoding anymore, you will know the encoding of your SAS process: UTF-8/UNICODE.
  • For your data sources / ODBC, you will need to know, for each database, which encoding and format are used.
  • The data sets you create will have the same encoding and format as your SAS session. Hence, you may have been creating SAS data sets with different encodings: the ones in Latin9 from your previous environment, then UTF-8 after the upgrade, then others in Latin9 again after you changed back, and, again, other data sets in UTF-8 after your latest change of configuration.
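Program files are one more source with its own encoding: a .sas file saved while the environment was Latin9 still contains Latin9 bytes. A possible sketch, assuming a UTF-8 session, is to declare the file's encoding on a FILENAME statement so that it is transcoded when it is read. The path below is hypothetical.

```sas
/* /sas/programs/report.sas is a hypothetical path to a program
   that was saved while the environment was still Latin9 */
filename oldpgm '/sas/programs/report.sas' encoding='latin9';

/* Run the program through the fileref so the declared encoding is applied */
%include oldpgm;
```

Re-saving the program files as UTF-8 in the editor is the longer-term alternative; the FILENAME approach is only a stopgap for files that have not been converted yet.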

Procedure:

 

  1. I truly think the first thing you need to do is to align all your SAS data sets to a single UTF-8 encoding. All of them. So, first, get a proper list of tables and their encodings (proc contents).
  2. Then convert your Latin-encoded SAS tables to UTF-8: http://support.sas.com/kb/15/597.html
  3. Finally, you will need to find the way that best fits your requirements to query and join data from your ODBC sources with your SAS tables (and the other way around). This will be a mix of:
    1. Adapting your current code to use or translate the character sets of the source and destination tables, and modifying functions such as trim.
    2. Having another Application Server (with an encoding that matches your ODBC sources). With this session, you will need to create filtered copies of your ODBC tables as SAS tables.
    3. With the Application Server on UTF-8 encoding, repeat step 2: convert those SAS tables to UTF-8.
    4. Now that you have all your SAS tables in UTF-8, you actually run your (adapted) code.
    5. If you want to push data back to the ODBC sources, you will need to follow the inverse path (3.3 to 3.1).
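Steps 1 and 2 of the procedure could look roughly like the sketch below. It is one common approach, not necessarily the exact method in the linked note: the librefs, paths and the table name `customers` are all hypothetical, and the CVP engine is used so that character variables get extra room for multi-byte characters.

```sas
/* Step 1: list every data set in a library together with its encoding.
   OLDLIB and the paths below are hypothetical. */
libname oldlib '/sas/data/latin9';

proc sql;
  select memname, encoding
    from dictionary.tables
    where libname = 'OLDLIB' and memtype = 'DATA';
quit;

/* Step 2: copy Latin-encoded tables into a UTF-8 library.
   The CVP engine expands character variable lengths on read;
   NOCLONE lets the output take the encoding of the target library. */
libname src  cvp '/sas/data/latin9';
libname dest '/sas/data/utf8' outencoding='UTF-8';

proc copy in=src out=dest noclone memtype=data;
  select customers;   /* hypothetical table; drop SELECT to copy everything */
run;
```

Copying into a separate target library keeps the original Latin9 tables untouched, which makes it easier to compare the results before switching any jobs over.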

 

 

 

 

PS.

Check this out as well: https://www.sas.com/content/dam/SAS/en_ca/User%20Group%20Presentations/Edmonton-User-Group/SAS9UTF-8...

 

This might be of interest to you too:

Problem Note 36652: Some characters might not render correctly when data is read from the Microsoft SQL Server database to a SAS® Unicode session 

 

http://support.sas.com/kb/36/652.html

However, a SAS Unicode session might not render correctly any non-Unicode character data from Microsoft SQL Server. If the operating system's locale and the database locale are the same, the ODBC driver does not transcode the data, and SAS cannot determine the encoding.

To work around the issue, correct the encoding attribute of the SAS data set that is created from the database to convert it to session encoding
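A minimal sketch of that workaround, assuming the table was pulled from the database into `work.orders` (a hypothetical name) and that the bytes are really wlatin1 even though the header says otherwise. The encoding value is an assumption and has to match what the database actually delivers.

```sas
/* WORK.ORDERS is a hypothetical table created from the ODBC source.
   CORRECTENC= changes only the encoding attribute in the header;
   it does not transcode the stored data. */
proc datasets library=work nolist;
  modify orders / correctenc=wlatin1;
quit;
```

Once the header states the real encoding, SAS can transcode the values to the session encoding the next time the table is read.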

 

 

 

 
