jklaverstijn
Rhodochrosite | Level 12

Hi all,

 

Sorry for the long intro. But it needs to be explained.

 

We are in the process of migrating a SAS Grid environment from SAS 9.4M1 on Windows to SAS 9.4M6 on AIX. Initially it was considered a "natural" choice to let the encoding go from WLATIN1 on Windows to LATIN1 on AIX. And yes, UTF-8 was considered and dropped; let's keep that out of the discussion please.

 

So Latin1 did its job just fine until trouble arose when we found out that it does not contain the Euro sign. Lots of data is stored in SQL Server, and occurrences of the "€" sign were silently transcoded into the generic currency symbol "¤". For a Dutch insurance company this is a big deal. So after careful consideration and testing we changed to Latin9, advertised as "Latin1 but with the Euro sign". But now we get into a whole new area of problems. Existing occurrences of all sorts of the more obscure WLATIN1 characters, like the em dash and MS Office's "smart quotes", are now leading to errors when creating XML. The main affected area is the AD sync, where for example the %mduextr macro extracts metadata. This now fails with the following message:

ERROR: Failed to transcode data from U_UTF8_CE to U_LATIN9_CE encoding because it contained characters which are not supported by your SAS session encoding. Please review your encoding= and locale= SAS system options to ensure that they can accommodate

This did not occur when we used Latin1!
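
To make the character coverage concrete, here is a minimal sketch that checks which of the characters mentioned above exist in each encoding. It assumes Python's `cp1252`, `latin-1` and `iso8859-15` codecs as stand-ins for SAS WLATIN1, LATIN1 and LATIN9; SAS's own transcoding tables and error handling may differ, so this only illustrates the character sets and does not explain why SAS reacts differently under the two session encodings.

```python
# Characters mentioned in the thread: Euro sign, em dash, Word-style smart quotes.
samples = {
    "euro sign": "\u20ac",
    "em dash": "\u2014",
    "left double quote": "\u201c",
    "right double quote": "\u201d",
}

# Python codec names used as stand-ins for WLATIN1, LATIN1 and LATIN9 (assumption).
ENCODINGS = ("cp1252", "latin-1", "iso8859-15")


def can_encode(ch: str, encoding: str) -> bool:
    """Return True if `ch` has a code point in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for name, ch in samples.items():
    coverage = {enc: can_encode(ch, enc) for enc in ENCODINGS}
    print(f"{name:20s} {coverage}")
```

In these tables the Euro sign exists in `cp1252` and `iso8859-15` but not in `latin-1`, while the em dash and smart quotes exist only in `cp1252`. So, by character set alone, those Office characters are equally absent from Latin1 and Latin9.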


We tracked this down to metadata of users where phone numbers were entered using copy/paste from Word and Excel, which introduced these characters. They probably originated in part in user management in Active Directory itself and were then imported. Another occurrence was an em dash in the name of a metadata library. But we also have a job that scans the metadata of DI Studio jobs, and it runs into similar issues with descriptions of steps and even user-written code containing characters that are not present in the Latin9 encoding. The problem runs so deep that we cannot just replace a few characters.
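
For anyone wanting to size the problem before deciding, a scan along these lines can list every character in a piece of text that the target encoding cannot represent. This is only a sketch: `find_unsupported` is a hypothetical helper, the sample values are made up to resemble the cases described above, and Python's `iso8859-15` codec is assumed to match SAS LATIN9.

```python
import unicodedata


def find_unsupported(text: str, encoding: str = "iso8859-15"):
    """Return (position, character, Unicode name) for each char not in `encoding`."""
    hits = []
    for pos, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            hits.append((pos, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical metadata values, not taken from the real environment.
phone = "+31 (0)20\u2013555 0100"   # en dash pasted from an Office document
library = "Claims \u2014 Archive"    # em dash in a library name

for value in (phone, library):
    for pos, ch, name in find_unsupported(value):
        print(f"{value!r}: position {pos}, U+{ord(ch):04X} {name}")
```

Run against an export of the metadata text, this gives a list of offending code points and where they occur, which helps judge whether a cleanup is realistic or not.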

 

The easy way out seems going back to Latin1. But hey, that pesky Euro sign is definitely a big deal.

 

It seems to revolve around XML generation from proc metadata. So my main questions are:

1) Has anyone seen this and found a way to fix it?

2) Why is the impact of the change from Latin1 to Latin9 so big?

Kurt_Bremser
Super User

LATIN1 on AIX does support the Euro sign; we have used it ever since migrating from z/OS to AIX.

The character entered in Enterprise Guide with AltGr-E (hex 80, as in Windows-1252) displays as the Euro sign, both in the viewtable and in proc print (listing output).

What is the hex code of the Euro sign as used in your SQL Server?
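
As a reference for answering that question, this small sketch prints the bytes of the Euro sign in a few encodings. The codec names are Python's, assumed here to correspond to Windows-1252 (WLATIN1), ISO 8859-15 (LATIN9) and the Unicode forms; which of these SQL Server actually holds depends on the column type and collation, which the thread does not state.

```python
EURO = "\u20ac"


def hex_of(ch: str, encoding: str) -> str:
    """Return the encoded bytes of `ch` as upper-case hex, space separated."""
    return ch.encode(encoding).hex(" ").upper()


for enc in ("cp1252", "iso8859-15", "utf-8", "utf-16-le"):
    print(f"{enc:12s} {hex_of(EURO, enc)}")

# ISO 8859-1 has no Euro sign at all, so encoding it raises an error.
try:
    EURO.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1      not encodable:", exc.reason)
```

The first two lines print `80` and `A4`, which are the two hex values discussed in this thread; the UTF-8 form is `E2 82 AC`.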

 

But whatever you do, you should use a uniform encoding throughout your whole IT environment.

This encompasses operating systems, application software, and infrastructure like AD.

 

jklaverstijn
Rhodochrosite | Level 12

Hi Kurt,

 

If the Euro sign is in the Latin1 set, then it is not transcoded properly by SAS (or ODBC, or whatever else is in the chain). We take data from SQL Server and Euro signs are replaced with the SUB character. With Latin9 they are not.

 

I agree with your statement about a consistent approach for encoding. In this company that would almost certainly mean UTF-8. And that is not something that can be achieved within the scope of this migration issue. We must keep this small and focused.

Also consider this: I create a DIS job in Windows containing a Euro sign, some smart quotes and an em dash in its description. I export it and import it in AIX. With encoding=latin1 I can successfully deploy the code for that job (albeit with substituted characters). However, with encoding=latin9, I get:

[Screenshot of the resulting error: jklaverstijn_0-1614848654522.png]


So there is a difference between the two that is not obviously explained by the differences in their actual definition.

 

Maybe a track with SAS Support is in order.

Kurt_Bremser
Super User

I would also get in touch with SAS TS, just to get a feel why there's a failure with LATIN9 that does not happen with LATIN1.

But I guess it is because LATIN9 specifically excludes characters A4 to BE as seen here.
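
To see exactly which positions are involved, the two code pages can be compared byte by byte. This sketch assumes Python's `latin-1` and `iso8859-15` codecs match SAS LATIN1 and LATIN9; it lists every byte value that decodes to a different character in the two tables.

```python
import unicodedata


def latin1_vs_latin9():
    """Return (byte, latin1_char, latin9_char) for every byte where the tables differ."""
    diffs = []
    for byte in range(0x100):
        raw = bytes([byte])
        l1 = raw.decode("latin-1")
        l9 = raw.decode("iso8859-15")
        if l1 != l9:
            diffs.append((byte, l1, l9))
    return diffs


for byte, l1, l9 in latin1_vs_latin9():
    print(
        f"0x{byte:02X}: {unicodedata.name(l1)} (Latin1) "
        f"-> {unicodedata.name(l9)} (Latin9)"
    )
```

In Python's tables only eight positions differ, all within the A4 to BE range: A4, A6, A8, B4, B8, BC, BD and BE. Every other position in that range, and everywhere else, is the same in both encodings.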

 

Since your only trouble with LATIN1 seems to be the Euro sign, consider converting it from hex A4 to hex 80 on import into SAS.

jklaverstijn
Rhodochrosite | Level 12

Thanks Kurt,

I opened a track and will update this thread with its conclusions.

 

I will keep you posted on how the track fares.

 

Regards,

- Jan.
