Solved: Re: SAS EG not supporting UTF-8 in code?

js5 · Posted 03-30-2023 10:29 AM

Hello,

Our server runs SAS with UTF-8 encoding, and we use EG for development. I am having trouble putting the ≤ symbol into PROC FORMAT:

proc format;
	value $avisit
		"V0" = "V0 ≤28d pre"
		"V1PRE" = "V1 pre"
		"V2" = "V2 3+2d post"
		"V3" = "V3 7+2d post"
		"V4" = "V4 28±3d post"
		"V5" = "V5 3m±7d post"
		"V6" = "V6 6m±10d post";
run;

If I save the file, ≤ gets converted to =. If I open the saved file in Notepad++, it reports that the file is ANSI-encoded. If I then change the encoding to UTF-8 and fix the file, I get this: [screenshot not preserved]

Can this be made to work? I guess I could read the labels from a text file and use the CNTLIN= option, but that seems rather excessive. Thank you in advance for your feedback.
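For illustration, a minimal Python sketch (not SAS) of why the ≤ label cannot survive an "ANSI" save. It assumes that "ANSI" here means Windows code page 1252, the usual default on Western-European Windows; the thread does not confirm which code page is involved. ± exists in cp1252, but ≤ does not, so the editor has to substitute something. Python's codec simply reports the character as unencodable; the = seen above is presumably the editor's own fallback, which this sketch does not reproduce.

```python
# Sketch: which characters from the format labels survive a save as "ANSI"?
# Assumption: "ANSI" means Windows code page 1252 (cp1252).

LABELS = [
    "V0 ≤28d pre",
    "V4 28±3d post",
    "V6 6m±10d post",
]


def unrepresentable(text: str, encoding: str = "cp1252") -> list:
    """Return the characters of `text` that `encoding` cannot store."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


if __name__ == "__main__":
    for label in LABELS:
        print(repr(label), "->", unrepresentable(label) or "OK in cp1252")
    # In UTF-8 both symbols are ordinary multi-byte sequences:
    print("≤".encode("utf-8"))  # b'\xe2\x89\xa4'
    print("±".encode("utf-8"))  # b'\xc2\xb1'
```

Only the ≤ label is flagged; ± has a single-byte slot in cp1252, which is why the other labels look intact after the ANSI save.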

svh · Posted 03-30-2023 10:47 AM

I see the issue, but I don't know enough about changing the encoding to say whether that would work.

I usually embed special characters in my formats with this kind of syntax. For example, the Unicode code point for the less-than-or-equal sign (≤) is 2264:

proc format;
   value quantity 1 = 'Never'
             2 = "1(*ESC*){unicode '2264'x}5 visits"
             3 = "6(*ESC*){unicode '2264'x}10 visits"
;
run;
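To keep the program file pure ASCII while still using this approach, the escape sequences could be generated instead of typed by hand. The Python sketch below uses two hypothetical helpers, `ods_escape` and `proc_format` (not part of SAS or EG), that rewrite every non-ASCII character in a label as `(*ESC*){unicode 'XXXX'x}` with its hexadecimal code point. It assumes the labels contain no double quotes (they are not escaped), and, like the hand-written version, the generated labels only render as symbols in ODS output; it has not been tested in EG.

```python
# Hypothetical helper: generate an ASCII-only PROC FORMAT step in which every
# non-ASCII character is written as an ODS (*ESC*){unicode 'XXXX'x} sequence.
# Assumption: labels contain no double quotes (they are not escaped here).


def ods_escape(label: str) -> str:
    """Replace each non-ASCII character with an ODS unicode escape sequence."""
    parts = []
    for ch in label:
        if ch.isascii():
            parts.append(ch)
        else:
            # Code point in hex, e.g. U+2264 (less-than-or-equal) -> '2264'x
            parts.append(f"(*ESC*){{unicode '{ord(ch):04X}'x}}")
    return "".join(parts)


def proc_format(name: str, mapping: dict) -> str:
    """Build a PROC FORMAT step whose source text is pure ASCII."""
    lines = ["proc format;", f"  value {name}"]
    for value, label in mapping.items():
        lines.append(f'    "{value}" = "{ods_escape(label)}"')
    lines[-1] += ";"  # terminate the VALUE statement
    lines.append("run;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(proc_format("$avisit", {
        "V0": "V0 ≤28d pre",
        "V1PRE": "V1 pre",
        "V4": "V4 28±3d post",
    }))
```

The generated step can be pasted into EG and saved in any encoding, because it contains no characters outside ASCII.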

js5 · Posted 05-31-2023 05:16 AM

This works for ODS output, but not if I want Unicode symbols in my datasets. I have reached out to SAS support about this, as it seems quite misleading to claim to "support" Unicode (which dates back to the 1990s) while requiring the code itself to be plain ASCII. I am guessing I would have similar issues if I had to refer to variables or values containing characters not representable in ASCII.

yabwon · Posted 05-31-2023 05:29 AM

Before saving the file, select the proper encoding. In EG8 it looks like this: [screenshot not preserved]

In EG7 it looks a bit different, but the option is there too.

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

js5 · Posted 05-31-2023 05:47 AM

Thanks, it worked. The difference compared with the manually prepared Unicode file is that SAS EG saves it with a byte-order mark (UTF-8-BOM as opposed to UTF-8). Can the default encoding be changed?
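At the byte level, the only difference between the two variants is a three-byte signature, EF BB BF, at the start of the file. A small Python sketch using the standard `codecs.BOM_UTF8` constant and the `utf-8-sig` codec, which writes that signature when encoding and strips it when decoding; `has_utf8_bom` is a hypothetical helper name used only for illustration:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte-order mark EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


source = '"V0" = "V0 ≤28d pre"\n'
plain = source.encode("utf-8")       # saved as "UTF-8"
signed = source.encode("utf-8-sig")  # saved as "UTF-8-BOM"

print(codecs.BOM_UTF8.hex(" "))                   # ef bb bf
print(len(signed) - len(plain))                   # 3
print(has_utf8_bom(plain), has_utf8_bom(signed))  # False True
```

Everything after the first three bytes is identical, so a reader that understands UTF-8 sees the same program text either way.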

yabwon · Posted 05-31-2023 06:02 AM

True, it saves it as UTF-8-BOM, and it looks like there is no UTF-8-NOBOM option on the list.

I didn't find any option in the "Tools -> Options" menu to set the default encoding. The only thing that comes to mind is that there may be a Windows registry key for it. The first person I would ask about that is @ChrisHemedinger. (In general, Chris knows a lot about EG, so he is a good point of contact 😉)

Bart

js5 · Posted 05-31-2023 06:20 AM

According to Wikipedia, a BOM should not be necessary to recognise a file as UTF-8, but many programs need it regardless [1]:

The Unicode Standard permits the BOM in UTF-8,[4] but does not require or recommend its use. [5](...) Microsoft compilers[11] and interpreters, and many pieces of software on Microsoft Windows such as Notepad (prior to Windows 10 Build 1903[12]) treat the BOM as a required magic number rather than use heuristics. These tools add a BOM when saving text as UTF-8, and cannot interpret UTF-8 unless the BOM is present or the file contains only ASCII

Being able to set UTF-8-BOM as the default would definitely be useful, as otherwise one has to actively check the code for symbols not representable in ASCII, which is not very realistic. Moreover, the option to select the encoding only appears when using File -> Save As or the corresponding button, not when going via Properties -> Save as, which makes it very easy to miss.
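Short of an editor setting, that check can at least be automated outside EG. A rough Python sketch with hypothetical function names that lists every non-ASCII character in the program files given on the command line, by line and column, i.e. the files that must not be saved as ANSI. It assumes the files are already stored as UTF-8 (with or without BOM); a file that was already saved as ANSI has lost its ≤ and may fail to decode.

```python
import sys
from typing import Iterator, Tuple


def non_ascii_positions(text: str) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (line, column, character, code point) for each non-ASCII character."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if not ch.isascii():
                yield lineno, col, ch, f"U+{ord(ch):04X}"


def report(path: str) -> int:
    """Print one line per non-ASCII character in `path`; return how many were found."""
    # Assumption: the file is already UTF-8; "utf-8-sig" also accepts and drops a BOM.
    with open(path, encoding="utf-8-sig") as f:
        hits = list(non_ascii_positions(f.read()))
    for lineno, col, ch, cp in hits:
        print(f"{path}:{lineno}:{col}: {ch} ({cp})")
    return len(hits)


if __name__ == "__main__":
    # Usage: python find_non_ascii.py program1.sas program2.sas ...
    for p in sys.argv[1:]:
        report(p)
```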

[1] https://en.wikipedia.org/wiki/Byte_order_mark

ChrisHemedinger · Posted 05-31-2023 09:16 AM

Copying this from another related discussion: in general, it's better to detect UTF-8 by examining the contents rather than relying on a BOM. But some systems might still rely on it.

"Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature"

https://www.unicode.org/versions/Unicode6.0.0/ch02.pdf
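A minimal Python sketch of content-based detection in the spirit of that advice: use the BOM if one is present, otherwise accept the bytes as UTF-8 only if they decode strictly. The function name and the cp1252 fallback are assumptions for illustration. Short text in a legacy encoding can occasionally be valid UTF-8 by coincidence, so this is a heuristic rather than a guarantee.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess the encoding of a text file from its bytes, without requiring a BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # explicit signature present
    try:
        data.decode("utf-8")  # strict: raises on invalid byte sequences
    except UnicodeDecodeError:
        return fallback  # not valid UTF-8; assume a legacy code page
    if data.isascii():
        return "ascii"  # pure ASCII is valid in every candidate encoding
    return "utf-8"  # valid multi-byte UTF-8 without a BOM
```

The returned name can be passed straight to `bytes.decode`, e.g. `data.decode(guess_encoding(data))`.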
