Hello,
our server is running SAS in UTF-8 and we use EG for development. I am having trouble putting the ≤ symbol into a proc format:
proc format;
  value $avisit
    "V0"    = "V0 ≤28d pre"
    "V1PRE" = "V1 pre"
    "V2"    = "V2 3+2d post"
    "V3"    = "V3 7+2d post"
    "V4"    = "V4 28±3d post"
    "V5"    = "V5 3m±7d post"
    "V6"    = "V6 6m±10d post";
run;
If I save the file, ≤ gets converted to =. If I open the saved file with Notepad++, it says that the file is ANSI encoded. If I then change the encoding to UTF-8 and fix the file up, I get this:
Can this be made to work? I guess I could read a text file and use the cntlin parameter, but this seems rather excessive. Thank you in advance for your feedback.
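For reference, the cntlin route mentioned above could look roughly like the sketch below. It assumes a hypothetical file avisit_labels.txt, saved as UTF-8, with one "code|label" pair per line (for example V0|V0 ≤28d pre); the path, the data set name and the variable lengths are illustrative only.

```sas
/* Sketch only: the file path and data set name are hypothetical. */
/* Assumes the SAS session itself runs with ENCODING=UTF-8.       */
data work.avisit_cntlin;
  infile "/path/to/avisit_labels.txt" encoding="utf-8" dlm="|" truncover;
  length fmtname $32 type $1 start $8 label $40;
  retain fmtname "AVISIT" type "C";   /* type C = character format, i.e. $AVISIT */
  input start $ label $;              /* one code|label pair per line */
run;

/* Build the format from the control data set */
proc format cntlin=work.avisit_cntlin;
run;
```

This keeps the non-ASCII characters out of the program file entirely, at the cost of maintaining a separate text file.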
Before saving the file, select the proper encoding. In EG8 it looks like:
For EG7 it looks a bit different but is there too.
Bart
I see the issue, but I don't know enough about changing the encoding to say whether that would work.
I usually embed special characters in my formats with this kind of syntax. For example, the Unicode code point for the less-than-or-equal sign is U+2264:
proc format;
  value quantity
    1 = 'Never'
    2 = "1(*ESC*){unicode '2264'x}5 visits"
    3 = "6(*ESC*){unicode '2264'x}10 visits"
  ;
run;
This works for ODS output but not if I wish to have unicode symbols in my datasets. I have reached out to SAS support regarding this as it seems quite misleading to claim to "support" unicode (which dates back to the 90s) while requiring the code itself to be plain ASCII. I am guessing I would have similar issues if I had to refer to either variables or values containing characters not representable by ASCII.
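One possible way to keep the program file plain ASCII and still get a real ≤ character into data set values is the UNICODE function, which converts escapes such as \u2264 into the session encoding. A minimal sketch, assuming a UTF-8 session; the macro variable names le and pm are made up for illustration, and the result is worth checking on your own server:

```sas
/* Assumption: the SAS session runs with ENCODING=UTF-8. */
/* le and pm are illustrative macro variable names.      */
%let le = %sysfunc(unicode(\u2264));  /* U+2264, less-than-or-equal sign */
%let pm = %sysfunc(unicode(\u00B1));  /* U+00B1, plus-minus sign */

proc format;
  value $avisit
    "V0" = "V0 &le.28d pre"
    "V4" = "V4 &pm.3d post"
    "V5" = "V5 3m&pm.7d post";
run;
```

Because the labels are in double quotes, the macro variables resolve to the actual characters when the format is compiled, so the formatted values should not depend on ODS escape processing.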
Thanks, it worked. The only difference from the manually prepared Unicode file is that SAS EG saves it with a byte-order mark: UTF-8-BOM as opposed to UTF-8. Can the default encoding be changed?
True, it saves it as UTF-8-BOM and it looks like there is no UTF-8-NOBOM on the list.
I didn't find any option in the "Tools -> Options ->" menu to set the default encoding... The only thing that comes to mind is that maybe there is a Windows registry key to edit for that. The first person I would ask about such a possibility is @ChrisHemedinger. ( In general, Chris knows a lot about EG so he is a good point of contact 😉 )
Bart
According to Wikipedia, BOM should not be necessary to recognise a file as UTF-8 but many programs need it regardless [1]:
The Unicode Standard permits the BOM in UTF-8,[4] but does not require or recommend its use. [5](...) Microsoft compilers[11] and interpreters, and many pieces of software on Microsoft Windows such as Notepad (prior to Windows 10 Build 1903[12]) treat the BOM as a required magic number rather than use heuristics. These tools add a BOM when saving text as UTF-8, and cannot interpret UTF-8 unless the BOM is present or the file contains only ASCII
Setting UTF-8-BOM as the default would definitely be useful, as otherwise one has to actively scan the code for symbols not representable as ASCII, which is not very realistic. Moreover, the option to select the encoding only appears when using File -> Save as or the respective button, but not when going via Properties -> Save as, which makes it super easy to miss.
Copying this from another related discussion -- in general it's better to detect UTF-8 by examining contents and not relying on BOM. But some systems might still rely on it.
"Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature"