Hi Community,
I have observed that when I generate a PDF with SAS 9.4 in a Unicode (UTF-8) session, output that uses the Courier New font is not searchable, whereas the same program run in an English (WLATIN1) session produces a searchable PDF.
Why does this difference occur, and more importantly, how can we generate a searchable PDF using Courier New in a Unicode SAS session?
Any insights or recommended solutions would be greatly appreciated.
Thank you.
Kind regards,
John
Hi Kathryn,
Thank you so much! This works perfectly.
I'm curious, though: how does changing the setting via the REGISTRY procedure make the PDF searchable, regardless of the font I specify? I'd appreciate any additional insight you could share.
Kind regards,
John
I am curious also. I suspect it is because the SAS code that makes the PDF file creates it with different options depending on the settings in the SAS Registry. But why not just make that a normal option on the ODS PDF statement?
This behavior was introduced in SAS 9.2 to address font-spacing differences between searchable and non-searchable documents and to improve character-mapping efficiency. In searchable output, SAS leaves character spacing up to Acrobat Reader; in non-searchable output, SAS places the characters one at a time based on each character's width. Changing the default to searchable would affect everyone running code in a UTF-8 environment, and it could also have negative effects on output that includes images and graphs.
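The registry change itself is not shown in this part of the thread. As an illustration only, one common way to set this value is to write a small registry file and load it with PROC REGISTRY IMPORT=. The key path and value name below are taken from the PROC REGISTRY listing later in the thread; the fileref name `pdfreg` is hypothetical, and it is worth confirming the exact entry with SAS Technical Support for your release before relying on it.

```sas
/* Sketch: ask ODS PDF to produce searchable text in a Unicode (UTF-8) */
/* session. A one-entry registry file is written to a temporary        */
/* location and then imported. By default PROC REGISTRY updates the    */
/* SASUSER registry, so the change should persist for your profile.    */
filename pdfreg temp;                 /* hypothetical fileref name */

data _null_;
   file pdfreg;
   put '[CORE\PRINTING\PDF\DBCS]';    /* registry key */
   put '"Searchable"="Yes"';          /* value name and new setting */
run;

proc registry import=pdfreg;
run;

/* Confirm the value that is now in effect */
proc registry startat="CORE\PRINTING\PDF\DBCS" list;
run;

filename pdfreg clear;
```

Given the caveats about images and graphs, it may be worth spot-checking graph-heavy PDF output after making this change.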
Which version of SAS are you running? In an earlier version of SAS, I created a file in a Unicode SAS session and got the attached out.pdf file. If you open it and search for the word "Alice", you get no matches; if you search for "A l i c e", you do. Yet the output appears to have no spaces between the characters. However, when I run the same code in SAS 9.4M8 in a Unicode session, I get test_unicode.pdf with no registry changes, and that file is searchable.
proc options group=languagecontrol;
run;
SAS (r) Proprietary Software Release 9.4 TS1M8
Group=LANGUAGECONTROL
DATESTYLE=MDY  Specifies the sequence of month, day, and year when ANYDTDTE, ANYDTDTM, or ANYDTTME informat data is ambiguous.
DFLANG=ENGLISH  Specifies the language for international date informats and formats.
DSCAS  Runs the DATA step on the CAS server.
EXTENDOBSCOUNTER=YES  Specifies whether to extend the maximum number of observations in a new SAS data file.
LOCALEDATA=SASLOCALE  Specifies the location of the locale database.
LOGLANGCHG  Enables changing the language of the SAS log when the LOCALE= option is changed.
NOLOGLANGENG  Write SAS log messages based on the values of the LOGLANGCHG, LSWLANG=, and LOCALE= options when SAS started.
LSWLANG=LOCALE  Specifies the language for SAS log and ODS messages when the LOCALE= option is set after SAS starts.
MAPEBCDICTOASCII=  Specifies the transcoding table that is used to convert characters from ASCII to EBCDIC and EBCDIC to ASCII.
NONLDECSEPARATOR  Disables formatting of numeric output using the decimal separator for the locale.
ODSLANGCHG  Enables the language of the SAS message text in ODS output to change when the LOCALE option is set after start up.
PAPERSIZE=LETTER  Specifies the paper size to use for printing.
RSASIOTRANSERROR  Displays a transcoding error when illegal values are read from a remote application.
TIMEZONE=  Specifies a time zone.
TRANTAB=  Specifies the translation table catalog entries.
URLENCODING=SESSION  Specifies whether the argument to the URLENCODE function and to the URLDECODE function is interpreted using the SAS session encoding or UTF-8 encoding.
DBCS  Enables double-byte character sets for encoding East Asian languages.
DBCSLANG=UNKNOWN  Specifies a double-byte character set language.
DBCSTYPE=UTF8  Specifies the encoding method to use for a double-byte character set.
ENCODING=UTF-8  Specifies the default character-set encoding for the SAS session.
LOCALE=EN_US  Specifies a set of attributes in a SAS session that reflect the language, local conventions, and culture for a geographical region.
NONLSCOMPATMODE  Encodes data using the SAS session encoding.
NOTE: PROCEDURE OPTIONS used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
/* Show the current setting */
proc registry startat="CORE\PRINTING\PDF\DBCS" list;
run;
NOTE: Contents of SASHELP REGISTRY starting at subkey [CORE\PRINTING\PDF\DBCS]
[ CORE\PRINTING\PDF\DBCS]
Searchable="No"
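If the full PROC OPTIONS listing is more than you need, the session encoding can also be checked directly. This is a minimal sketch: the GETOPTION function and the SYSENCODING automatic macro variable are standard in SAS 9.4, while the NOTE: wording is arbitrary. In a session like the one logged above, both lines would be expected to report UTF-8 (possibly in different letter case).

```sas
/* Report the session encoding two ways: the ENCODING= system option */
/* and the SYSENCODING automatic macro variable.                     */
%put NOTE: ENCODING system option: %sysfunc(getoption(encoding));
%put NOTE: SYSENCODING macro variable: &sysencoding;
```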
This is the code I ran:
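ods _all_ close;                                        /* close all open ODS destinations */
ods pdf file='c:\temp\test_unicode.pdf' style=printer;  /* open the PDF destination */
proc print data=sashelp.class;
run;
ods pdf close;                                          /* finish writing the PDF file */
ods listing;                                            /* reopen the LISTING destination */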
I do not think it is necessary to change the Registry back to Searchable="No" if setting it to Yes is working in your current environment.
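If you do want to return to the default later (for example, after moving to a release where output is searchable without the change), the value can be written back with PROC REGISTRY IMPORT=. This sketch assumes the same key path and value name as the listing above; the fileref name `pdfreg` is hypothetical, and the import updates the SASUSER registry by default.

```sas
/* Sketch: restore Searchable="No" for ODS PDF in a Unicode session */
filename pdfreg temp;                 /* hypothetical fileref name */

data _null_;
   file pdfreg;
   put '[CORE\PRINTING\PDF\DBCS]';
   put '"Searchable"="No"';
run;

proc registry import=pdfreg;
run;

/* Check the value that is now in effect */
proc registry startat="CORE\PRINTING\PDF\DBCS" list;
run;

filename pdfreg clear;
```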
I'm currently using SAS 9.4M7, so based on your example, it seems the searchability issue is resolved starting with 9.4M8. Is that correct?
Just like in your case, when I don't modify the registry setting, the output appears to have no spaces between the characters. However, when I copy the text from the output and paste it into Word, spaces appear between the letters. This prevents me from searching for the word properly in the PDF.
I'm just a bit concerned about potential risks if I leave the registry setting at Searchable="Yes", since you mentioned possible negative impacts in your earlier reply.
Additionally, I have a question regarding "fonts". When I set the output font to Courier New, I always get a message saying that the font is not available, and it automatically falls back to Courier. Is there any way to generate output using Courier New while running SAS in a Unicode session? Thank you very much!
That is correct: I am not seeing an issue with searching in SAS 9.4M8. If you always run your code in a Unicode session, I don't see any issue with leaving Searchable set to "Yes" until you move to a more current version of SAS, where the change appears to be unnecessary.
For font issues, I would recommend the following blog:
https://blogs.sas.com/content/sgf/2020/03/20/how-to-debug-5-common-sas-software-font-issues/
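Along the lines of that blog post, one way to make Courier New available is to register the Windows system fonts with PROC FONTREG and then request the font in a style override. This sketch assumes SAS is running on Windows with fonts in the %SYSTEMROOT%\Fonts folder; the output path and the choice of HEADER and DATA locations in the override are illustrative.

```sas
/* Register TrueType fonts from the Windows font folder so that ODS */
/* PDF can find "Courier New" instead of substituting Courier.      */
proc fontreg mode=add msglevel=terse;
   fontpath "%sysget(SYSTEMROOT)\Fonts";
run;

/* Request Courier New for the column headers and data cells */
ods _all_ close;
ods pdf file='c:\temp\test_courier_new.pdf' style=printer;
proc print data=sashelp.class
   style(header data)=[fontfamily="Courier New"];
run;
ods pdf close;
ods listing;
```

MODE=ADD registers only fonts that are not already registered, so rerunning the step should not replace existing entries.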
Thank you for the helpful blog link.
I've registered the fonts I need and tested them on my end; everything worked as expected.
Really appreciate your support!