NOTE: SAS (r) Proprietary Software 9.4 (TS1M6)
NOTE: This session is executing on the Linux 3.10.0-1127.10.1.el7.x86_64 (LIN X64) platform.
I'm using the XLSX LIBNAME engine to read an Excel file that contains smart quotes and other text that does not transcode correctly.
WARNING: Some character data was lost during transcoding in column: QueryText at obs 190.
What can I do to import this data without data loss? I do not understand transcoding.
Is there anyt
> What can I do to import this data without data loss? I do not understand transcoding.
You need a SAS session that can understand and store these characters. If your SAS session has no means of recognising some characters, it will replace them. UTF-8 is the de-facto standard nowadays. Run a SAS session using UTF-8 encoding, store the data in UTF-8-encoded data sets and all will be saved.
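To make "transcoding" concrete, here is a small illustration in Python, used only to show the concept and not as SAS code. SAS's LATIN1 encoding is ISO-8859-1, which is Python's `latin-1` codec. Text is first interpreted as bytes in the source encoding and then re-encoded into the session encoding. Any character with no slot in the target encoding gets replaced, and that replacement is the data loss the WARNING reports. Python substitutes `?`; SAS may use a different substitution character.

```python
# Illustration only: Python stands in for SAS. SAS LATIN1 == ISO-8859-1 == "latin-1" here.

# Text as stored in the source file (UTF-8 bytes containing smart quotes).
utf8_bytes = "It\u2019s \u201cquoted\u201d".encode("utf-8")

# Step 1: interpret the source bytes using the source encoding.
text = utf8_bytes.decode("utf-8")

# Step 2a: transcode into a single-byte encoding that has no slot for smart quotes.
# errors="replace" swaps each unrepresentable character for "?", so data is lost.
latin1_bytes = text.encode("latin-1", errors="replace")
print(latin1_bytes)

# Step 2b: transcode into UTF-8, which can represent every Unicode character.
roundtrip = text.encode("utf-8").decode("utf-8")
print(roundtrip == text)
```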
Of course, you now have different issues: 1) these data sets can only be fully read by sessions that also understand these characters, and 2) you now have multibyte characters in your strings.
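Continuing the Python illustration (again, not SAS code), this sketch shows the second issue. In UTF-8 a smart quote occupies three bytes, so a string's byte length and its character count diverge. SAS character variable lengths are specified in bytes, so a variable that was just long enough in a single-byte session can truncate the same text in a UTF-8 session. The helper name `byte_and_char_length` is hypothetical.

```python
def byte_and_char_length(s):
    """Return (UTF-8 byte length, character count) for the string s."""
    return len(s.encode("utf-8")), len(s)

for sample in ["It's", "It\u2019s", "\u201cquoted\u201d"]:
    n_bytes, n_chars = byte_and_char_length(sample)
    print(f"{sample!r}: {n_chars} characters, {n_bytes} bytes")

# A field limited to 4 bytes (think of a character variable of length 4)
# keeps only part of the 3-byte smart quote, leaving an invalid fragment.
truncated = "It\u2019s".encode("utf-8")[:4]
print(truncated, truncated.decode("utf-8", errors="replace"))
```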
This might be a good read about smart quotes: https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4561-2020.pdf
and this too https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/nlsref/n15e31tqdv020en1fok7tp4l9zd5.htm
Hi @data_null__
You did not indicate which encoding your SAS session was started with and you also did not post your SAS code.
I assume your session encoding is a single-byte encoding such as LATIN1. If so, try starting SAS with encoding=UTF8, just to test whether that resolves your issue. Be aware that switching to UTF8 encoding may affect your existing SAS data sets and existing SAS code, so use it with extreme caution.
Maybe others will have better ideas.
Eyal
In general, if you are having transcoding issues, it is best to run your SAS session with "unicode" support, that is, with encoding=utf-8. If your SAS session is using a single-byte encoding, like LATIN1 (or WLATIN1), it can represent only 256 different characters, and if the source file has a character that does not exist in your current encoding, there is no place to store it.
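The 256-slot limit can be checked directly. The Python sketch below assumes SAS's LATIN1 and WLATIN1 correspond to ISO-8859-1 and Windows-1252 (Python's `latin-1` and `cp1252` codecs) and tests which smart quotes have a slot in each. Windows-1252 assigns them bytes in the 0x80 to 0x9F range, while ISO-8859-1 uses that range for control characters, so a LATIN1 session has nowhere to put them. The helper name `byte_in` is hypothetical.

```python
# Assumption: SAS LATIN1 ~ "latin-1" (ISO-8859-1), SAS WLATIN1 ~ "cp1252" (Windows-1252).
SMART_QUOTES = ["\u2018", "\u2019", "\u201c", "\u201d"]  # left/right single, left/right double

def byte_in(encoding, ch):
    """Return the byte(s) representing ch in encoding, or None if it has no slot."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None

for enc in ("latin-1", "cp1252"):
    slots = {ch: byte_in(enc, ch) for ch in SMART_QUOTES}
    print(enc, slots)

# In latin-1 every byte value decodes to exactly one character: 256 slots in total.
latin1_chars = {bytes([b]).decode("latin-1") for b in range(256)}
print(len(latin1_chars))
```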
But in theory you could still have issues if the source file used a single-byte encoding and contained a byte that UTF-8 uses to signal the start of a multibyte sequence. If you tell SAS to treat that file as UTF-8, you could end up with a two- or three-byte sequence that is meaningless in UTF-8. So you might still need to know which encoding the source file used.
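Here is a Python sketch of that failure mode. In Latin-1, "é" is the single byte 0xE9; in UTF-8, 0xE9 announces a three-byte sequence, so when it is followed by an ordinary space the bytes are invalid UTF-8. The helper `guess_decode` and its list of candidate encodings are hypothetical. Trying encodings in order is only a heuristic: almost any byte string decodes without error in some single-byte encoding, so a successful decode does not prove which encoding the file really used.

```python
def guess_decode(raw, candidates=("utf-8", "cp1252")):
    """Return (encoding, text) for the first candidate that decodes raw without error."""
    for enc in candidates:
        try:
            return enc, raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits")

raw = "caf\u00e9 au lait".encode("latin-1")  # single-byte source: 0xE9 for "é"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # 0xE9 starts a 3-byte UTF-8 sequence, but the next byte is a space.
    print("not valid UTF-8:", err.reason)

print(guess_decode(raw))
```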
But I am not sure whether XLSX files follow some internal standard when storing text that makes it clear whether a single-byte encoding is being used, or that records which encoding was used when the file was created.
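One way to check a given file: an .xlsx file is a ZIP archive of XML parts, and an XML part can name its encoding in its XML declaration (without a byte-order mark or declaration, XML text must be UTF-8). The Python sketch below reads that declaration from `xl/sharedStrings.xml`, where cell text is usually stored, although a workbook may also use inline strings. The function name is hypothetical, the regex only handles declarations written in an ASCII-compatible encoding, and the demo builds a tiny in-memory stand-in archive rather than a real workbook. If it reports UTF-8, the smart quotes are intact in the file, and the loss happens when SAS transcodes them into a session encoding that cannot hold them.

```python
import io
import re
import zipfile

def xml_declared_encoding(xlsx_path_or_file, part="xl/sharedStrings.xml"):
    """Return the encoding named in a part's XML declaration, or None if none is found."""
    with zipfile.ZipFile(xlsx_path_or_file) as zf:
        head = zf.read(part)[:200]
    # Search rather than match, so a leading UTF-8 byte-order mark does not hide the declaration.
    found = re.search(rb"<\?xml[^>]*encoding=[\"']([^\"']+)[\"']", head)
    return found.group(1).decode("ascii") if found else None

# Self-contained demo: build a minimal stand-in archive in memory.
# A real workbook would be opened by path, e.g. a hypothetical "queries.xlsx".
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "xl/sharedStrings.xml",
        ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
         "<sst><si><t>It\u2019s</t></si></sst>").encode("utf-8"),
    )
buf.seek(0)
print(xml_declared_encoding(buf))
```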