data_null__
Jade | Level 19

NOTE: SAS (r) Proprietary Software 9.4 (TS1M6)
NOTE: This session is executing on the Linux 3.10.0-1127.10.1.el7.x86_64 (LIN X64) platform.

I'm using the XLSX LIBNAME engine to read an Excel workbook that contains smart quotes and other text that is not transcoding correctly.

WARNING: Some character data was lost during transcoding in column: QueryText at obs 190.

What can I do to import this data without data loss? I do not understand transcoding.

Is there anyt

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

> What can I do to import this data without data loss?  I do not understand transcoding.

You need a SAS session that can understand and store these characters. If your SAS session has no means of representing some characters, it will replace them. UTF-8 is the de facto standard nowadays. Run a SAS session using UTF-8 encoding, store the data in UTF-8-encoded data sets, and all the characters will be preserved.

Of course, you now have different issues: 1) these data sets can be fully read only by sessions that also understand these characters, and 2) you now have multibyte characters in your strings.
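
To make point 2 concrete, here is a minimal sketch, assuming a UTF-8 session; the data set name `work.smartq` and the variable names are hypothetical. The hex literals are the UTF-8 bytes of the left and right smart quotes (U+201C and U+201D), so the value is 7 characters long but occupies 11 bytes. LENGTH counts bytes, KLENGTH counts characters, and character variable lengths such as `$20` are also in bytes.

```sas
/* Assumes a UTF-8 session; work.smartq is a hypothetical name.     */
/* 'E2809C'x and 'E2809D'x are the UTF-8 bytes of the smart quotes. */
data work.smartq;
  length s $ 20;                           /* lengths are in bytes     */
  s     = 'E2809C'x || 'hello' || 'E2809D'x;
  bytes = length(s);                       /* 3 + 5 + 3 = 11 bytes     */
  chars = klength(s);                      /* 1 + 5 + 1 = 7 characters */
  put s= bytes= chars=;
run;
```

If `s` were declared `$7`, only the first 7 bytes (the opening quote and `hell`) would fit, so existing character lengths may need reviewing after a switch to UTF-8.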

This might be a good read about smart quotes: https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2020/4561-2020.pdf

And this too: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/nlsref/n15e31tqdv020en1fok7tp4l9zd5.htm

4 Replies
miaeyg
Fluorite | Level 6

Hi @data_null__

You did not indicate which encoding your SAS session was started with, and you also did not post your SAS code.

I assume your session encoding is a single-byte encoding such as LATIN1. If so, try starting SAS with encoding=UTF8, just as a test, to see whether it resolves your issue. Be aware that switching to UTF-8 encoding may affect your existing SAS data sets and existing SAS code, so use it with extreme caution.
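
Before changing anything, it may help to confirm which encoding the session is actually using. A minimal sketch follows; the data set name `work.session_encoding` is hypothetical. Note that ENCODING is a startup option, so it cannot be changed with an OPTIONS statement in a session that is already running.

```sas
/* Write the session encoding to the log */
%put NOTE: &=sysencoding;

/* Show the ENCODING system option and its current value */
proc options option=encoding;
run;

/* Keep the value in a (hypothetical) data set for later checks */
data work.session_encoding;
  length encoding $ 32;
  encoding = getoption('encoding');
run;
```

If the log shows LATIN1, characters outside that character set are the ones that get lost. For example, the curly quotes U+201C and U+201D exist in WLATIN1 (Windows-1252) but not in LATIN1 (ISO 8859-1).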

 

Maybe others will have better ideas.

Eyal

Tom
Super User

In general, if you are having transcoding issues it is best to run your SAS session with Unicode support, that is, using encoding=utf-8. If your SAS session is using a single-byte encoding, like LATIN1 (or WLATIN1), it can represent only 256 different characters, and if the source file has a character that does not exist in your current encoding, SAS has no place to store it.

But in theory you could still have issues if the source file was using a single-byte encoding and included a byte that UTF-8 uses to signal the start of a multibyte sequence. If you tell SAS to treat that file as UTF-8, you could end up with a two- or three-byte sequence that is meaningless in UTF-8. So you might still need to know the encoding the source file was using.

But I am not sure whether XLSX files internally follow some standard when storing text that makes it clear whether a single-byte encoding is being used, or that records which encoding was used to create them.
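
Tom's point that the source encoding matters can be sketched with the KCVT function, assuming a UTF-8 session; the data set and variable names are hypothetical. In Windows-1252 (SAS encoding name WLATIN1) the bytes `93`x and `94`x are the left and right smart quotes, so converting with the correct source encoding should produce the proper 3-byte UTF-8 sequences.

```sas
/* Assumes a UTF-8 session; work.converted is a hypothetical name. */
data work.converted;
  length raw $ 4 text $ 12;
  raw   = '93'x || 'hi' || '94'x;          /* Windows-1252 bytes of "hi" in smart quotes */
  text  = kcvt(raw, 'wlatin1', 'utf-8');   /* interpret as WLATIN1, return UTF-8         */
  bytes = length(text);                    /* 3 + 2 + 3 = 8 bytes                        */
run;
```

If the same raw bytes were instead read as UTF-8, `93`x and `94`x on their own would not form valid UTF-8 characters.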

Ksharp
Super User
Can you post your code?
As Tom pointed out, the encoding of your SAS session and the encoding of the Excel file are different. Make them use the same encoding; I think you know the right way to do that.

An alternative is to save the Excel file as a CSV file and import that into SAS, telling SAS which encoding the CSV file uses:

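filename x 'c:\temp\have.csv' encoding='utf8';     /* encoding of the CSV file itself */
proc import datafile=x out=have dbms=csv replace;  /* the session encoding must still support these characters */
  getnames=yes;
run;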
Discussion stats
  • 4 replies
  • 3561 views
  • 4 likes
  • 5 in conversation