BookmarkSubscribeRSS Feed
LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Hi, I am getting a warning message of "Some character data was lost during transcoding in column" from the below codes.

I have two questions and any helps or advice are gladly appreciated.

1. Is there any way to fix the warning message?

2. It appears Sale_description is causing this issue but I don't even have this variable in the keep = statement. Is there any way to only read 4 variables from the dataset so we don't see the warning message (which is caused by sale_description)?

 

Thanks

 

%let a = /Inventory/group/sale_20200930.xlsx ;
libname sale xlsx "&a";


data inv_sept ; 
set sale.sale_sept (keep = sale_amount sale_target sale_location sale_person); 
run;

WARNING: Some character data was lost during transcoding in column: Sale_description at obs 332

 

...........

WARNING: Some character data was lost during transcoding in column: Sale_description at obs 998
NOTE: The import data set has 1284 observations and 25 variables.
NOTE: There were 1284 observations read from the data set SALE.SALE_SEPT
NOTE: The data set WORK.INV_SEPT has 1284 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.60 seconds
cpu time 0.59 seconds

 

20 REPLIES 20
ballardw
Super User

If you aren't using the variable then ignore the warning.

 

If you need to have a clean "log" you could try making a copy of your source file without that column.

Or try the option Transcode_Fail = Silent on the libname statement.

LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Thanks Ballardw, do you mind to show me how to make a copy of my source file without that column? 

ballardw
Super User

@LL5 wrote:

Thanks Ballardw, do you mind to show me how to make a copy of my source file without that column? 


Make a copy of the file using system tools.

Open the copy.

Highlight the column

Delete the column

Save the file

Point the SAS code to the new file.

LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Thanks again. This approach may not work in my work situation because there would be no human intervention since the the program runs automatically via a scheduled job, but appreciated for your advice. 

LeonidBatkhan
Lapis Lazuli | Level 10

Hi LL5,

 

To make sure you don't lose character data during transcoding, you need to expand your character lengths before transcoding occur. The simplest way of doing it is by using CVP (Character Variable Padding) special Engine which is part of SAS BASE. This technique is described in detail in my recent blog post Expanding lengths of all character variables in SAS data sets.

 

Hope this helps.

LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Thanks Leonid, but since I am reading an external excel file by using xlsx libname here, would CVP work in this case? It appears to me that CVP engine applies to sas dataset only. It'd be great if you can share some more insights. 

LeonidBatkhan
Lapis Lazuli | Level 10

I guess your SAS session has encoding WLATIN1, while your .xlsx file has UTF-8 encoding. You need to change your SAS session's encoding to UTF-8: How to Run a SAS Session in UTF-8 Encoding.

 

You can find out your SAS session encoding by running this code snippet:

proc options option=encoding;
run;

Hope this helps.

 

LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

My SAS session has encoding LATIN1. I tried to change the xlsx libname to proc import as below. I also added the encoding to the filename, but I am still getting the same warning message. 

 

filename a '/Inventory/group/sale_20200930.xlsx’ encoding='LATIN1' ;
PROC IMPORT
datafile = a
out= inv_sept  (keep = sale_amount sale_target sale_location sale_person)
dbms=xlsx
replace;
sheet = "sale_sept";
RUN;
Tom
Super User Tom
Super User

LATIN1 is a single byte encoding. So it only has 256 possible characters that can be stored. You need to run SAS using UTF-8 encoding to be sure that you can read any character that might be in the XLSX file.

LeonidBatkhan
Lapis Lazuli | Level 10

If your .xlxs file has UTF-8 encoding, and it has some UTF-8 characters that take more than 1 byte you may lose characters due to transcoding if your SAS has LATIN1 encoding. As I indicated in my previous reply, you need to bring your SAS encoding to UTF-8, not your UTF-8 encoded .xlxs file to LATIN1. LATIN1 encoding uses 1 byte per character, while UTF-8 uses up to 4 bytes per character. Therefore you need to match your SAS session encoding to your file encoding, not vice versa.

LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Understood and makes sense. Unfortunately I am not able to change the encoding to 'UTF-8' in my SAS session. 

Ksharp
Super User
1) Transform your excel file into CSV file by Save As menu.
And import this csv again.

filename a '/Inventory/group/sale_20200930.csv’ encoding='utf8' ;
PROC IMPORT


2) Try utf8 encoding:
filename a '/Inventory/group/sale_20200930.xlsx’ encoding='utf8' ;
PROC IMPORT

or
libname a '/Inventory/group/sale_20200930.xlsx’ inencoding='utf8' outencoding='utf8' ;
LL5
Pyrite | Level 9 LL5
Pyrite | Level 9

Thanks Ksharp. The issue is the excel file I have has utf-8 but my SAS session (E.G) has Latin1. As pointed out by Leonid's post earlier, I would need to change my SAS session to utf-8, so adding any encoding option in the code would not solve the problem. Unfortunately, either changing the file from xlsx to csv or change my SAS configuration file from Latin1 to utf-8 would not be doable in my work situation, I could only leave the warning message as it. 

 

Separately, since I have an external file instead of a sas dataset, I can't use  inencoding/outencoding because it is not implemented in the XLSX engine, but good to know this technique. Thanks again.

Quentin
Super User

Below is untested, but it is possible to provide an excel range when you specify the dataset on the SET statement. 

 

data inv_sept ; 
  set sale."sale_sept$A:C"n; 
run;

That might keep the XLSX engine from reading the extra columns.

 

Unfortunately, it looks like it only supports contiguous ranges. Below when I give it a noncontiguous range, $A:C,E:F , I still get column D.

 

data inv_sept ; 
set sale."sale_sept$A:C,E:F"n; 
run;
The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 17302 views
  • 3 likes
  • 6 in conversation