- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, I am getting a warning message of "Some character data was lost during transcoding in column" from the below codes.
I have two questions and any helps or advice are gladly appreciated.
1. Is there any way to fix the warning message?
2. It appears Sale_description is causing this issue but I don't even have this variable in the keep = statement. Is there any way to only read 4 variables from the dataset so we don't see the warning message (which is caused by sale_description)?
Thanks
%let a = /Inventory/group/sale_20200930.xlsx ;
libname sale xlsx "&a";
data inv_sept ;
set sale.sale_sept (keep = sale_amount sale_target sale_location sale_person);
run;
WARNING: Some character data was lost during transcoding in column: Sale_description at obs 332
...........
WARNING: Some character data was lost during transcoding in column: Sale_description at obs 998
NOTE: The import data set has 1284 observations and 25 variables.
NOTE: There were 1284 observations read from the data set SALE.SALE_SEPT
NOTE: The data set WORK.INV_SEPT has 1284 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.60 seconds
cpu time 0.59 seconds
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you aren't using the variable then ignore the warning.
If you need to have a clean "log" you could try making a copy of your source file without that column.
Or try the option Transcode_Fail = Silent on the libname statement.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Ballardw, do you mind to show me how to make a copy of my source file without that column?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@LL5 wrote:
Thanks Ballardw, do you mind to show me how to make a copy of my source file without that column?
Make a copy of the file using system tools.
Open the copy.
Highlight the column
Delete the column
Save the file
Point the SAS code to the new file.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks again. This approach may not work in my work situation because there would be no human intervention since the the program runs automatically via a scheduled job, but appreciated for your advice.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi LL5,
To make sure you don't lose character data during transcoding, you need to expand your character lengths before transcoding occur. The simplest way of doing it is by using CVP (Character Variable Padding) special Engine which is part of SAS BASE. This technique is described in detail in my recent blog post Expanding lengths of all character variables in SAS data sets.
Hope this helps.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Leonid, but since I am reading an external excel file by using xlsx libname here, would CVP work in this case? It appears to me that CVP engine applies to sas dataset only. It'd be great if you can share some more insights.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I guess your SAS session has encoding WLATIN1, while your .xlsx file has UTF-8 encoding. You need to change your SAS session's encoding to UTF-8: How to Run a SAS Session in UTF-8 Encoding.
You can find out your SAS session encoding by running this code snippet:
proc options option=encoding;
run;
Hope this helps.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My SAS session has encoding LATIN1. I tried to change the xlsx libname to proc import as below. I also added the encoding to the filename, but I am still getting the same warning message.
filename a '/Inventory/group/sale_20200930.xlsx’ encoding='LATIN1' ;
PROC IMPORT
datafile = a
out= inv_sept (keep = sale_amount sale_target sale_location sale_person)
dbms=xlsx
replace;
sheet = "sale_sept";
RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
LATIN1 is a single byte encoding. So it only has 256 possible characters that can be stored. You need to run SAS using UTF-8 encoding to be sure that you can read any character that might be in the XLSX file.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If your .xlxs file has UTF-8 encoding, and it has some UTF-8 characters that take more than 1 byte you may lose characters due to transcoding if your SAS has LATIN1 encoding. As I indicated in my previous reply, you need to bring your SAS encoding to UTF-8, not your UTF-8 encoded .xlxs file to LATIN1. LATIN1 encoding uses 1 byte per character, while UTF-8 uses up to 4 bytes per character. Therefore you need to match your SAS session encoding to your file encoding, not vice versa.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Understood and makes sense. Unfortunately I am not able to change the encoding to 'UTF-8' in my SAS session.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
And import this csv again.
filename a '/Inventory/group/sale_20200930.csv’ encoding='utf8' ;
PROC IMPORT
2) Try utf8 encoding:
filename a '/Inventory/group/sale_20200930.xlsx’ encoding='utf8' ;
PROC IMPORT
or
libname a '/Inventory/group/sale_20200930.xlsx’ inencoding='utf8' outencoding='utf8' ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Ksharp. The issue is the excel file I have has utf-8 but my SAS session (E.G) has Latin1. As pointed out by Leonid's post earlier, I would need to change my SAS session to utf-8, so adding any encoding option in the code would not solve the problem. Unfortunately, either changing the file from xlsx to csv or change my SAS configuration file from Latin1 to utf-8 would not be doable in my work situation, I could only leave the warning message as it.
Separately, since I have an external file instead of a sas dataset, I can't use inencoding/outencoding because it is not implemented in the XLSX engine, but good to know this technique. Thanks again.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Below is untested, but it is possible to provide an excel range when you specify the dataset on the SET statement.
data inv_sept ;
set sale."sale_sept$A:C"n;
run;
That might keep the XLSX engine from reading the extra columns.
Unfortunately, it looks like it only supports contiguous ranges. Below when I give it a noncontiguous range, $A:C,E:F , I still get column D.
data inv_sept ;
set sale."sale_sept$A:C,E:F"n;
run;
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.