I am trying to save a SAS dataset that contains Unicode characters in the RData format, using PROC IML to call R's save command, and I am getting the following error:
ERROR: SAS is unable to transcode character data to the R encoding.
The program runs OK if the dataset does not contain any Unicode characters. I am running the program in SAS with Unicode support. Any tips for resolving this would be much appreciated.
It may help a lot to show the code that generates the error and indicate which variables hold values that may include Unicode characters. Better is to include the code along with the notes and messages from the log. Copy the log with the code and all messages and paste into a text box opened on the forum with the </> icon above the message window.
Best would be to provide example data along with the code so that someone with experience can actually test possible solutions. The instructions at https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... show how to turn an existing SAS data set into data step code. You can paste that code into a forum code box using the </> icon, or attach it as text, to show exactly what you have and give us something to test code against.
Added a minimal example to reproduce the error.
I suspect you are going to have to align your SAS session encoding with R's encoding or vice versa. You can confirm what your SAS session encoding is by running this:
proc options option = encoding;
run;
Please post your SAS log from the above.
Here are two minimal examples. The first one works and the second one does not:
*** WORKS;
data tab1;
input symbol $;
datalines;
+
;
run;
proc iml;
run ExportDataSetToR("tab1", "tab1");
submit / R;
print(tab1)
endsubmit;
quit;
*** DOES NOT WORK;
data tab2;
input symbol $;
datalines;
≤
;
run;
proc iml;
run ExportDataSetToR("tab2", "tab2");
submit / R;
print(tab2)
endsubmit;
quit;
I get the following error on running the second piece of code:
NOTE: IML Ready
126 run ExportDataSetToR("tab2", "tab2");
ERROR: SAS is unable to transcode character data to the R encoding.
ERROR: Unable to export data.
ERROR: Execution error as noted previously. (rc=1000)
This is the encoding for my SAS session:
132 proc options option = encoding;
133 run;
SAS (r) Proprietary Software Release 9.4 TS1M6
ENCODING=UTF-8 Specifies the default character-set encoding for the SAS session.
Any help with this issue? Thanks!
As @SASKiwi already mentioned, the likely issue here is that the encoding of your R session doesn't match that of the SAS session.
With the SAS session being UTF-8, a likely resolution (to be tested) could be to set
export LANG=en_US.UTF-8
in the appropriate config file (a .cfg file; sasenv_local is also possible).
This is something a SAS admin at your site would need to configure.
I do know that the above was the resolution for someone who encountered the same error message as you.
I suggest you Google "R language encoding" and check out the links provided. I did and found a number of helpful links. However, I'm not an R user, so there is no point in my trying to pass on secondhand information when you can more easily figure out what might be useful or not.
Alternatively, open a track with SAS Tech Support.
What is the encoding from the R session that PROC IML created?
Run this function.
Sys.getlocale()
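A minimal sketch of one way to run that check from inside PROC IML, assuming R is reachable from SAS the same way as in the examples above. The l10n_info() call is an extra that was not asked for; it is a base R function that reports whether R treats its locale as UTF-8.

```sas
proc iml;
  submit / R;
    # Locale of the R session that PROC IML started
    print(Sys.getlocale())
    # Added for illustration: flags such as UTF-8 and Latin-1
    print(l10n_info())
  endsubmit;
quit;
```

If the locale reported here is not a UTF-8 locale while the SAS session is UTF-8, that mismatch would be consistent with the transcoding error.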
I am on Windows and get this:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
So you need to scrub your SAS datasets of any characters that do not fit into code page 1252 (called WLATIN1 in SAS, and nearly identical to LATIN1).
data want;
  set have;
  *----------------------------------------------------------------------------;
  * Convert any UTF-8 character not in LATIN1 codepage to HTML encoded strings ;
  *----------------------------------------------------------------------------;
  array _character_ _character_;
  do over _character_;
    do until(_n_=0);
      _n_=kverify(_character_,collate(0,127)||kcvt(collate(128,255),'latin1','utf-8'));
      if _n_ then _character_=tranwrd(_character_,ksubstr(_character_,_n_,1)
                                     ,htmlencode(ksubstr(_character_,_n_,1),'7bit'))
      ;
    end;
  end;
run;
Note you might have to make your character variables longer.
You might also need to do something about characters in variable labels.
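For the lengthening step, here is a minimal sketch that assumes the tab2 dataset and its symbol variable from the question. The output name tab2_wide and the length of 32 bytes are arbitrary choices for illustration, not values taken from this thread.

```sas
data tab2_wide;
  /* Declaring LENGTH before SET lets the longer length take effect. */
  /* 32 is an assumed value, chosen only to leave room for encoded text. */
  length symbol $ 32;
  set tab2;
run;
```

The scrubbing step would then read tab2_wide in place of have, so the replacement strings are less likely to be truncated.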