Hi all:
One of my variables contains "±". I tried the code below, but it doesn't work.
------------------------------------
Variable X had multiple records:
a fox is running
a cat has ±1 years to live
an apple
------------------------------
thank you,
purple
/*Get a list of SAS byte for special character*/
data _null_;
do k=1 to 255;
x=byte(k);
put k +10 x;
end;
run;
data need;
set have;
_a=tranwrd(a,byte(177),'Plus or Minus');
run;
The log:
ERROR: Some character data was lost during transcoding in the
dataset have. Either the data contains characters that
are not representable in the new encoding or truncation
occurred during transcoding.
As your message indicates, which bytes to store in your variable to make it print a particular glyph depends on which ENCODING you are using.
Normal ASCII codes only cover the characters between a space ('20'x or 32 decimal) and a tilde ('7E'x or 126 decimal).
In general it would be much easier and more reproducible to just store the two-character string '+-' or perhaps the three-character string '+/-' in the variable.
There is a UNICODE character for that, so if your SAS session is using UTF-8 as the encoding you could store it.
https://en.wikipedia.org/wiki/Plus%E2%80%93minus_sign
It should be available in LATIN1 or WLATIN1 encoding also.
https://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)
data test;
  length latin1 $1 utf8 $2;
  latin1='B1'x;
  utf8=kcvt(latin1,'latin1','utf-8');
  put (_all_) (=:$hex./);
run;

latin1=B1
utf8=C2B1
Hi Tom:
Thank you for helping.
I am using SAS V9.4 M7, and I don't know which ENCODING I am using.
How can I find that out?
Once we know which encoding should be used, can I just use this code (with the correct ENCODING) to read in the SAS dataset?
Thanks again.
Purple
Check the system encoding of your current SAS session.
%put %sysfunc(getoption(encoding));
If you are reading from a SAS dataset then check the encoding of the dataset using PROC CONTENTS.
proc contents data=mylib.mydataset ;
run;
How did you start SAS?
Did you type a command at the command line? Do you have a separate command, or an option on the command you are using, to pick the session encoding?
Did you click on some Windows icon? Which one? On Windows you should normally have an option to launch SAS with Unicode support.
Are you using some front-end tool to submit your SAS code, such as Enterprise Guide or SAS/Studio? If so, make sure you connect to a SAS application server that is using UTF-8 encoding.
Hi Tom:
Here is what PROC CONTENTS shows: UTF-8 Unicode (UTF-8)
If that is the case, I should use the UTF-8 encoding of ±:
UTF-8 Encoding: | 0xC2 0xB1 |
to read in the SAS dataset. Is this correct?
Thanks again,
purple
I'm surprised you get the error that you show us with the code that you show us. Seems impossible.
No HAVE data set is created. No variable named A is created. Please, from now on, show us the EXACT code you are running, and not some close approximation.
This works:
data have;
do k=1 to 255;
x=byte(k);
put k +10 x;
end;
run;
data need;
set have;
_a=tranwrd(x,byte(177),'Plus or Minus');
run;
@PaigeMiller and @Tom :
Thanks for helping. It works, but it didn't give the result I need.
I searched around and tried this code, but still got the error message in the log.
data test (encoding=UTF8);
set myfile.test;
run;
You seem to have attached pictures from the SAS Display Manager user interface.
To see if the character is working properly please print it to an ODS destination, like an HTML output and look at it there. SAS has not updated Display Manager to really support extended characters.
You also need to check what encoding your SAS session is using. If it is using LATIN1 or WLATIN1 then you should see the 'C2B1'x character string in the SAS datasets transcoded into the 'B1'x character string while SAS is using the data.
To see what characters are there, print the values with the $HEX format so that each byte is printed using two hexadecimal digits.
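A minimal sketch of that $HEX check, assuming a UTF-8 session. The dataset HAVE and variable X are hypothetical stand-ins built from the three records in the original question, and the plus-minus sign is written as the hex literal 'C2B1'x rather than typed into the code:

```sas
/* Hypothetical sample data mirroring the records in the question.   */
/* 'C2B1'x is the UTF-8 byte pair for the plus-minus sign.           */
data have;
  length x $30;
  x = 'a fox is running'; output;
  x = 'a cat has ' || 'C2B1'x || '1 years to live'; output;
  x = 'an apple'; output;
run;

/* Write each value to the log as text and as hexadecimal.           */
/* $HEX60. prints two hex digits per byte, so the width is twice     */
/* the length of X; trailing blanks show up as 20.                   */
data _null_;
  set have;
  put x= / x= $hex60.;
run;
```

In the hex line for the second record, look for the bytes sitting between the codes for the space (20) and the digit 1 (31). Those bytes tell you what is really stored, regardless of how the window displays it.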
If the problem occurs when you are using the ± symbol, why are you showing us the data set for other characters????
hi @Tom @PaigeMiller :
Thank you again for your support, this is really a great place to ask for SAS help.
👍
purple
/*------------------------Migrating Data from WLATIN1 to UTF-8------------------------------------
REFERENCE:
https://documentation.sas.com/doc/en/pgmsascdc/v_006/viyadatamig/p1eedruqfsgqqcn1pmjof4br5xvt.htm
/*------------------------------------------------------------------------------------------------*/
/*===Step1: Find out WHICH Encoding is on Your SAS session
- Mine is: SAS V9.4===*/
proc options option=encoding;
run;
/*====Step2: Use ENCODING= option to transform WLATIN1 TO UTF-8 ===*/
data need;
set have (encoding=wlatin1);
_newVar=pahdx;
if index(compress(_newvar),"1year")>0;
/*Recode ±*/
_new=tranwrd(_newvar,"≥", "±");
keep pahdx _new:;
run;
proc print data=need;
run;
[Screenshot: PROC PRINT output from the SAS window]
Do not put literal non-7bit ASCII characters into your CODE. That is just asking for problems.
Use the hex code literals instead.
'B1'x should be the code for that plus minus character.
If your actual file has 3 bytes for the plus minus then perhaps it is not really UTF-8? Or perhaps the creator of the dataset had already had their own troubles with encoding.
Hi Tom:
The data came from a 3rd party (Medidata), so I don't know what happened to the entry.
I don't understand what this means:
"Do not put literal non-7bit ASCII characters into your CODE. That is just asking for problems.
Use the hex code literals instead.
'B1'x should be the code for that plus minus character."
Do I need to code it like this in the SAS Enhanced Editor on Windows 10?
Thank you,
Purple
data need;
set cldata.mh(encoding=wlatin1);
_newVar=pahdx;
if index(compress(_newvar),"1year")>0;
/*Recode ±*/
/*_new=tranwrd(_newvar,"≥", "±");*/
/*???hexcode???- Is this part you are talking about*/
_new=tranwrd(_newvar,"B1x", "±");
keep pahdx _new:;
run;
No.
Instead of typing/copying the characters into your code replace it with CODE that will create the character.
So instead of typing
'±'
use
'B1'x
They will both create the same thing when you are using LATIN1 encoding.
But if you ran the first one in a SAS session using UTF-8 encoding you would get the two-byte string 'C2B1'x instead.
So instead of
new=tranwrd(_newvar,"≥", "±");
use
new=tranwrd(_newvar,'E289A5'x,'B1'x);
Now your code file does not include those strange characters and so you will be able to work with it more easily and avoid confusion.
For example, the string you are converting, 'E289A5'x, is how UTF-8 represents the greater-than-or-equal-to symbol, not the plus-or-minus symbol.
The greater-than-or-equal-to symbol does not exist in the LATIN1 encoding.
So your code should be:
new=tranwrd(_newvar,'E289A5'x,'>=');
You still did not say what encoding your SAS session is using, just which version of SAS you were using.
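Putting that together, here is a rough sketch of the whole step with hex literals only. The HAVE dataset and its sample value are made up for illustration; PAHDX is the variable name from your own code, and the lengths are guesses:

```sas
/* Hypothetical sample row: 'E289A5'x is the UTF-8 byte sequence     */
/* for the greater-than-or-equal-to sign.                            */
data have;
  length pahdx $40;
  pahdx = 'a cat has ' || 'E289A5'x || '1 years to live';
run;

data need;
  set have;
  length _new $60;
  /* Replace the three-byte sequence with plain 7-bit ASCII text,    */
  /* so the result no longer depends on the session encoding.        */
  _new = tranwrd(pahdx, 'E289A5'x, '>=');
  /* Show the original bytes and the recoded value in the log.       */
  put pahdx= $hex80. / _new=;
run;
```

Because the program file contains only 7-bit ASCII characters, it does not matter which encoding your editor uses to save it.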