Tom
Super User

Good.

You keep replying with the version of SAS

%put &=sysvlong;
SYSVLONG=9.04.01M5P091317

instead of supplying the requested information: the ENCODING that the SAS session is using.

%put %sysfunc(getoption(encoding));
WLATIN1
purpleclothlady
Pyrite | Level 9

hi all:

I posted the solution too early. Thank you @Tom and @PaigeMiller for helping me work through this.

Please use this as the FINAL solution:

_new=tranwrd(_newvar, 'E289A5'x, 'B1'x);

 

Thank you,👍

purple

 

/*------------------------Migrating Data from WLATIN1 to UTF-8------------------------------------
                                   REFERENCE:
 https://documentation.sas.com/doc/en/pgmsascdc/v_006/viyadatamig/p1eedruqfsgqqcn1pmjof4br5xvt.htm

/*------------------------------------------------------------------------------------------------*/
/*===Step1: Find out WHICH encoding your SAS session uses
    - Mine is: WLATIN1 (on SAS 9.4)===*/
proc options option=encoding;
run;

/*====Step2: Use the ENCODING= data set option so SAS reads the bytes as WLATIN1 (no automatic transcoding)===*/
data need;
    set have(encoding=wlatin1);
    _newVar=pahdx;
    if index(compress(_newvar),"1year")>0;
    /*Recode ±*/
    /*per Tom's suggestion, but not sure why it works?
      https://op.europa.eu/en/web/eu-vocabularies/formex/physical-specifications/character-encoding/mathem...
    */
    _new=tranwrd(_newvar, 'E289A5'x, 'B1'x);
run;
 

 

Tom
Super User

So that is still translating the greater than or equal to symbol into the plus minus symbol.

You probably do not want to do that.

 

If you are running a SAS session using WLATIN1 encoding and read a dataset that was created using UTF-8 encoding, SAS will properly transcode the plus-minus symbol, but it will have trouble with the character you actually seem to have, which is the greater-than-or-equal symbol. That is because the plus-minus symbol exists in the set of 256 characters that WLATIN1 supports, but the greater-than-or-equal symbol does not.
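One quick way to check which characters WLATIN1 can hold is Python's `cp1252` codec, assuming (as SAS documents it) that WLATIN1 is the Windows-1252 code page. Python is used here only to inspect bytes; it is not part of the SAS solution. The sketch shows that ± has a one-byte WLATIN1 code while ≥ has none.

```python
def wlatin1_byte(ch: str):
    """Return the WLATIN1 (Windows-1252) byte for ch, or None if WLATIN1 has no such character."""
    try:
        return ch.encode("cp1252")
    except UnicodeEncodeError:
        return None

for ch in ["\u00b1", "\u2265"]:  # plus-minus, greater-than-or-equal
    code = wlatin1_byte(ch)
    print(f"U+{ord(ch):04X}", code.hex().upper() if code else "not in WLATIN1")
# U+00B1 B1
# U+2265 not in WLATIN1
```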

 

So to do your own transcoding, you need to force SAS not to automatically transcode the strings in the UTF-8 file. You can do that by setting the ENCODING= dataset option on the input dataset. You could set it to the encoding your current session is using, or just set it to ANY instead. Now the bytes in the file will be copied into the character variables without ANY changes.
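To see what "copied without ANY changes" looks like, this Python sketch encodes a made-up sample value as UTF-8 (as it would be stored in the UTF-8 dataset) and then reads the same bytes as WLATIN1, assuming WLATIN1 behaves like Windows-1252. The three bytes of ≥ show up as three unrelated WLATIN1 characters, which is why they need to be replaced by hand.

```python
raw = "\u2265 1 year".encode("utf-8")  # hypothetical value as stored in the UTF-8 dataset
print(raw.hex(" ").upper())            # E2 89 A5 20 31 20 79 65 61 72
print(raw.decode("cp1252"))            # â‰¥ 1 year  (same bytes shown as WLATIN1 characters)
```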

 

You will then need to generate your own code to transcode ALL of the characters that use multiple bytes in UTF-8.

So if you want to transcode the plus minus symbol you could use

tranwrd(_newvar, 'C2B1'x, 'B1'x);
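The hex literals in these TRANWRD calls are simply the character's bytes in each encoding. A small Python helper (hypothetical, for illustration only, again treating WLATIN1 as `cp1252`) can print them in SAS `'...'x` form so you do not have to look them up by hand:

```python
def sas_hex_literal(ch: str, encoding: str) -> str:
    """Format the bytes of ch in the given codec as a SAS hex literal, e.g. 'C2B1'x."""
    return "'" + ch.encode(encoding).hex().upper() + "'x"

print(sas_hex_literal("\u00b1", "utf-8"))   # 'C2B1'x   plus-minus in UTF-8
print(sas_hex_literal("\u00b1", "cp1252"))  # 'B1'x     plus-minus in WLATIN1
print(sas_hex_literal("\u2265", "utf-8"))   # 'E289A5'x greater-than-or-equal in UTF-8
```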

But for the greater-than-or-equal symbol you will need to replace it with two characters, since WLATIN1 has no single character that looks like that.

tranwrd(_newvar, 'E289A5'x, '>=');

And remember that there are potentially many other UTF-8 characters that you are now responsible for transcoding.
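A minimal Python sketch of that bookkeeping, assuming a hypothetical replacement table: it applies byte-sequence replacements in turn (the same idea as chaining TRANWRD calls) and lists any non-ASCII characters in the UTF-8 input that the table does not yet cover. The table entries and sample value are illustrative only; real data may contain many more characters.

```python
# Hypothetical replacement table: UTF-8 byte sequence -> WLATIN1 bytes.
# Same idea as the TRANWRD calls in the post; real data may need many more entries.
REPLACEMENTS = {
    b"\xc2\xb1": b"\xb1",    # plus-minus: WLATIN1 has it as one byte
    b"\xe2\x89\xa5": b">=",  # greater-than-or-equal: not in WLATIN1, use two characters
}

def transcode(raw: bytes) -> bytes:
    """Apply each replacement in turn, like a chain of TRANWRD calls."""
    for utf8_seq, wlatin1_seq in REPLACEMENTS.items():
        raw = raw.replace(utf8_seq, wlatin1_seq)
    return raw

def unhandled_chars(raw: bytes) -> set:
    """Non-ASCII characters in the UTF-8 input that the table does not cover."""
    return {ch for ch in raw.decode("utf-8")
            if ord(ch) > 127 and ch.encode("utf-8") not in REPLACEMENTS}

# Made-up sample value containing a character (micro sign) the table misses.
sample = "\u2265 1 year, \u00b1 2 days, 5 \u00b5g".encode("utf-8")
print(transcode(sample))        # b'>= 1 year, \xb1 2 days, 5 \xc2\xb5g'
print(unhandled_chars(sample))  # {'µ'}
```

The leftover `\xc2\xb5` in the output is exactly the kind of character you would still be responsible for.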

 

So it would be much easier to just run SAS using UTF-8 encoding to deal with that dataset and then all of the characters in the file will be displayed properly without you having to do anything.
