<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: UTF8 encoding issue : replace the � character in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696426#M212716</link>
    <description>&lt;P&gt;Hi Tom,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your answer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;1) My current SAS session is UTF-8 encoded.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;2) All datasets are stored in production libraries which use&amp;nbsp;utf-8 Unicode (UTF-8) encoding.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From what I understand, the issue occurs at the encoding level:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;S ENTR�E DE GAMME /* what is displayed in SAS */ &lt;BR /&gt;0012280: 5320 454e 5452 4545 2044 4520 4741 4d4d S ENTREE DE GAMM /* obtained with MobaXterm $ xxd table.sas7bdat | more */&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 04 Nov 2020 10:02:36 GMT</pubDate>
    <dc:creator>cbo</dc:creator>
    <dc:date>2020-11-04T10:02:36Z</dc:date>
    <item>
      <title>UTF8 encoding issue : replace the � character</title>
      <link>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696260#M212632</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We are running into ANSI-to-UTF-8 encoding problems in our migration from SAS 9.2 to 9.4.&lt;BR /&gt;The newly encoded data shows the special character � in place of French accented letters (é, è, ê, ë, ...).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Questions on this site mention using the `prxchange` or `&lt;SPAN&gt;tranwrd&lt;/SPAN&gt;` function to fix this problem. While this works with regular encoding, it does not appear to work with the&amp;nbsp;�&amp;nbsp;character when the data is sourced from the production environment (into the work library). Can someone please advise me how to fix this?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is a reproducible example that does work:&lt;/P&gt;&lt;P&gt;```&lt;BR /&gt;&lt;SPAN&gt;data df_mishap2 ; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; input var $40. ; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; datalines; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;abcdef abcd�a ab abcde&amp;nbsp;&lt;BR /&gt;XXXXXXXX&amp;nbsp;&lt;BR /&gt;OOOOOO abcdefsdkgtre&amp;nbsp;&lt;BR /&gt;; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;run ; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;data df_clean2 ; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; set df_mishap2 ; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; var2 = prxchange("s/�/e/i", -1, var) ; /* ok */ &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; var3 = tranwrd(var, "�", "e") ; /* ok */ &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;run;&lt;/SPAN&gt;&lt;BR /&gt;```&lt;BR /&gt;&lt;BR /&gt;But, oddly, when applied to a sample dataset sourced from production into the work library, it fails:&lt;BR /&gt;```&lt;BR /&gt;data work.test ;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; set libprod.table (keep = var obs = 100);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; var2 = var;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; var3 = var;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; var2 = prxchange("s/�/e/i", -1, var2) ;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; var3 = TRANWRD(var3, "�", "e") ;&lt;BR /&gt;run;&lt;BR /&gt;```&lt;/P&gt;</description>
      <pubDate>Tue, 03 Nov 2020 16:43:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696260#M212632</guid>
      <dc:creator>cbo</dc:creator>
      <dc:date>2020-11-03T16:43:33Z</dc:date>
    </item>
    <item>
      <title>Re: UTF8 encoding issue : replace the � character</title>
      <link>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696266#M212634</link>
      <description>&lt;P&gt;What encoding is your current SAS session using? Check the value of the ENCODING system option.&lt;/P&gt;
&lt;P&gt;Are you reading from SAS datasets?&amp;nbsp; If so what encoding is the dataset using?&amp;nbsp; Check the PROC CONTENTS output.&lt;/P&gt;
&lt;P&gt;Or text files? If so what encoding is the text file using? Does it have a BOM? What encoding did you tell SAS to use when reading it?&lt;/P&gt;
&lt;P&gt;Or using in-line data in your programs, like in your example?&lt;/P&gt;</description>
      <pubDate>Tue, 03 Nov 2020 16:49:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696266#M212634</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2020-11-03T16:49:53Z</dc:date>
    </item>
    <item>
      <title>Re: UTF8 encoding issue : replace the � character</title>
      <link>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696426#M212716</link>
      <description>&lt;P&gt;Hi Tom,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your answer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;1) My current SAS session is UTF-8 encoded.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;2) All datasets are stored in production libraries which use&amp;nbsp;utf-8 Unicode (UTF-8) encoding.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;From what I understand, the issue occurs at the encoding level:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;S ENTR�E DE GAMME /* what is displayed in SAS */ &lt;BR /&gt;0012280: 5320 454e 5452 4545 2044 4520 4741 4d4d S ENTREE DE GAMM /* obtained with MobaXterm $ xxd table.sas7bdat | more */&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Nov 2020 10:02:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696426#M212716</guid>
      <dc:creator>cbo</dc:creator>
      <dc:date>2020-11-04T10:02:36Z</dc:date>
    </item>
    <item>
      <title>Re: UTF8 encoding issue : replace the � character</title>
      <link>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696509#M212760</link>
      <description>&lt;P&gt;So the data is wrong in the dataset then.&amp;nbsp; You should probably open a support ticket with SAS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That diamond question mark is the character SAS substitutes for characters that cannot be transcoded.&amp;nbsp; The question is whether the transcoding error occurred when the dataset was created or when you read it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try reading the existing dataset with ENCODING=ANY and see if you can tell what is in that location by using the $HEX format to see what codes are actually stored.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Nov 2020 13:52:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696509#M212760</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2020-11-04T13:52:48Z</dc:date>
    </item>
    <item>
      <title>Re: UTF8 encoding issue : replace the � character</title>
      <link>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696603#M212802</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;P&gt;&lt;SPAN&gt;You should probably open a support ticket with SAS.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;SPAN&gt;It has been done; my supervisors are not too happy with their answer.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;You are spot on with the use of the&amp;nbsp;&lt;STRONG&gt;$HEX&lt;/STRONG&gt; format :&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* For the sake of the argument, the value is entered as hexadecimal codes to reproduce the error */
data df_mishap4 ;
	input var $hex64. ;
	datalines ;
5320454E5452E9452044452047414D4D452020202020202020202020
;;;
run;
/* If you look at the data, � appears, so you need to check which byte causes this */
data test;
	set df_mishap4;

	if _n_=1 then
		do;
			put var $32.;
			put var $hex64.;
		end;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;which gives in the log :&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;S ENTR�E DE GAMME             
 &lt;CODE class=" language-sas"&gt;5320 454E 5452 E945 2044452047414D4D452020202020202020202020&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;That way I could see that the problem here is the byte &lt;STRONG&gt;"E9"&lt;/STRONG&gt; (which is displayed as the symbol&amp;nbsp;� instead of&amp;nbsp;&lt;STRONG&gt;é&lt;/STRONG&gt;).&lt;BR /&gt;Then I could apply a correction that replaces this byte (matched with&amp;nbsp;&lt;STRONG&gt;\xe9&lt;/STRONG&gt;) with&amp;nbsp;&lt;STRONG&gt;E&lt;/STRONG&gt; :&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data clean;
	set df_mishap4;
	var2 = prxchange("s/\xe9/E/i", -1, var) ;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;Now I just wish there were an automatic way to apply a set of such transformations to all variables of a table (to cover all the other special characters).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you for your help !&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Nov 2020 17:38:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696603#M212802</guid>
      <dc:creator>cbo</dc:creator>
      <dc:date>2020-11-04T17:38:02Z</dc:date>
    </item>
    <item>
      <title>Re: UTF8 encoding issue : replace the � character</title>
      <link>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696617#M212808</link>
      <description>&lt;P&gt;That byte is not a valid UTF-8 code.&amp;nbsp; Looks like a LATIN1 code.&lt;/P&gt;
&lt;P&gt;Try using the KCVT function to convert the strings to UTF-8 codes.&amp;nbsp; Make sure they are long enough as some characters require up to 4 bytes to be represented in UTF-8.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
  length have want $4 ;
  have = 'E9'x ;
  want = kcvt(have,'latin1','utf8');
  put (2*have 2*want) (+1 = $4. +1 $hex.) ;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Results in a UTF-8 session:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image.png" style="width: 615px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/51373iBD7E9CBD97481E6C/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Results in a LATIN1 session.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image.png" style="width: 499px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/51374iCE0E1E749624F195/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Nov 2020 17:57:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696617#M212808</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2020-11-04T17:57:18Z</dc:date>
    </item>
    <item>
      <title>Re: UTF8 encoding issue : replace the � character</title>
      <link>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696875#M212941</link>
      <description>Thank you for this handy advice! It seems that is as close to a solution as the built-in functions will get us.</description>
      <pubDate>Thu, 05 Nov 2020 13:54:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/UTF8-encoding-issue-replace-the-character/m-p/696875#M212941</guid>
      <dc:creator>cbo</dc:creator>
      <dc:date>2020-11-05T13:54:35Z</dc:date>
    </item>
  </channel>
</rss>

