<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Read text fixed width text file with unicode in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Read-text-fixed-width-text-file-with-unicode/m-p/663791#M198226</link>
    <description>&lt;P&gt;Thank you very much; that saves me from spending more time on this.&lt;/P&gt;&lt;P&gt;I will explore _INFILE_ to read the file content in.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Have a great weekend!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;George&lt;/P&gt;</description>
    <pubDate>Sun, 21 Jun 2020 03:40:43 GMT</pubDate>
    <dc:creator>georgemeng</dc:creator>
    <dc:date>2020-06-21T03:40:43Z</dc:date>
    <item>
      <title>Read text fixed width text file with unicode</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-text-fixed-width-text-file-with-unicode/m-p/663761#M198211</link>
      <description>&lt;P&gt;I am reading in a UTF-8 file with Unicode characters in it, as shown below:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="georgemeng_0-1592684876247.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/46419i0842BC398864CC79/image-size/medium?v=v2&amp;amp;px=400" role="button" title="georgemeng_0-1592684876247.png" alt="georgemeng_0-1592684876247.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;It took me some time to find out that the Unicode character takes up 3 bytes. So in order to read the file correctly, I have to update the file to:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="georgemeng_2-1592685033888.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/46421iF173537853BAF002/image-size/medium?v=v2&amp;amp;px=400" role="button" title="georgemeng_2-1592685033888.png" alt="georgemeng_2-1592685033888.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;This is OK for a small file, but for a big file with a lot of Unicode characters, it is just not&amp;nbsp;practical.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My question: when the file contains Unicode characters, is there a way to let SAS treat each of them like a normal ASCII character, that is, without having to manually edit the text file to account for each Unicode character taking 3 bytes?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have attached the code and text file, please let me know your thoughts.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;George&lt;/P&gt;</description>
      <pubDate>Sat, 20 Jun 2020 20:38:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-text-fixed-width-text-file-with-unicode/m-p/663761#M198211</guid>
      <dc:creator>georgemeng</dc:creator>
      <dc:date>2020-06-20T20:38:53Z</dc:date>
    </item>
    <item>
      <title>Re: Read text fixed width text file with unicode</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-text-fixed-width-text-file-with-unicode/m-p/663772#M198214</link>
      <description>&lt;P&gt;Add the following to your INFILE statement:&amp;nbsp;&lt;CODE class=" language-sas"&gt;encoding='utf-8'&lt;/CODE&gt;&lt;/P&gt;
&lt;P&gt;This lets me read your "notworking" text file.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options pagesize=60 linesize=80 pageno=1 nodate;

data attractions;
	infile '~/test/tour13 - Notworking.txt' truncover encoding='utf-8';
	input City $ 1-9 Museums 11 Galleries 13 Other 15 TourGuide $ 17-25 YearsExperience 26;
run;

proc print data=attractions;
	title 'Data Set WORK.ATTRACTIONS';
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The next question is whether your SAS session runs with a single-byte or a multibyte encoding. To check, run:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc options option=encoding;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In my environment the session is single byte, and not all UTF-8 encoded characters can be mapped to a single byte. For me, this leads to a warning in the SAS log:&lt;/P&gt;
&lt;PRE&gt;WARNING: A character that could not be transcoded has been replaced in record 1.&lt;/PRE&gt;
&lt;P&gt;And to a garbled character:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture.JPG" style="width: 488px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/46422iD2A40FEC22C63D86/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture.JPG" alt="Capture.JPG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;This might not be a problem for you if your SAS session is multibyte (i.e. UTF-8).&lt;/P&gt;
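&lt;P&gt;To see why byte counts and character counts diverge for such a file, here is a minimal sketch, assuming a UTF-8 session. The variable names are made up for illustration, and the hex literal is simply the UTF-8 encoding of the euro sign, not a character taken from your file. LENGTH counts bytes, while KLENGTH counts characters:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
	/* 'E282AC'x is the 3-byte UTF-8 encoding of the euro sign (illustrative only) */
	length s $ 12;
	s = cats('Paris', 'E282AC'x);
	n_bytes = length(s);   /* length in bytes */
	n_chars = klength(s);  /* length in characters */
	put n_bytes= n_chars=;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If the two numbers differ, every column position after the multibyte character is shifted when positions are counted in bytes.&lt;/P&gt;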
</description>
      <pubDate>Sun, 21 Jun 2020 00:29:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-text-fixed-width-text-file-with-unicode/m-p/663772#M198214</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-06-21T00:29:27Z</dc:date>
    </item>
    <item>
      <title>Re: Read text fixed width text file with unicode</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-text-fixed-width-text-file-with-unicode/m-p/663780#M198218</link>
      <description>&lt;P&gt;Thank you very much for spending the time over the weekend!&lt;/P&gt;&lt;P&gt;In my environment, I added encoding="utf-8" and also checked the session option. The session encoding is also UTF-8, but YearsExperience is still missing for the first record.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Your result correctly shows the number 2. Is it possible that this is because the three-byte Unicode character was replaced by a single-byte "?"?&lt;/P&gt;&lt;PRE&gt;A character that could not be transcoded has been replaced in record 1&lt;/PRE&gt;&lt;P&gt;I just wonder why it worked in your environment but not in mine.&lt;/P&gt;&lt;P&gt;I am really new to SAS, so please bear with me.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My environment:&lt;/P&gt;&lt;P&gt;SAS University Edition on Windows 10 through VirtualBox.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks again!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;George&lt;/P&gt;</description>
      <pubDate>Sun, 21 Jun 2020 01:30:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-text-fixed-width-text-file-with-unicode/m-p/663780#M198218</guid>
      <dc:creator>georgemeng</dc:creator>
      <dc:date>2020-06-21T01:30:40Z</dc:date>
    </item>
    <item>
      <title>Re: Read text fixed width text file with unicode</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-text-fixed-width-text-file-with-unicode/m-p/663781#M198219</link>
      <description>&lt;P&gt;You cannot do it with the INFILE/INPUT statements alone, because column positions in the INPUT statement count bytes, not characters.&amp;nbsp; Instead, use the KSUBSTR() function to extract the number of CHARACTERS you want from the automatic _INFILE_ variable.&amp;nbsp; Make sure your lines are not longer than 32K bytes, since that is the maximum length of _INFILE_.&lt;/P&gt;</description>
      <pubDate>Sun, 21 Jun 2020 01:41:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-text-fixed-width-text-file-with-unicode/m-p/663781#M198219</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2020-06-21T01:41:55Z</dc:date>
    </item>
    <item>
      <title>Re: Read text fixed width text file with unicode</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Read-text-fixed-width-text-file-with-unicode/m-p/663791#M198226</link>
      <description>&lt;P&gt;Thank you very much; that saves me from spending more time on this.&lt;/P&gt;&lt;P&gt;I will explore _INFILE_ to read the file content in.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Have a great weekend!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;George&lt;/P&gt;</description>
      <pubDate>Sun, 21 Jun 2020 03:40:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Read-text-fixed-width-text-file-with-unicode/m-p/663791#M198226</guid>
      <dc:creator>georgemeng</dc:creator>
      <dc:date>2020-06-21T03:40:43Z</dc:date>
    </item>
  </channel>
</rss>

