<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: reading a csv file with embedded 'â€™' in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/659152#M197520</link>
    <description>&lt;P&gt;1. You are reading a Unicode (UTF-8) file. The curly apostrophe ’ is stored as the three bytes E2 80 99, which a single-byte session displays as 'â€™'.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. The test you are attempting will always fail, since variable A has a length of 1, while 'â€™' is three characters.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;3. If you want to create an ASCII data set, the best option is to translate the multi-byte characters into ASCII characters as they are loaded (or, in your case, after the PROC IMPORT), using logic similar to what you are attempting now. Let's hope there are no Chinese or mathematical characters, or 20 different quote types, in the file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Additional notes:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;4. You could use the &lt;FONT face="courier new,courier"&gt;encoding=&lt;/FONT&gt; option to pre-process the file properly, but not together with &lt;FONT face="courier new,courier"&gt;recfm=n&lt;/FONT&gt;, because N means that only one byte is read at a time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;5. To preserve the data as is when reading the fixed file, you can use&lt;/P&gt;
&lt;PRE class="xisDoc-codeFragment"&gt;&lt;CODE&gt;filename extfile '&lt;EM class="xisDoc-userSuppliedValue"&gt;external-file&lt;/EM&gt;' encoding="utf-8";&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;to read the data correctly. You should probably use the same option for your data set too, since your SAS session is probably &lt;EM&gt;wlatin1&lt;/EM&gt;. Check this (for example with &lt;FONT face="courier new,courier"&gt;%put &amp;amp;sysencoding;&lt;/FONT&gt;). You still won't be able to display the data properly, though, since &lt;EM&gt;wlatin1&lt;/EM&gt; does not support multi-byte characters.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;6. For what it's worth, UTF-8 is here to stay, so in my opinion your organisation should start using UTF-8 environments and move away from ASCII-derived ones.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;7. So now you have 3 steps: pre-clean, proc import, post-clean instead of one data step.&lt;/P&gt;
</description>
    <pubDate>Mon, 15 Jun 2020 22:26:45 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2020-06-15T22:26:45Z</dc:date>
    <item>
      <title>reading a csv file with embedded 'â€™'</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/658651#M197376</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159" target="_blank" rel="noopener"&gt;@Tom&lt;/A&gt;&amp;nbsp;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879" target="_blank" rel="noopener"&gt;@Reeza&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;I have the attached .csv file. It contains some embedded carriage return/line feed characters, so I am using the code below. Then I noticed that the apostrophes (') in my input file appear as &lt;FONT color="#FF0000"&gt;'â€™'&lt;/FONT&gt;. How can I correct that when the file is first read? Or can I correct it in another DATA step after the data has been read with PROC IMPORT?&lt;/P&gt;&lt;P&gt;The code &lt;FONT color="#FF0000"&gt;highlighted in red&lt;/FONT&gt; is something I tried, but with no luck. The new test file name is &lt;FONT color="#FF00FF"&gt;_test_1&lt;/FONT&gt;.&lt;/P&gt;&lt;P&gt;I appreciate any assistance.&lt;/P&gt;&lt;PRE&gt;%let repA=' '; /* replacement character LF */
%let repD=' '; /* replacement character CR */
&lt;FONT color="#FF0000"&gt;%let repE="'";&lt;/FONT&gt;
%let dsnnme="*.csv"; /* use full path of CSV file */

data _null_;
      /* RECFM=N reads the file in binary format. The file consists */
      /* of a stream of bytes with no record boundaries. SHAREBUFFERS */
      /* specifies that the FILE statement and the INFILE statement */
      /* share the same buffer. */
      infile &amp;amp;dsnnme recfm=n sharebuffers;
      file &amp;amp;dsnnme recfm=n;

      /* OPEN is a flag variable used to determine if the CR/LF is within */
      /* double quotes or not. Retain this value. */
      retain open 0;
      input a $char1.;

      /* If the character is a double quote, set OPEN to its opposite value. */
      if a = '"' then
           open = ^(open);

      /* If the CR or LF is after an open double quote, replace the byte with */
      /* the appropriate value. */
      if open then
           do;
                 if a = '0D'x then
                      put &amp;amp;repD;
                 else if a = '0A'x then
                      put &amp;amp;repA;
				&lt;FONT color="#FF0000"&gt; else if a='â€™' then 
				      put &amp;amp;repE;&lt;/FONT&gt;
           end;
run;


/*STEP 2 ; */

proc import
      /* CSV */
      datafile=".csv" 
      out=test  dbms=csv replace;
      delimiter=',';
      guessingrows=32767;
run;&lt;/PRE&gt;
</description>
      <pubDate>Mon, 15 Jun 2020 13:00:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/658651#M197376</guid>
      <dc:creator>dennis_oz</dc:creator>
      <dc:date>2020-06-15T13:00:25Z</dc:date>
    </item>
    <item>
      <title>Re: reading a csv file with embedded 'â€™'</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/659152#M197520</link>
      <description>&lt;P&gt;1. You are reading a Unicode (UTF-8) file. The curly apostrophe ’ is stored as the three bytes E2 80 99, which a single-byte session displays as 'â€™'.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. The test you are attempting will always fail, since variable A has a length of 1, while 'â€™' is three characters.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;3. If you want to create an ASCII data set, the best option is to translate the multi-byte characters into ASCII characters as they are loaded (or, in your case, after the PROC IMPORT), using logic similar to what you are attempting now. Let's hope there are no Chinese or mathematical characters, or 20 different quote types, in the file.&lt;/P&gt;
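&lt;P&gt;A minimal sketch of points 1 and 2 in Python, used here only because it makes the bytes easy to inspect (an illustration, not part of the SAS solution). It assumes that &lt;EM&gt;wlatin1&lt;/EM&gt; decodes these bytes like Python's &lt;FONT face="courier new,courier"&gt;cp1252&lt;/FONT&gt; codec. The curly apostrophe U+2019 is three bytes in UTF-8. Read one byte at a time, they become the three characters â, € and ™, and no single byte can ever equal the three-character string 'â€™'.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-python"&gt;
```python
# Why a UTF-8 curly apostrophe shows up as 'â€™' in a single-byte session.
# Assumption: wlatin1 decodes these bytes like Python's "cp1252" codec.

RIGHT_QUOTE = "\u2019"            # the apostrophe as written in the CSV
MOJIBAKE = "\u00e2\u20ac\u2122"   # 'â€™' as displayed in SAS

utf8_bytes = RIGHT_QUOTE.encode("utf-8")
print(utf8_bytes.hex(" "))        # e2 80 99: three bytes for one character

# Decoding those bytes with a single-byte code page yields three characters.
misread = utf8_bytes.decode("cp1252")
print(misread == MOJIBAKE)        # True

# RECFM=N with INPUT A $CHAR1. sees one byte per iteration, as in this loop,
# so comparing A with the three-character string can never succeed.
matches = [bytes([b]).decode("cp1252") == MOJIBAKE for b in utf8_bytes]
print(matches)                    # [False, False, False]
```
&lt;/CODE&gt;&lt;/PRE&gt;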
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Additional notes:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;4. You could use the &lt;FONT face="courier new,courier"&gt;encoding=&lt;/FONT&gt; option to pre-process the file properly, but not together with &lt;FONT face="courier new,courier"&gt;recfm=n&lt;/FONT&gt;, because N means that only one byte is read at a time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;5. To preserve the data as is when reading the fixed file, you can use&lt;/P&gt;
&lt;PRE class="xisDoc-codeFragment"&gt;&lt;CODE&gt;filename extfile '&lt;EM class="xisDoc-userSuppliedValue"&gt;external-file&lt;/EM&gt;' encoding="utf-8";&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;to read the data correctly. You should probably use the same option for your data set too, since your SAS session is probably &lt;EM&gt;wlatin1&lt;/EM&gt;. Check this (for example with &lt;FONT face="courier new,courier"&gt;%put &amp;amp;sysencoding;&lt;/FONT&gt;). You still won't be able to display the data properly, though, since &lt;EM&gt;wlatin1&lt;/EM&gt; does not support multi-byte characters.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;6. For what it's worth, UTF-8 is here to stay, so in my opinion your organisation should start using UTF-8 environments and move away from ASCII-derived ones.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;7. So now you have 3 steps: pre-clean, proc import, post-clean instead of one data step.&lt;/P&gt;
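&lt;P&gt;The three steps in point 7 can be sketched in Python as an illustration. The SAS steps discussed in this thread remain the actual solution. The pre-clean mirrors the RECFM=N byte scan with its OPEN flag. The standard csv module stands in for PROC IMPORT, which is an assumption, since the two are not equivalent. The post-clean maps the misread apostrophe back to an ASCII one. The function names are hypothetical. The sketch also decodes the same bytes as UTF-8, which is the ENCODING= route from point 5.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-python"&gt;
```python
# Sketch of the three steps in point 7, with Python stand-ins:
#   pre-clean:  byte scan like the RECFM=N DATA _NULL_ step
#   import:     the csv module (a stand-in for PROC IMPORT, not equivalent)
#   post-clean: replace the misread apostrophe after the import
# The function names below are hypothetical.
import csv
import io

MOJIBAKE = "\u00e2\u20ac\u2122"   # 'â€™' as seen in a single-byte session


def pre_clean(raw):
    """Replace CR and LF bytes that sit inside double quotes with blanks."""
    out = bytearray()
    inside = False                  # same role as the OPEN flag
    for b in raw:
        if b == 0x22:               # a double quote toggles the flag
            inside = not inside
        if inside and b in (0x0D, 0x0A):
            out.append(0x20)        # embedded CR or LF becomes a blank
        else:
            out.append(b)
    return bytes(out)


def import_csv(data, encoding):
    """Decode the cleaned bytes with ENCODING, then parse them as CSV."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))


def post_clean(value):
    """Map the misread sequence, or a correctly read U+2019, to an ASCII '."""
    return value.replace(MOJIBAKE, "'").replace("\u2019", "'")


raw = 'id,reason\r\n1,"It\u2019s\r\nfine"\r\n'.encode("utf-8")
cleaned = pre_clean(raw)

# Decoded with a single-byte code page (like wlatin1): needs the post-clean.
latin_value = import_csv(cleaned, "cp1252")[1][1]
print(MOJIBAKE in latin_value)           # True: the apostrophe was misread
print(post_clean(latin_value))           # It's  fine

# Decoded as UTF-8 (the ENCODING= route): the apostrophe is one character.
utf8_value = import_csv(cleaned, "utf-8")[1][1]
print(utf8_value == "It\u2019s  fine")   # True
```
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Scanning bytes rather than characters is safe on UTF-8 input: CR (0D), LF (0A) and the double quote (22) are single ASCII bytes, while every byte of a multi-byte UTF-8 sequence is 80 or above, so the scan can never split a character. Each embedded CR/LF pair becomes two blanks, just as it does with the two replacement characters in the SAS step.&lt;/P&gt;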
</description>
      <pubDate>Mon, 15 Jun 2020 22:26:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/659152#M197520</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-06-15T22:26:45Z</dc:date>
    </item>
    <item>
      <title>Re: reading a csv file with embedded 'â€™'</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/659177#M197535</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Thanks very much for the advice.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am now doing what you described in point 7:&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;"So now you have 3 steps: pre-clean, proc import, post-clean instead of one data step."&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The only problem now is a slight formatting issue. Can you please suggest a fix? I have pasted my code and output below.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;%let repA=' '; /* replacement character LF */
%let repD=' '; /* replacement character CR */
%let dsnnme="_test_1.csv"; /* use full path of CSV file */

data _null_;
      /* RECFM=N reads the file in binary format. The file consists */
      /* of a stream of bytes with no record boundaries. SHAREBUFFERS */
      /* specifies that the FILE statement and the INFILE statement */
      /* share the same buffer. */
      infile &amp;amp;dsnnme recfm=n sharebuffers;
      file &amp;amp;dsnnme recfm=n;

      /* OPEN is a flag variable used to determine if the CR/LF is within */
      /* double quotes or not. Retain this value. */
      retain open 0;
      input a $char1.;

      /* If the character is a double quote, set OPEN to its opposite value. */
      if a = '"' then
           open = ^(open);

      /* If the CR or LF is after an open double quote, replace the byte with */
      /* the appropriate value. */
      if open then
           do;
                 if a = '0D'x then
                      put &amp;amp;repD;
                 else if a = '0A'x then
                      put &amp;amp;repA;
           end;
run;


/*STEP 2 ; */

proc import
      /* CSV */
      datafile="_test_1.csv" 
        out=test  dbms=csv replace;
      delimiter=',';
      guessingrows=32767;

run;

&lt;FONT color="#FF0000"&gt;/* new steps */&lt;/FONT&gt;
data a;
set test  ;
keep surveyid REASON_FOR_SCORE varname ;
varname = translate(REASON_FOR_SCORE,"'","â€™");

run;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;Below is the output I get. There is a space after the apostrophe ('). Is there any way to avoid that extra space?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="dennis_oz_0-1592265795776.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/43243i16E835377A699BBB/image-size/medium?v=v2&amp;amp;px=400" role="button" title="dennis_oz_0-1592265795776.png" alt="dennis_oz_0-1592265795776.png" /&gt;&lt;/span&gt;&lt;/P&gt;
</description>
      <pubDate>Tue, 16 Jun 2020 00:04:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/659177#M197535</guid>
      <dc:creator>dennis_oz</dc:creator>
      <dc:date>2020-06-16T00:04:17Z</dc:date>
    </item>
    <item>
      <title>Re: reading a csv file with embedded 'â€™'</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/659355#M197542</link>
      <description>&lt;P&gt;Try&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;VARNAME = prxchange("s/â€™/'/",-1,REASON_FOR_SCORE);&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Also, use capitals for a purpose. They increase legibility if used properly.&lt;/P&gt;
&lt;P&gt;I use them for user-defined names.&lt;/P&gt;
&lt;P&gt;This is easier to read&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A;
  set TEST  ;
  keep SURVEYID REASON_FOR_SCORE VARNAME ;
  VARNAME = translate(REASON_FOR_SCORE,"'","â€™");
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;than this&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data a;
set test  ;
keep surveyid REASON_FOR_SCORE varname ;
varname = translate(REASON_FOR_SCORE,"'","â€™");

run;&lt;/CODE&gt;&lt;/PRE&gt;
</description>
      <pubDate>Tue, 16 Jun 2020 02:05:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/659355#M197542</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-06-16T02:05:44Z</dc:date>
    </item>
    <item>
      <title>Re: reading a csv file with embedded 'â€™'</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/659390#M197544</link>
      <description>&lt;P&gt;Or, if you want to avoid regular expressions, something like this:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt; VARNAME = compress(translate(REASON_FOR_SCORE, "'~~", "â€™"), '~');&lt;/FONT&gt;&lt;/P&gt;
</description>
      <pubDate>Tue, 16 Jun 2020 01:56:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/659390#M197544</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-06-16T01:56:30Z</dc:date>
    </item>
    <item>
      <title>Re: reading a csv file with embedded 'â€™'</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/660001#M197604</link>
      <description>&lt;P&gt;Thanks, Chris.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;VARNAME = compress(translate(REASON_FOR_SCORE, "&lt;FONT color="#FF0000"&gt;'~~&lt;/FONT&gt;", "â€™"),&lt;FONT color="#FF0000"&gt; '~'&lt;/FONT&gt;);&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I didn't quite understand the highlighted part. Can anyone please explain?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 16 Jun 2020 11:30:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/660001#M197604</guid>
      <dc:creator>dennis_oz</dc:creator>
      <dc:date>2020-06-16T11:30:20Z</dc:date>
    </item>
    <item>
      <title>Re: reading a csv file with embedded 'â€™'</title>
      <link>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/660010#M197607</link>
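      <description>&lt;P&gt;TRANSLATE works character by character. When the &lt;EM&gt;to&lt;/EM&gt; list is shorter than the &lt;EM&gt;from&lt;/EM&gt; list, the extra &lt;EM&gt;from&lt;/EM&gt; characters are translated to blanks, and that is where the extra space in your output comes from. So you need to replace the 3 characters (â, € and ™) with 3 characters (', ~ and ~), then remove the 2 extraneous tildes with COMPRESS.&lt;/P&gt;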
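&lt;P&gt;To see why the tildes are needed, here is a small Python emulation. It is not SAS itself, and the helpers sas_translate and sas_compress are hypothetical. The emulation follows the documented TRANSLATE rule: when the &lt;EM&gt;to&lt;/EM&gt; list is shorter than the &lt;EM&gt;from&lt;/EM&gt; list, the surplus &lt;EM&gt;from&lt;/EM&gt; characters become blanks, and surplus &lt;EM&gt;to&lt;/EM&gt; characters are ignored. The original call leaves two blanks behind. Mapping to '~~' and then deleting the tildes leaves only the apostrophe.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-python"&gt;
```python
# Hypothetical emulation of SAS TRANSLATE(source, to, from) and
# COMPRESS(source, chars), following the documented rules; not SAS itself.


def sas_translate(source, to, frm):
    # Pair from[i] with to[i]. When TO is shorter than FROM, ljust pads TO
    # with blanks, so the surplus FROM characters become blanks. Extra TO
    # characters are ignored because zip stops at the shorter input.
    table = dict(zip(frm, to.ljust(len(frm))))
    return "".join(table.get(ch, ch) for ch in source)


def sas_compress(source, chars):
    # Remove every character that appears in CHARS.
    return "".join(ch for ch in source if ch not in chars)


MOJIBAKE = "\u00e2\u20ac\u2122"   # 'â€™'
value = "don" + MOJIBAKE + "t"

# Original attempt: one TO character, so two blanks are left behind.
print(repr(sas_translate(value, "'", MOJIBAKE)))                        # "don'  t"

# Three TO characters, then the two tildes are removed.
print(repr(sas_compress(sas_translate(value, "'~~", MOJIBAKE), "~")))   # "don't"
```
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;One caveat, not raised in the thread: COMPRESS also deletes any tilde that was genuinely present in REASON_FOR_SCORE, so the placeholder should be a character that cannot occur in the data.&lt;/P&gt;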
&lt;P&gt;Regular expressions (the accepted solution) are more straightforward, if you don't mind the more complex syntax.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Jun 2020 11:42:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/reading-a-csv-file-with-embedded-%C3%A2/m-p/660010#M197607</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-06-16T11:42:29Z</dc:date>
    </item>
  </channel>
</rss>

