<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: proc import XLSX file: different results on Linux vs Windows in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/proc-import-XLSX-file-different-results-on-Linux-vs-Windows/m-p/388644#M65950</link>
    <description>&lt;P&gt;I suspect that your last line: "I noticed that the Windows installation uses wlatin1 and Linux has utf-8."&lt;/P&gt;
&lt;P&gt;has a bit as there is likely to be a layer of text conversion in one instance not in the other and I'm not going to open an unknown source Excel file to check.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Generally unless there is an overwhelming reason to try to read native Excel I save files to CSV so I control how they are read AND can see stupid things in the data easier.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And since Proc Import is always guessing about things then I wouldn't expect everything from two sheets in the same workbook to 'match'. Different lengths of character variables and different types depending on the values of the first rows are common issues.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 16 Aug 2017 22:16:13 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2017-08-16T22:16:13Z</dc:date>
    <item>
      <title>proc import XLSX file: different results on Linux vs Windows</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-import-XLSX-file-different-results-on-Linux-vs-Windows/m-p/388613#M65937</link>
      <description>&lt;P&gt;When loading different sheets from an Excel-file (SAS 9.3.1M2; LIN X64) I stumbled upon a problem joining some of the tables based on a key. After some research it turned out that the key in table A had a hidden character in it after the import, but the key in table B did not.&amp;nbsp;Strangely enough, when&amp;nbsp;running the same program on a Windows installation (SAS 9.3.1M2; WIN X64_SRV12) this problem&amp;nbsp;did not occur.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The problem is obvious after using the attached Excel-file and running the following code.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc import 
     datafile="&amp;amp;path./Country test.xlsx" 
     dbms=xlsx replace out=work.country_1
; 
     sheet='Country_1'; 
run; 

proc import 
     datafile="&amp;amp;path./Country test.xlsx" 
     dbms=xlsx replace out=work.country_2
; 
     sheet='Country_2'; 
run; 

proc sql;
	create table country as 
		select 
			c1.COUNTRY as COUNTRY_1,
			c2.COUNTRY as COUNTRY_2, 
			c1.POPULATION, 
			c2.AREA_KM2
		from work.country_1 as c1
			full join work.country_2 as c2
				on c2.COUNTRY = c1.COUNTRY
		order by c1.COUNTRY
	;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The dataset country will look as expected on the Windows installation:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="Windows_result.PNG" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/14523i9D7DDDB5CCA73DC9/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Windows_result.PNG" alt="Windows_result.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;On the Linux installation the keys do not match:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="Linux_result.PNG" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/14524i293041ECAD2C2903/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Linux_result.PNG" alt="Linux_result.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The problem appears to be the presence of the "country" consisting of three characters ("?US"). Because of this on Linux all other countries are padded with an invisible third character.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data work.country_1_check;
     set work.country_1; 
 
     length = length(COUNTRY);
     character = char(COUNTRY, 3);
     hex = put(character, $HEX32.); 
     binary = put(character, $BINARY32.);
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The results of this data step on Linux are strange:&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="Linux_check.PNG" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/14525iEF18F0F116C4346C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Linux_check.PNG" alt="Linux_check.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What causes this behaviour on Linux? Is anybody else experiencing the same thing when importing this Excel-file? Could it be the encoding on the two platforms? I noticed that the Windows installation uses wlatin1 and Linux has utf-8.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 20:53:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-import-XLSX-file-different-results-on-Linux-vs-Windows/m-p/388613#M65937</guid>
      <dc:creator>NicoM</dc:creator>
      <dc:date>2017-08-16T20:53:41Z</dc:date>
    </item>
    <item>
      <title>Re: proc import XLSX file: different results on Linux vs Windows</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-import-XLSX-file-different-results-on-Linux-vs-Windows/m-p/388624#M65945</link>
      <description>&lt;P&gt;I have no answer to the question "what causes this behaviour".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can omit the control characters using compress function with 3rd argument = c&lt;/P&gt;
&lt;P&gt;like : &amp;nbsp; country1 =&lt;STRONG&gt; compress(country1,,'c');&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;BTW&lt;STRONG&gt;,&amp;nbsp;&lt;/STRONG&gt;I noticed that reading an xslx file downloaded from windows 32 to linux 64,&lt;/P&gt;
&lt;P&gt;in order to get the right &lt;STRONG&gt;date&lt;/STRONG&gt; I have to subtract 60 years and 1 day, while using code like:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; libname myxl xlsx &amp;nbsp;'....xlsx';&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;data two;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;set &amp;nbsp;myxl.sheet_name;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;date = intnx('year', date, -60) - 1;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 21:13:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-import-XLSX-file-different-results-on-Linux-vs-Windows/m-p/388624#M65945</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2017-08-16T21:13:08Z</dc:date>
    </item>
    <item>
      <title>Re: proc import XLSX file: different results on Linux vs Windows</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-import-XLSX-file-different-results-on-Linux-vs-Windows/m-p/388644#M65950</link>
      <description>&lt;P&gt;I suspect that your last line: "I noticed that the Windows installation uses wlatin1 and Linux has utf-8."&lt;/P&gt;
&lt;P&gt;has a bit as there is likely to be a layer of text conversion in one instance not in the other and I'm not going to open an unknown source Excel file to check.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Generally unless there is an overwhelming reason to try to read native Excel I save files to CSV so I control how they are read AND can see stupid things in the data easier.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And since Proc Import is always guessing about things then I wouldn't expect everything from two sheets in the same workbook to 'match'. Different lengths of character variables and different types depending on the values of the first rows are common issues.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 16 Aug 2017 22:16:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-import-XLSX-file-different-results-on-Linux-vs-Windows/m-p/388644#M65950</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-08-16T22:16:13Z</dc:date>
    </item>
    <item>
      <title>Re: proc import XLSX file: different results on Linux vs Windows</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-import-XLSX-file-different-results-on-Linux-vs-Windows/m-p/389056#M65999</link>
      <description>&lt;P&gt;Thanks for the tip of using compress with the 'c' modifier. I was not aware of this possibility.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Of course I would prefer to avoid this, as in reality I am reading 27 sheets from a single Excel-file. This is an input data model for a program that I am using. Worst case&amp;nbsp;I can write a macro to got through all tables and compress all character variables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I just think it is strange that the result of the same code is different on different installations of SAS. &amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Aug 2017 08:10:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-import-XLSX-file-different-results-on-Linux-vs-Windows/m-p/389056#M65999</guid>
      <dc:creator>NicoM</dc:creator>
      <dc:date>2017-08-18T08:10:16Z</dc:date>
    </item>
    <item>
      <title>Re: proc import XLSX file: different results on Linux vs Windows</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-import-XLSX-file-different-results-on-Linux-vs-Windows/m-p/389059#M66000</link>
      <description>&lt;P&gt;I agree that reading in an Excel-file can lead to surprises, but&amp;nbsp;I think that the guessing of proc import&amp;nbsp;is not the problem here.&amp;nbsp;In both cases the country column is correctly recognized as character. It is just treated differently and that is strange (bug?) in my opinion. Why would it leave control characters in one case, but not in the other?&lt;/P&gt;</description>
      <pubDate>Fri, 18 Aug 2017 17:37:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-import-XLSX-file-different-results-on-Linux-vs-Windows/m-p/389059#M66000</guid>
      <dc:creator>NicoM</dc:creator>
      <dc:date>2017-08-18T17:37:09Z</dc:date>
    </item>
  </channel>
</rss>

