BookmarkSubscribeRSS Feed
Obsidian | Level 7

When loading different sheets from an Excel-file (SAS 9.3.1M2; LIN X64) I stumbled upon a problem joining some of the tables based on a key. After some research it turned out that the key in table A had a hidden character in it after the import, but the key in table B did not. Strangely enough, when running the same program on a Windows installation (SAS 9.3.1M2; WIN X64_SRV12) this problem did not occur.  


The problem is obvious after using the attached Excel-file and running the following code.

proc import 
     datafile="&path./Country test.xlsx" 
     dbms=xlsx replace out=work.country_1

proc import 
     datafile="&path./Country test.xlsx" 
     dbms=xlsx replace out=work.country_2

proc sql;
	create table country as 
			c2.COUNTRY as COUNTRY_2, 
		from work.country_1 as c1
			full join work.country_2 as c2
				on c2.COUNTRY = c1.COUNTRY
		order by c1.COUNTRY

The dataset country will look as expected on the Windows installation:








On the Linux installation the keys do not match:











The problem appears to be the presence of the "country" consisting of three characters ("?US"). Because of this on Linux all other countries are padded with an invisible third character.

data work.country_1_check;
     set work.country_1; 
     length = length(COUNTRY);
     character = char(COUNTRY, 3);
     hex = put(character, $HEX32.); 
     binary = put(character, $BINARY32.);

The results of this data step on Linux are strange:Linux_check.PNG




What causes this behaviour on Linux? Is anybody else experiencing the same thing when importing this Excel-file? Could it be the encoding on the two platforms? I noticed that the Windows installation uses wlatin1 and Linux has utf-8.

Garnet | Level 18

I have no answer to the question "what causes this behaviour".


You can omit the control characters using compress function with 3rd argument = c

like :   country1 = compress(country1,,'c');


BTWI noticed that reading an xslx file downloaded from windows 32 to linux 64,

in order to get the right date I have to subtract 60 years and 1 day, while using code like:


    libname myxl xlsx  '....xlsx';

   data two;

     set  myxl.sheet_name;

           date = intnx('year', date, -60) - 1;



Obsidian | Level 7

Thanks for the tip of using compress with the 'c' modifier. I was not aware of this possibility.


Of course I would prefer to avoid this, as in reality I am reading 27 sheets from a single Excel-file. This is an input data model for a program that I am using. Worst case I can write a macro to got through all tables and compress all character variables.


I just think it is strange that the result of the same code is different on different installations of SAS.  

Super User

I suspect that your last line: "I noticed that the Windows installation uses wlatin1 and Linux has utf-8."

has a bit as there is likely to be a layer of text conversion in one instance not in the other and I'm not going to open an unknown source Excel file to check.


Generally unless there is an overwhelming reason to try to read native Excel I save files to CSV so I control how they are read AND can see stupid things in the data easier.


And since Proc Import is always guessing about things then I wouldn't expect everything from two sheets in the same workbook to 'match'. Different lengths of character variables and different types depending on the values of the first rows are common issues.



Obsidian | Level 7

I agree that reading in an Excel-file can lead to surprises, but I think that the guessing of proc import is not the problem here. In both cases the country column is correctly recognized as character. It is just treated differently and that is strange (bug?) in my opinion. Why would it leave control characters in one case, but not in the other?



Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg



Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 3 in conversation