
Non-English Characters in SAS University Edition


I was at SAS Global Forum this week, and one of the great things about SASGF is the chance to connect with and talk to users from all around the world. I met a group of students from California who use SAS University Edition, and one of them asked me a question I'd never thought about. It made me do some testing, and as a result I found something interesting.

 

Here's a simple data step, where I have a non-English character in the name Carloš (I don't know if that's an actual name, this is just for example purposes):

 

data scores;
input Name $ Test_1 Test_2 Test_3;
datalines;
Bill 187 97 103
Carloš 156 76 74
Monique 99 102 129
;
proc sql;
select * from work.scores;
quit;

When you run this in SAS (whether it's base SAS, SAS Studio or SAS University Edition) everything is fine (note: Tested on Windows and Mac only).  

 

 

What I've found is if I am using a Mac and hold down the "s" key to get the non-English characters, I get this; note that I've added a second "š" in the name:

1.png

 

But when I remove the highlighting, I get this:

2.png

 

When I run the code, the second "š" appears in the table without the accent. I need to do some additional testing (for example, I'm using Chrome on both Windows and Mac, and need to try IE, Firefox, Safari, etc., as the problem may be Chrome-specific). In the meantime, if you need to use a non-English character and are on a Mac, copy and paste the character from a website or document, as that seems to work.
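
One way to check what actually ended up in the data, independent of how the browser renders it, is to dump the bytes of each name. This is only a sketch: it assumes the work.scores data set from the data step above exists and that the session encoding is UTF-8 (as in SAS University Edition). The variable n_bytes is a name made up for illustration.

```sas
/* Dump each name with its byte count and its hex representation. */
data _null_;
  set work.scores;
  n_bytes = length(Name);           /* bytes up to the last non-blank */
  put Name= n_bytes= Name $hex16.;  /* $hex16. shows all 8 bytes of Name */
run;
```

In UTF-8 a precomposed š is the byte pair C5A1, a plain s is 73, and trailing blanks show as 20, so the log should make it clear whether the accent was lost before the data reached SAS.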

 

I also have a question, as I'm only familiar with French (and at a basic level at that): are there cases where a word spelled with an accented letter and the same word spelled without the accent have different meanings? Using my example above, would Carloš and Carlos mean different things (tense, definition, etc.)? I'm curious because this potential issue could have a profound effect on someone doing text analytics, for example.

 

Thanks for your time and please let me know if you have any thoughts or questions!

Chris

 


Re: Non-English Characters in SAS University Edition

Posted in reply to DarthPathos

Hi Darth.

 

Interesting. I ran your original code without problems in my Lubuntu 15.10 / Firefox / Oracle VM SAS University Edition session, then ran it again with a copy of the last character of the name in data line 2 appended. Suspecting browser and browser-OS problems, I did not use the browser to copy the character; instead I used the DBCS/SBCS-certified cats and substr functions inside the data step ( if _n_=2 then name = cats(name,substr(name,6)); ). I fully expected this to work properly and to serve as the control for further experiments. To my surprise, the sql step prints only the first observation, and the log complains of invalid characters. Hmmm. This error should not be a transcoding problem, as everything inside the VM box is UTF-8. I'll have to think about this one further. Here is my log:

 
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
55
56 data scores;
57 input Name $ Test_1 Test_2 Test_3;
58 if _n_ =2 then name=cats(name,substr(name,6));
59 datalines;
 
NOTE: The data set WORK.SCORES has 3 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.10 seconds
cpu time 0.07 seconds
 
63 ;
 
64 proc sql;
65 select * from work.scores;
ERROR: Invalid characters were present in the data.
ERROR: An error occurred while processing text data.
ERROR: Invalid characters were present in the data.
ERROR: An error occurred while processing text data.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
66 quit;
ERROR: Invalid characters were present in the data.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.23 seconds
cpu time 0.21 seconds
 
67
68 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
80

Re: Non-English Characters in SAS University Edition

Posted in reply to Damien_Mather

That is fascinating, and I'm intrigued that you're using a different method and getting an actual error in the log. I don't get anything in my log to indicate a problem; please keep me posted on anything else you find.

 

Thanks so much for your time with this!

Have a great weekend

Chris


Re: Non-English Characters in SAS University Edition

Posted in reply to DarthPathos

Hi Chris.

 

This version works OK:

 

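data scores;
/* 9 bytes: 'Carlošš' is 5 one-byte letters plus two 2-byte š in UTF-8 */
length name $ 9;
input Name $  Test_1 Test_2 Test_3;
/* MBCS-aware alternative, not needed here: */
*if _n_ =2 then name=kstrcat(kstrip(name),kstrip(ksubstr(name,6)));
/* Append a copy of the š (bytes 6-7 of 'Carloš') to the second row */
if _n_ =2 then name=cat(strip(name),strip(substr(name,6)));
/* Allocated length of name in bytes */
nmlnbyts=lengthm(name);
datalines;
Bill 187 97 103
Carloš 156 76 74
Monique 99 102 129
;
run;
proc sql;
select * from work.scores;
quit;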

 

The problem with my initial control variant attempt was insufficient buffer memory: in your original form, list input gives the name variable the default character length of 8 bytes, while 'Carlošš' needs 9, so the second š was cut in half and left an invalid byte at the end of the value.

 

Extending the buffer to 9 bytes via the additional length statement in my version above solves the problem.
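
The effect of the buffer size can be seen in isolation with a small sketch like the one below. It assumes a UTF-8 session and UTF-8 source code; name8 and name9 are hypothetical variable names, with 8 bytes being the default length SAS gives a character variable read with list input.

```sas
/* Store the same 9-byte value in an 8-byte and a 9-byte variable. */
data _null_;
  length name8 $ 8 name9 $ 9;
  name8 = 'Carlošš';   /* truncated after byte 8, splitting the last š */
  name9 = 'Carlošš';   /* fits completely */
  put name8= $hex16.;
  put name9= $hex18.;
run;
```

Under those assumptions the hex for name8 should end in a lone C5 (the first half of a š), while name9 ends in the complete pair C5A1. That dangling byte is the kind of invalid character the sql step objects to.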

 

Each English character needs just 1 byte; each Slavic š (pronounced 'sh') needs two bytes in UTF-8.

 

If manipulating the dataline/cards characters via the Studio browser (in all its browser/OS variants) does not introduce any further problems, then simply ensuring the buffer is big enough to hold any changes should avoid trouble. I thought I might need the MBCS-certified K-functions to get the manipulation right (see the comment statement in my version), but the usual ones work OK.
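
For anyone curious about the difference, here is a rough sketch comparing the byte-based functions with their K-function counterparts. It assumes a UTF-8 session; the variable names are made up for illustration.

```sas
/* Byte-based functions versus character-based K-functions. */
data _null_;
  length name $ 9;
  name = 'Carloš';
  n_bytes = length(name);        /* counts bytes */
  n_chars = klength(name);       /* counts characters */
  last_b  = substr(name, 6);     /* everything from byte 6 onward */
  last_k  = ksubstr(name, 6, 1); /* the 6th character */
  put n_bytes= n_chars= last_b= last_k=;
run;
```

In a UTF-8 session this should report 7 bytes and 6 characters, and both last_b and last_k should hold the š. The byte-based substr only works here because byte 6 happens to be the start of the š; a start position in the middle of a multi-byte character would return a broken value, which is the case the K-functions guard against.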

 

My version output:

 

name      Test_1  Test_2  Test_3  nmlnbyts
Bill         187      97     103         9
Carlošš      156      76      74         9
Monique       99     102     129         9

 

Let me know if this helps you resolve your query.

 

Laku noć (good night), Chris.

 

Damien

 
