nmasel
Fluorite | Level 6

Hi,

 

My understanding from reading the KUPCASE documentation is that it will only upcase single-byte characters.  The code below, running in SAS 9.4, changes the double-byte character "μ" to "M".  I need this to stay as "μ" while all other characters are upcased.  Any ideas?


data atest;

x = "μ";
len = length(x);
klen = klength(x);
up = upcase(x);
kup = kupcase(x);

run;

 

Thank you in advance for any help you can offer.

 

Regards,

--Nick

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

There is nothing wrong with the behaviour of KUPCASE. µ is a normal lowercase letter in the Greek alphabet, and M is its uppercase version. See https://en.wikipedia.org/wiki/Mu_(letter)

 

@Patrick shared this tip:

You could use ktranslate() as below if you only want to target English letters.

 

data atest;
x = "aμX";
len = length(x);
klen = klength(x);
up = upcase(x);
kup = kupcase(x);
ktrans=ktranslate(x,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz');
put _all_;
run;

 

art297
Opal | Level 21

You could always use the grunt approach:

data atest;
  length x $4;
  input x $;
  len = length(x);
  klen = klength(x);
  up = upcase(x);
  kup = kupcase(x);
  /* Upcase only bytes in the ASCII range a-z (codes 97-122); all other bytes are left unchanged */
  do _n_=1 to length(x);
    if rank(substr(x,_n_,1)) in (97:122) then substr(x,_n_,1)=upcase(substr(x,_n_,1));
  end;
  cards;
μ
abc
aBc
deμ
;

Art, CEO, AnalystFinder.com

 

nmasel
Fluorite | Level 6

Thanks for the input.  This will work, but it defeats the purpose of the KUPCASE function.  I guess I can put this RANK code, which selects the lowercase a-z characters, into PROC FCMP and use it as a function.

 

I'm still interested in why kupcase doesn't work so I'm going to leave this open for now and hope someone adds additional info.
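
The FCMP idea mentioned above could look roughly like the sketch below. It is not the poster's code: the function name `upcase_ascii`, the library `work.funcs.text`, and the 200-byte return length are hypothetical choices, and it swaps the RANK loop for a single TRANSLATE call. It assumes a UTF-8 session, where the bytes of a multi-byte character never fall in the ASCII range, so a byte-wise TRANSLATE of a-z cannot touch "μ".

```sas
/* Hypothetical helper: upcase only the ASCII letters a-z, leave every other character alone */
proc fcmp outlib=work.funcs.text;
  function upcase_ascii(s $) $ 200;   /* 200 is an assumed maximum length */
    /* TRANSLATE(source, to, from) maps each lowercase ASCII letter to its uppercase partner */
    return (translate(s, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'));
  endsub;
run;

/* Make the compiled function visible to the DATA step */
options cmplib=work.funcs;

data _null_;
  length y $ 200;
  x = "aμX";
  y = upcase_ascii(x);
  put x= y=;
run;
```

In a double-byte session encoding the byte-wise assumption may not hold, in which case KTRANSLATE with the same two lists would be the safer call.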

Patrick
Opal | Level 21

@nmasel

You could use ktranslate() as below if you only want to target English letters.

data atest;
x = "aμX";
len = length(x);
klen = klength(x);
up = upcase(x);
kup = kupcase(x);
ktrans=ktranslate(x,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz');
put _all_;
run;
Tom
Super User

I don't see any way to tell KUPCASE() which letter combinations make up lower/upper case pairs.

You could try using KTRANSLATE() to turn the mu into some other unused character and then translate it back.

data atest;
  length x $4;
  input x $;
  len = length(x);
  klen = klength(x);
  up = upcase(x);
  kup = kupcase(x);
  /* Swap μ for the placeholder '|', upcase, then swap the placeholder back to μ */
  kup2 = ktranslate(kupcase(ktranslate(x,'|','μ')),'μ','|');
cards4;
μ
abc
aBc
deμ
;;;;
nmasel
Fluorite | Level 6
Thanks for the input. This would work too, but I would have to be certain of a character that I could sub in and out. I'm wondering what else KUPCASE does not handle correctly in a UTF-8 environment. If there are several such characters, then art297's approach could limit the upcasing to only the characters that should be upcased.
Tom
Super User

You should probably open a ticket with SAS support to find out exactly how KUPCASE() is matching upper and lower case letters.

 

To find an available character for the KTRANSLATE() trick, I normally use the COMPRESS() function.  So I guess for this you could use KCOMPRESS()?

 

Here is logic that can work with single-byte character sets.  For example, to convert the letter X to something that is not in STRING, you could use this.

possible_chars=collate(0,255);
unused=char(compress(possible_chars,string),1);
new_string=translate(string,unused,'X');
ballardw
Super User

@nmasel wrote:

Hi,

 

My understanding from reading the KUPCASE documentation is that it will only upcase single-byte characters.  The code below, running in SAS 9.4, changes the double-byte character "μ" to "M".  I need this to stay as "μ" while all other characters are upcased.  Any ideas?


I believe you are misunderstanding the definition of the function: "Converts all single-width English alphabet letters in an argument to uppercase". And since enough Greek letters are used in certain English writing ...

 

That is NOT single-byte. The KUPCASE function is at I18N Level 2 for string manipulation, which means that it can be used for SBCS, DBCS, and MBCS (UTF-8) data.

 

 

nmasel
Fluorite | Level 6

Thanks, this makes much more sense!  The data is upcased before I receive it, so "µg" is turning into "MG", which is a very different unit.  All of the methods posted by others are great options to exclude µ from the KUPCASE.

art297
Opal | Level 21

@nmasel: You've marked this as solved, but the problem is different than stated in your original post and I only see an explanation, not a solution.

If I understand it correctly now, you have a file that has already been upcased and now includes uppercase Greek characters, which you don't want.

I'd still go with a grunt approach, but slightly different than the one I originally suggested:

data atest;
  length x change_back $4;
  input x $;
  up = upcase(x);
  call missing(change_back);
  do _n_=1 to klength(up);
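    /* A character whose byte length differs from its character length is multi-byte: lowercase it again */
    if length(ksubstr(up,_n_,1)) ne klength(ksubstr(up,_n_,1)) then
     change_back=catt(change_back,lowcase(ksubstr(up,_n_,1)));
    /* Single-byte characters are kept as upcased */
    else change_back=catt(change_back,ksubstr(up,_n_,1));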
  end;
  cards;
μ
abc
AbC
dEμ
;

Art, CEO, AnalystFinder.com

 

nmasel
Fluorite | Level 6

@art297:  I see what you are saying.  I marked this as solved since the question in the subject line is solved.  Once I realized that uppercase µ should be M, I ended up with two new problems.

 

1. How to upcase everything but µ, which your first set of code, with a tweak to only look at µ, can address, along with several other solutions posted here.

2. How to change only the upcased µ back, which this code will handle nicely with a tweak to only change back µ.

 

I'm relatively new to these boards, so I'm not sure of the etiquette.  Should these two questions be posted in another thread with an appropriate subject line so that others can find them when searching?

 

Thank you for your time and effort on this topic!
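
For the second problem listed above (changing only an upcased µ back), a minimal sketch with KTRANSLATE might look like this. It assumes a UTF-8 session and that the upcased value holds the Greek capital letter Mu (Μ, U+039C), which looks identical to the Latin M but is a different character. If the data was converted to a plain Latin M somewhere upstream, this approach cannot tell the two apart. The data set and variable names are hypothetical.

```sas
data fixed;
  length raw up fixed $ 16;
  input raw $;
  /* Reproduce the upcasing reported in the thread */
  up = kupcase(raw);
  /* KTRANSLATE(source, to, from): map only the Greek capital Mu back to lowercase mu */
  fixed = ktranslate(up, 'μ', 'Μ');
  put raw= up= fixed=;
  datalines;
μg
mg
;
```

The second character literal in the KTRANSLATE call is the Greek capital Mu, not the Latin M, so a genuine "MG" is left alone.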

art297
Opal | Level 21

My only concern was that you had a solution to your problem. Yes, when a problem scope changes or expands it's always best to start a new thread. However, in this case, all apparently turned out well and you now have all you need for getting what you want.

 

Art, CEO, AnalystFinder.com

 
