jrsousa2
Obsidian | Level 7

Hi, are there functions that give me the precise UTF-8 code of a character? Using RANK/BYTE doesn't seem to work, and I don't think these functions have a KRANK/KBYTE equivalent.

 

Here's what happens when I display the multi-byte UTF-8 characters: I use RANK and then reconvert with BYTE to see if the result matches the original character. It doesn't:

 

data _null_;
set error;
do i=1 to klength(Filename);
   L=ksubstr(Filename,i,1);
   Cod=rank(L);
   Reconv=byte(cod);
   put L= Cod= Reconv=;
end;
run;

 

Result:

Notice that for the accented characters the reconverted value is always the same garbage diamond character, and their rank is always 195.

 

L=C Cod=67 Reconv=C
L=h Cod=104 Reconv=h
L=i Cod=105 Reconv=i
L=c Cod=99 Reconv=c
L=o Cod=111 Reconv=o
L= Cod=32 Reconv=
L=C Cod=67 Reconv=C
L=é Cod=195 Reconv=�
L=s Cod=115 Reconv=s
L=a Cod=97 Reconv=a
L=r Cod=114 Reconv=r
L= Cod=32 Reconv=
L=- Cod=45 Reconv=-
L= Cod=32 Reconv=
L=À Cod=195 Reconv=�
L= Cod=32 Reconv=
L=P Cod=80 Reconv=P
L=r Cod=114 Reconv=r
L=i Cod=105 Reconv=i
L=m Cod=109 Reconv=m
L=e Cod=101 Reconv=e
L=i Cod=105 Reconv=i
L=r Cod=114 Reconv=r
L=a Cod=97 Reconv=a
L= Cod=32 Reconv=
L=V Cod=86 Reconv=V

mkeintz
PROC Star

The RANK function, as you have discovered, only works for single-byte characters: for é and À it returns only the first of their two UTF-8 bytes, hex C3 (195). In fact, the RANK entry in the "SAS 9.4 Functions and CALL Routines: Reference" says:

This function is assigned an I18N Level 0 status, and is designed for SBCS data. Do not use this function to process DBCS or MBCS data. For more information, see Internationalization Compatibility. 

SBCS in the above means "single byte character set".

 

I am not aware of a multi-byte analog to the RANK function in SAS, but I bet someone on this forum could work one up. If they do, I ask them to call it UTF8RANK, to save room in my brain.
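Here is a minimal sketch of the UTF8RANK idea in a DATA step. It assumes the session encoding is UTF-8 (so SUBSTR and RANK see raw bytes), that the input is valid UTF-8, and that the program file itself is saved as UTF-8. It takes one character at a time with KSUBSTR, reads the lead byte with RANK to tell whether the character is 1, 2, 3 or 4 bytes long, and reassembles the code point from the low bits of each byte. The data set WANT and the variables NBYTES, CODEPOINT and HEX are illustrative names, not anything built into SAS.

```sas
/* Sketch of a "UTF8RANK": decode each character to its Unicode code point. */
/* Assumes a UTF-8 session, so SUBSTR/RANK operate on raw bytes.            */
/* WANT, NBYTES, CODEPOINT and HEX are hypothetical names for illustration. */
data want;
   length Filename $200 ch $4 hex $6;
   Filename = 'Chico César - À Primeira V';
   do i = 1 to klength(Filename);
      ch = ksubstr(Filename, i, 1);        /* one character = 1 to 4 bytes */
      b1 = rank(substr(ch, 1, 1));         /* lead byte, 0-255             */
      if b1 < 128 then do;                 /* 0xxxxxxx: plain ASCII        */
         nbytes = 1;
         codepoint = b1;
      end;
      else if b1 < 224 then do;            /* 110xxxxx 10xxxxxx            */
         nbytes = 2;
         codepoint = mod(b1, 32)*64
                   + mod(rank(substr(ch, 2, 1)), 64);
      end;
      else if b1 < 240 then do;            /* 1110xxxx + 2 continuation    */
         nbytes = 3;
         codepoint = mod(b1, 16)*4096
                   + mod(rank(substr(ch, 2, 1)), 64)*64
                   + mod(rank(substr(ch, 3, 1)), 64);
      end;
      else do;                             /* 11110xxx + 3 continuation    */
         nbytes = 4;
         codepoint = mod(b1, 8)*262144
                   + mod(rank(substr(ch, 2, 1)), 64)*4096
                   + mod(rank(substr(ch, 3, 1)), 64)*64
                   + mod(rank(substr(ch, 4, 1)), 64);
      end;
      hex = put(codepoint, hex6.);         /* code point as 6 hex digits   */
      output;
   end;
   keep i ch nbytes codepoint hex;
run;
```

For é (bytes C3 A9) this gives 3*64 + 41 = 233, i.e. U+00E9, and for À (bytes C3 80) it gives 192, i.e. U+00C0. RANK on its own only ever saw the shared lead byte C3 = 195.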

jrsousa2
Obsidian | Level 7

Perfect, that makes total sense, thanks.

I have already solved my problem, though. It was a long and hard struggle, but I finally found a way to rename my files and remove these bad characters.

Patrick
Opal | Level 21

As you mention, RANK() and BYTE() are I18N Level 0 functions, and I don't see any I18N Level 2 function that does the same.

https://go.documentation.sas.com/?docsetId=nlsref&docsetTarget=p1pca7vwjjwucin178l8qddjn0gi.htm&docs...

 

Maybe one of the NLS formats and informats can get you what you need.
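Not an NLS format as such, but the standard $HEXw. format is a quick way to see the raw bytes behind each character, which makes the 195 result easy to understand: every character from U+00C0 to U+00FF (À through ÿ) is stored in UTF-8 as two bytes starting with C3. A minimal sketch, assuming a UTF-8 session; the data set BYTES and the variables CH and BYTES are illustrative names.

```sas
/* Show the UTF-8 bytes behind each character with the $HEXw. format.     */
/* Assumes a UTF-8 session; CH and BYTES are hypothetical variable names. */
data bytes;
   length ch $4 bytes $8;
   do ch = 'C', 'é', 'À';
      bytes = put(ch, $hex8.);   /* all 4 bytes of CH; trailing 20s are blank padding */
      put ch= bytes=;
      output;
   end;
run;
```

In the log, C shows as 43 followed by padding, while é and À show as C3A9 and C380: the same lead byte, which is all that RANK returns.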

https://go.documentation.sas.com/?docsetId=nlsref&docsetTarget=p0d6kbq79a0d4pn18csf1jrm230d.htm&docs... 

 

 

Ksharp
Super User

Check the UNICODE() function.

want=unicode('\u2765');
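UNICODE() goes from a Unicode escape to a character in the session encoding, and its companion UNICODEC() goes the other way. Below is a minimal round-trip sketch, assuming a UTF-8 session and the default ESC (\uXXXX) escape type; the data set ROUNDTRIP and the variables ESCAPED, BACK and SAME are illustrative names. Printing the escaped form of a file name shows the exact code point of each accented character.

```sas
/* Round trip between session-encoded text and \uXXXX escapes.       */
/* Assumes a UTF-8 session; ESCAPED, BACK, SAME are hypothetical.    */
data roundtrip;
   length Filename back $200 escaped $1000;
   Filename = 'Chico César - À Primeira V';
   escaped  = unicodec(Filename, 'esc');  /* session text -> \u escapes   */
   back     = unicode(escaped, 'esc');    /* \u escapes -> session text   */
   same     = (back = Filename);          /* 1 if the round trip is exact */
   put escaped= same=;
run;
```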

 

jrsousa2
Obsidian | Level 7
Thanks a lot.

Yes, my session is already UTF-8.
jrsousa2
Obsidian | Level 7

Can you let me know what encoding option in SAS is compatible with ANSI?

 

I'm generating a VBScript file in SAS. The problem is that when I copy the generated SAS file into my VB.vbs file (which is ANSI), it renders the following garbage:

 

WScript.Echo "Adding track 0001: Gal Costa - Objeto Sim, Objeto N�o.mp3"
If FSO.FileExists("D:\MP3\Z_to_move\Gal Costa - Objeto Sim, Objeto N�o.mp3") Then
playlist.AddFile("D:\MP3\Z_to_move\Gal Costa - Objeto Sim, Objeto N�o.mp3")

 

If I use UTF-8, the above garbage doesn't appear, but it still doesn't work. A direct copy and paste into a CMD prompt works, but not in the VB.vbs file: when VBScript tries to execute the command in the file, it doesn't find the file, because the characters change in the copy process. If I save the file as UTF-8 with BOM, it's even worse.

 

Ksharp
Super User

If I am right, a VBS file is just a text file?

Open the .vbs file in Notepad++; its encoding is shown at the bottom right of the window.

Then use the following (and make sure the line terminator is CRLF):

filename x '...../x.vbs'  encoding='vbs-file-encoding' termstr=crlf ;
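Filling in that template, here is a minimal sketch for a script that Notepad reports as ANSI. It assumes Western European Windows, where ANSI means code page 1252, which SAS calls WLATIN1. The fileref VBS and the path /tmp/test.vbs are hypothetical placeholders, so substitute your own. With a UTF-8 session, SAS transcodes é from its two UTF-8 bytes to the single byte E9 as it writes.

```sas
/* Write a VBScript file in Windows-1252 ("ANSI") with CRLF line ends. */
/* Assumes Western European Windows; fileref VBS and the path are      */
/* hypothetical placeholders.                                          */
filename vbs '/tmp/test.vbs' encoding='wlatin1' termstr=crlf;

data _null_;
   file vbs;
   put 'WScript.Echo "Chico César"';   /* é is transcoded to byte E9 */
run;
```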

jrsousa2
Obsidian | Level 7

Yes, Vb.vbs is a text file, it's encoded in ANSI.

 

Filename x '...../x.vbs'  encoding='vbs-file-encoding' termstr=crlf ;

 

It seems I found the answer here: I am creating the Vb.vbs file in SAS with ANSI as the encoding, and the file I copy the output to is also ANSI.

It's now finally working.

I don't think I need to change the line terminator; the default is fine.

Basically, the code looks like the following. I'm using SAS to generate VBScript commands, and since my remote session is UTF-8, I set the encoding to ANSI.

 

%macro Change_path(playlist_name,arqui,add_track_only=false);
data _null_;
file "/home/jrsousa2/Vbscripts/&playlist_name..vbs" encoding=ANSI;
set &arqui end=fim;
length cmd $1000.;

/* Variables */
if _n_=1 then do; 
   put "Dim iTunesApp";
   put "Dim playlist";
   put "Dim track";
   put;
/*Connect to iTunes app*/
   put "Set iTunesApp = CreateObject(""iTunes.Application.1"")";
   put "Set FSO = CreateObject(""Scripting.FileSystemObject"")";
   put;
/*'Create playlist*/
  put "Set playlist = iTunesApp.LibrarySource.Playlists.ItemByName(""&playlist_name"")";
  put "If playlist is Nothing Then";
  put "   iTunesApp.CreatePlaylist(""&playlist_name"")";
  put "   Set playlist = iTunesApp.LibrarySource.Playlists.ItemByName(""&playlist_name"")";
  put "Else";
/*'DELETE*/
  put "   playlist.delete";
/*'recreate playlist*/
  put "   iTunesApp.CreatePlaylist(""&playlist_name"")";
  put "   Set playlist = iTunesApp.LibrarySource.Playlists.ItemByName(""&playlist_name"")";
  put "End If";
  put;
/* HERE WE DON'T WANT MISTAKES */
  put "On error resume next";
  put "count = 0";
  put "track_no = 0";
  put "Miss = 0 ";
  put;
/* SAS IF END */
end;

/* TRACK NO. */
cmd="Wscript.Echo ""Overall track: "||put(_n_,z5.)||"""";
put cmd;
cmd="If FSO.FileExists("""||trim(location)||""") Then";
put cmd;
put "   " "track_no = track_no+1";
cmd="WScript.Echo ""Adding track "" & track_no & "": "||trim(File)||"""";
put "   " cmd;
cmd="playlist.AddFile("""||trim(location)||""")";
put "   " cmd;
/* THE BELOW IS A SAS COMMAND */
if lowcase("&add_track_only")="false" and not missing(new_location)
then do;
	cmd="Set track = playlist.Tracks.Item(track_no)";
	put "   " cmd;
	put "       If track.Location<>"""" Then";
	cmd="If FSO.FileExists("""||trim(new_location)||""") Then";
	put "          " cmd;
	put "             " "Wscript.Echo ""File exists, not moving""";
	put "          Else";
	put "             Wscript.Echo ""Changing iTunes location""";
	/* HERE I BREAK THE CODE INTO 2 LINES, DUE TO SIZE */
	cmd="FSO.MoveFile """||trim(location)||""", _";
	put "             " cmd;
	cmd=""""||trim(New_location)||"""";
	put "             " cmd;
	cmd="track.Location = """||trim(New_location)||"""";;
	put "             " cmd;
	put "             " "If (Err.Number=0 or true) Then count = count+1";
	cmd="Wscript.Echo ""Moved "" & count & "" tracks""";
	put "             " cmd;
	put "          End If";
	put "       End If";
end;
put "Else";
put "    Miss = Miss+1";
put "    Wscript.Echo ""File not found!""";
put "End If";
put "Wscript.Echo";
put;
put;

if Fim then do;
   cmd="Wscript.Echo ""Finished: "" & Count & "" files moved""";
   put cmd;
   cmd="Wscript.Echo ""Missing files: "" & miss ";
   put cmd;
   put "Wscript.StdOut.Write vbNewLine & ""Press ENTER to continue""";
   put "Do While Not WScript.StdIn.AtEndOfLine";
   put "   Input = WScript.StdIn.Read(1)";
   put "Loop";
end;
/* FIM */
run;
%mend;
Tom
Super User

What encoding do you think ANSI means?

If your data is using UTF-8 then you should be using the same encoding for the VB script file.

Otherwise SAS will have to try to transcode the values before writing them to the file.

jrsousa2
Obsidian | Level 7
ANSI is one of the five encodings that Notepad offers: ANSI, UTF-8, UTF-8 with BOM, UTF-16 LE and UTF-16 BE.

ANSI is basically the same as Western Latin.
I found that I don't have to set my encoding to ANSI, though; UTF-8 is fine.
jrsousa2
Obsidian | Level 7
Is there a RANK function for Unicode as well? I suppose the UNICODE() function gives you a Unicode character given a Unicode number, right?
Ksharp
Super User

Yes, you are right. But there is no such RANK() for Unicode.

You can, however, look up a character's Unicode number in Word: when you insert a symbol, the code is shown at the bottom of the Symbol window.
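As a workaround for both directions, one approach is to format a code point as four hex digits, prefix \u, and pass the result to UNICODE() (a BYTE-like step); for the reverse, RANK-like step, UNICODEC() with the ESC type returns the \uXXXX text, whose hex digits can be read back with the HEXw. informat. This is a sketch assuming a UTF-8 session and characters in the Basic Multilingual Plane (code points up to FFFF). The ASCII branch is a precaution in case UNICODEC leaves plain ASCII unescaped. The data set CP_DEMO and the variables CP, CH, ESC and CP_BACK are illustrative names.

```sas
/* Code point <-> character using the built-in UNICODE/UNICODEC functions. */
/* Assumes a UTF-8 session and BMP characters (code points <= FFFF).       */
/* CP, CH, ESC and CP_BACK are hypothetical variable names.                */
data cp_demo;
   length ch $4 esc $20;
   do cp = 67, 233, 192, 10085;                   /* C, é, À, U+2765        */
      ch  = unicode(cats('\u', put(cp, hex4.)));  /* "BYTE" direction       */
      esc = unicodec(trim(ch), 'esc');            /* \uXXXX escape text     */
      if substr(esc, 1, 2) = '\u'
         then cp_back = input(substr(esc, 3, 4), hex4.);  /* "RANK" direction */
         else cp_back = rank(ch);                 /* unescaped ASCII        */
      put cp= ch= esc= cp_back=;
      output;
   end;
run;
```

Each CP_BACK should equal the CP it started from, which confirms that the two steps are inverses for these characters.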
