Hi, are there functions that give me the precise UTF-8 code of a character? Using rank/byte doesn't seem to work,
and I think these functions don't have a krank/kbyte equivalent.
Here's what happens with the multi-byte UTF-8 characters: I use rank and then reconvert with byte to see if the result matches the original
character. It doesn't:
data _null_;
  set error;
  do i=1 to klength(Filename);
    L=ksubstr(Filename,i,1);
    Cod=rank(L);
    Reconv=byte(cod);
    put L= Cod= Reconv=;
  end;
run;
Result:
Notice that for the multi-byte characters, the reconverted value SAS displays is always the same garbage diamond character, and their rank is always 195.
L=C Cod=67 Reconv=C
L=h Cod=104 Reconv=h
L=i Cod=105 Reconv=i
L=c Cod=99 Reconv=c
L=o Cod=111 Reconv=o
L= Cod=32 Reconv=
L=C Cod=67 Reconv=C
L=é Cod=195 Reconv=�
L=s Cod=115 Reconv=s
L=a Cod=97 Reconv=a
L=r Cod=114 Reconv=r
L= Cod=32 Reconv=
L=- Cod=45 Reconv=-
L= Cod=32 Reconv=
L=À Cod=195 Reconv=�
L= Cod=32 Reconv=
L=P Cod=80 Reconv=P
L=r Cod=114 Reconv=r
L=i Cod=105 Reconv=i
L=m Cod=109 Reconv=m
L=e Cod=101 Reconv=e
L=i Cod=105 Reconv=i
L=r Cod=114 Reconv=r
L=a Cod=97 Reconv=a
L= Cod=32 Reconv=
L=V Cod=86 Reconv=V
The RANK function, as you have discovered, only works for single-byte characters. In fact, looking up the RANK function in SAS help turns up this from "SAS 9.4 Functions and CALL Routines: Reference":
This function is assigned an I18N Level 0 status, and is designed for SBCS data. Do not use this function to process DBCS or MBCS data. For more information, see Internationalization Compatibility.
SBCS in the above means "single byte character set".
I am not aware of a multi-byte analog to the rank function in SAS. But I bet that someone on this forum might work up a multi-byte analog to RANK. If they do, I ask them to call it UTF8RANK, to save room in my brain.
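For what it's worth, a multi-byte analog can be approximated in a DATA step by decoding the UTF-8 bytes by hand. The sketch below is only that, a sketch: it assumes a UTF-8 session encoding, the variable names (cp, cp_hex, nbytes, b1) and the sample value are made up for illustration, and it does not validate malformed byte sequences. The reconversion step relies on UNICODE accepting its default '\uXXXX' escape form, so it is limited to code points up to FFFF.

```sas
/* Sketch: Unicode code point of each character, assuming a UTF-8 session. */
data _null_;
  length Filename $40 L $4 Reconv $4 cp_hex $6;
  /* Hypothetical test value built from hex bytes: "César À" in UTF-8 */
  Filename = 'C' || 'C3A9'x || 'sar ' || 'C380'x;

  do i = 1 to klength(Filename);
    L = ksubstr(Filename, i, 1);      /* one character, 1 to 4 bytes */
    nbytes = length(L);               /* LENGTH counts bytes */
    b1 = rank(substr(L, 1, 1));       /* lead byte */

    /* Keep only the payload bits of the lead byte */
    if b1 < 128 then cp = b1;              /* 0xxxxxxx */
    else if b1 < 224 then cp = mod(b1, 32); /* 110xxxxx */
    else if b1 < 240 then cp = mod(b1, 16); /* 1110xxxx */
    else cp = mod(b1, 8);                   /* 11110xxx */

    /* Each continuation byte (10xxxxxx) contributes 6 more bits */
    do j = 2 to nbytes;
      cp = cp * 64 + mod(rank(substr(L, j, 1)), 64);
    end;

    cp_hex = put(cp, hex6.);

    /* Round trip through UNICODE(); only attempted for 4-hex-digit code points */
    Reconv = ' ';
    if cp <= 65535 then Reconv = unicode(cats('\u', put(cp, hex4.)));

    put L= cp= cp_hex= Reconv=;
  end;
run;
```

For é (bytes C3 A9) the arithmetic works out to 3*64 + 41 = 233, i.e. U+00E9, and for À (bytes C3 80) to 3*64 + 0 = 192, i.e. U+00C0, instead of the 195 that RANK reports for the lead byte alone.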
Perfect, it makes total sense, thanks.
I have already solved my problem, though. It was a long, hard struggle, but I finally found a way to rename my files
and remove these bad characters.
As you mention, rank() and byte() are defined as I18N Level 0, and I don't see any I18N Level 2 function that does the same.
Maybe one of the NLS formats and informats can get you what you need.
Check function unicode().
want=unicode( ' \u2765 ' );
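Going the other way, UNICODEC converts session-encoded text into a Unicode escape form, which is one way to read off the code point. A small sketch, assuming a UTF-8 session; the sample value and variable names are hypothetical, and it is worth checking the NLS reference for the exact output layout of each type:

```sas
data _null_;
  length ch $4 esc ncr $20;
  ch  = 'C3A9'x;                      /* é as UTF-8 bytes (test data) */
  esc = unicodec(trim(ch), 'ESC');    /* escape form, \uXXXX style */
  ncr = unicodec(trim(ch), 'NCR');    /* numeric character reference form */
  put ch= esc= ncr=;
run;
```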
Can you let me know which encoding option in SAS is compatible with ANSI?
I'm creating a VBScript file in SAS, but when I copy from the generated SAS file into my VB.vbs file (which is ANSI), it renders the following garbage.
WScript.Echo "Adding track 0001: Gal Costa - Objeto Sim, Objeto N�o.mp3"
If FSO.FileExists("D:\MP3\Z_to_move\Gal Costa - Objeto Sim, Objeto N�o.mp3") Then
playlist.AddFile("D:\MP3\Z_to_move\Gal Costa - Objeto Sim, Objeto N�o.mp3")
If I use UTF-8, the garbage above doesn't appear, but it still doesn't work. A direct copy and paste into a CMD prompt works, but not in the VB.vbs file. When VBScript tries to execute the command in the file, it doesn't find the file, because the characters changed in the copy process. If I save the file as UTF-8 with BOM, it's even worse.
If I'm right, a VBS file is just a text file?
Can you open that .vbs file in Notepad++? At the bottom right you will see the file's encoding.
Then use the following (also make sure the line terminator is CRLF):
filename x '...../x.vbs' encoding='vbs-file-encoding' termstr=crlf ;
Yes, Vb.vbs is a text file, it's encoded in ANSI.
Filename x '...../x.vbs' encoding='vbs-file-encoding' termstr=crlf ;
It seems I found the answer here: I am creating the Vb.vbs file in SAS using ANSI as the encoding, and the file I copy the output to is also ANSI.
It's now finally working.
I don't think I need to change the line feed, it's fine.
Basically, the code looks like the one below. I'm using SAS to generate VBScript commands. Since my remote session is UTF-8, I set the file encoding to ANSI.
%macro Change_path(playlist_name,arqui,add_track_only=false);
data _null_;
  file "/home/jrsousa2/Vbscripts/&playlist_name..vbs" encoding=ANSI;
  set &arqui end=fim;
  length cmd $1000.;

  /* Variables */
  if _n_=1 then do;
    put "Dim iTunesApp";
    put "Dim playlist";
    put "Dim track";
    put;
    /*Connect to iTunes app*/
    put "Set iTunesApp = CreateObject(""iTunes.Application.1"")";
    put "Set FSO = CreateObject(""Scripting.FileSystemObject"")";
    put;
    /*'Create playlist*/
    put "Set playlist = iTunesApp.LibrarySource.Playlists.ItemByName(""&playlist_name"")";
    put "If playlist is Nothing Then";
    put " iTunesApp.CreatePlaylist(""&playlist_name"")";
    put " Set playlist = iTunesApp.LibrarySource.Playlists.ItemByName(""&playlist_name"")";
    put "Else";
    /*'DELETE*/
    put " playlist.delete";
    /*'recreate playlist*/
    put " iTunesApp.CreatePlaylist(""&playlist_name"")";
    put " Set playlist = iTunesApp.LibrarySource.Playlists.ItemByName(""&playlist_name"")";
    put "End If";
    put;
    /* HERE WE DON'T WANT MISTAKES */
    put "On error resume next";
    put "count = 0";
    put "track_no = 0";
    put "Miss = 0 ";
    put;
  /* SAS IF END */
  end;

  /* TRACK NO. */
  cmd="Wscript.Echo ""Overall track: "||put(_n_,z5.)||"""";
  put cmd;
  cmd="If FSO.FileExists("""||trim(location)||""") Then";
  put cmd;
  put " " "track_no = track_no+1";
  cmd="WScript.Echo ""Adding track "" & track_no & "": "||trim(File)||"""";
  put " " cmd;
  cmd="playlist.AddFile("""||trim(location)||""")";
  put " " cmd;

  /* THE BELOW IS A SAS COMMAND */
  if lowcase("&add_track_only")="false" and not missing(new_location) then do;
    cmd="Set track = playlist.Tracks.Item(track_no)";
    put " " cmd;
    put " If track.Location<>"""" Then";
    cmd="If FSO.FileExists("""||trim(new_location)||""") Then";
    put " " cmd;
    put " " "Wscript.Echo ""File exists, not moving""";
    put " Else";
    put " Wscript.Echo ""Changing iTunes location""";
    /* HERE I BREAK THE CODE INTO 2 LINES, DUE TO SIZE */
    cmd="FSO.MoveFile """||trim(location)||""", _";
    put " " cmd;
    cmd=""""||trim(New_location)||"""";
    put " " cmd;
    cmd="track.Location = """||trim(New_location)||"""";
    put " " cmd;
    put " " "If (Err.Number=0 or true) Then count = count+1";
    cmd="Wscript.Echo ""Moved "" & count & "" tracks""";
    put " " cmd;
    put " End If";
    put " End If";
  end;

  put "Else";
  put " Miss = Miss+1";
  put " Wscript.Echo ""File not found!""";
  put "End If";
  put "Wscript.Echo";
  put;
  put;

  if Fim then do;
    cmd="Wscript.Echo ""Finished: "" & Count & "" files moved""";
    put cmd;
    cmd="Wscript.Echo ""Missing files: "" & miss ";
    put cmd;
    put "Wscript.StdOut.Write vbNewLine & ""Press ENTER to continue""";
    put "Do While Not WScript.StdIn.AtEndOfLine";
    put " Input = WScript.StdIn.Read(1)";
    put "Loop";
  end;
/* FIM */
run;
%mend;
What encoding do you think ANSI means?
If your data is using UTF-8 then you should be using the same encoding for the VB script file.
Otherwise SAS will have to try to transcode the values before writing them to the file.
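If the target really has to be a Windows "ANSI" file, one option is to name the code page explicitly so the transcoding is not left to guesswork. A minimal sketch, assuming "ANSI" here means Windows-1252 (SAS encoding name wlatin1) and a UTF-8 session; the fileref, path, and sample line are hypothetical:

```sas
/* Hypothetical output path; the encoding is named explicitly and lines end in CRLF */
filename vbs '/tmp/example.vbs' encoding='wlatin1' termstr=crlf;

data _null_;
  file vbs;
  length line $200;
  /* 'C3A3'x is ã in UTF-8; SAS transcodes it to wlatin1 when writing */
  line = 'WScript.Echo "Objeto N' || 'C3A3'x || 'o.mp3"';
  put line;
run;
```

Characters that have no equivalent in the target code page cannot be transcoded, which is the main trade-off against writing the file as UTF-8.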
Yes, you are right, but there is no such RANK() for Unicode.
But you can find the Unicode number in Word when you insert a character (at the bottom of the character window).