Hi, are there functions that give me the precise UTF-8 code of a character? Using rank/byte doesn't seem to work,
and I don't think these functions have a krank/kbyte equivalent.
Here's what happens with multi-byte UTF-8 characters: I apply rank to each character and then convert back with byte to see whether the result matches the original
character. It doesn't:
data _null_;
  set error;
  do i=1 to klength(Filename);
    L=ksubstr(Filename,i,1);
    Cod=rank(L);
    Reconv=byte(cod);
    put L= Cod= Reconv=;
  end;
run;
Result:
Notice that the reconverted value SAS displays for the multi-byte characters is always the same garbage diamond character, and their rank is always 195.
L=C Cod=67 Reconv=C
L=h Cod=104 Reconv=h
L=i Cod=105 Reconv=i
L=c Cod=99 Reconv=c
L=o Cod=111 Reconv=o
L= Cod=32 Reconv=
L=C Cod=67 Reconv=C
L=é Cod=195 Reconv=�
L=s Cod=115 Reconv=s
L=a Cod=97 Reconv=a
L=r Cod=114 Reconv=r
L= Cod=32 Reconv=
L=- Cod=45 Reconv=-
L= Cod=32 Reconv=
L=À Cod=195 Reconv=�
L= Cod=32 Reconv=
L=P Cod=80 Reconv=P
L=r Cod=114 Reconv=r
L=i Cod=105 Reconv=i
L=m Cod=109 Reconv=m
L=e Cod=101 Reconv=e
L=i Cod=105 Reconv=i
L=r Cod=114 Reconv=r
L=a Cod=97 Reconv=a
L= Cod=32 Reconv=
L=V Cod=86 Reconv=V
The RANK function, as you have discovered, only works for single-byte characters. In fact, looking up the RANK function in SAS help turns up this from "SAS 9.4 Functions and CALL Routines: Reference":
This function is assigned an I18N Level 0 status, and is designed for SBCS data. Do not use this function to process DBCS or MBCS data. For more information, see Internationalization Compatibility.
SBCS in the above means "single-byte character set".
I am not aware of a multi-byte analog to the rank function in SAS. But I bet that someone on this forum might work up a multi-byte analog to RANK. If they do, I ask them to call it UTF8RANK, to save room in my brain.
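This also explains why both é and À reported 195 in the question: RANK looks only at the first byte of its argument, and in UTF-8 both characters begin with the byte C3 (decimal 195). A minimal sketch, assuming a UTF-8 session and a hypothetical sample string, that walks every byte of each character with the byte-oriented SUBSTR and LENGTH functions:

```sas
data _null_;
  length Filename $40 L $4 Bytes $20;
  Filename = 'César À';                  /* hypothetical sample value */
  do i = 1 to klength(Filename);         /* loop over characters, not bytes */
    L = ksubstr(Filename, i, 1);         /* one character, 1 to 4 bytes in UTF-8 */
    Bytes = ' ';
    do j = 1 to length(L);               /* LENGTH counts bytes, not characters */
      /* RANK on a single byte is safe even in a multi-byte session */
      Bytes = catx(' ', Bytes, rank(substr(L, j, 1)));
    end;
    put L= Bytes=;
  end;
run;
```

In UTF-8, é is stored as the two bytes 195 169 and À as 195 128, so listing all the bytes is what distinguishes them.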
Perfect, it makes total sense, thanks.
I have already solved my problem, though. It was a long, hard struggle, but I finally found a way to rename my files
and remove these bad characters.
As you mention, rank() and byte() are defined as I18N Level 0, and I don't see any I18N Level 2 function that would do the same.
Maybe one of the NLS formats and informats can get you what you need.
Check the unicode() function.
want=unicode( ' \u2765 ' );
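UNICODE turns an escape such as \u2765 into the character in the session encoding. Its counterpart, UNICODEC, goes the other way, which comes close to a code-point lookup. A rough sketch, assuming a UTF-8 session and assuming that UNICODEC with the 'ESC' type returns a \uXXXX escape for a non-ASCII character and leaves plain ASCII unchanged (worth checking against the NLS documentation for your release):

```sas
data _null_;
  length ch $4 esc $12;
  ch  = unicode('\u00E9');          /* é, built from its escape */
  esc = unicodec(ch, 'ESC');        /* assumed to yield a \uXXXX escape */
  if substr(esc, 1, 2) = '\u' then
    codepoint = input(substr(esc, 3, 4), hex4.);  /* hex digits to a number */
  else
    codepoint = rank(ch);           /* plain ASCII: a single byte */
  put ch= esc= codepoint=;
run;
```

Characters outside the Basic Multilingual Plane would need extra handling, since a four-digit escape cannot hold them.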
Yes, my session is already UTF-8.
Can you let me know what encoding option in SAS is compatible with ANSI?
I'm creating a VBScript file in SAS, but when I copy from the generated SAS file into my VB.vbs file (which is ANSI), it renders the following garbage:
WScript.Echo "Adding track 0001: Gal Costa - Objeto Sim, Objeto N�o.mp3"
If FSO.FileExists("D:\MP3\Z_to_move\Gal Costa - Objeto Sim, Objeto N�o.mp3") Then
playlist.AddFile("D:\MP3\Z_to_move\Gal Costa - Objeto Sim, Objeto N�o.mp3")
If I use UTF-8, the above garbage doesn't appear, but the script still doesn't work. A direct copy and paste into a CMD prompt works, but not in the VB.vbs file. When VBScript tries to execute the command in the file, it doesn't find the file, because the characters changed in the copy process. If I save the file as UTF-8 with BOM, it's even worse.
If I'm right, a VBS file is just a text file?
Can you open that .vbs file in Notepad++? At the bottom right you will see the file's encoding.
Then use this (also make sure the line terminator is CRLF):
filename x '...../x.vbs' encoding='vbs-file-encoding' termstr=crlf ;
Yes, Vb.vbs is a text file, and it's encoded in ANSI.
Filename x '...../x.vbs' encoding='vbs-file-encoding' termstr=crlf ;
It seems I found the answer here: I am creating the Vb.vbs file in SAS using ANSI as the encoding, and the file I copy the output to is also ANSI.
It's now finally working.
I don't think I need to change the line terminator; it's fine.
Basically, the code is like the one below. I'm using SAS to generate VBScript commands. Since my remote session is UTF-8, I set the output encoding to ANSI.
%macro Change_path(playlist_name,arqui,add_track_only=false);
data _null_;
  file "/home/jrsousa2/Vbscripts/&playlist_name..vbs" encoding=ANSI;
  set &arqui end=fim;
  length cmd $1000.;

  /* Variables */
  if _n_=1 then do;
    put "Dim iTunesApp";
    put "Dim playlist";
    put "Dim track";
    put;
    /* Connect to iTunes app */
    put "Set iTunesApp = CreateObject(""iTunes.Application.1"")";
    put "Set FSO = CreateObject(""Scripting.FileSystemObject"")";
    put;
    /* Create playlist */
    put "Set playlist = iTunesApp.LibrarySource.Playlists.ItemByName(""&playlist_name"")";
    put "If playlist is Nothing Then";
    put " iTunesApp.CreatePlaylist(""&playlist_name"")";
    put " Set playlist = iTunesApp.LibrarySource.Playlists.ItemByName(""&playlist_name"")";
    put "Else";
    /* DELETE */
    put " playlist.delete";
    /* Recreate playlist */
    put " iTunesApp.CreatePlaylist(""&playlist_name"")";
    put " Set playlist = iTunesApp.LibrarySource.Playlists.ItemByName(""&playlist_name"")";
    put "End If";
    put;
    /* HERE WE DON'T WANT MISTAKES */
    put "On error resume next";
    put "count = 0";
    put "track_no = 0";
    put "Miss = 0 ";
    put;
  end; /* SAS IF END */

  /* TRACK NO. */
  cmd="Wscript.Echo ""Overall track: "||put(_n_,z5.)||"""";
  put cmd;
  cmd="If FSO.FileExists("""||trim(location)||""") Then";
  put cmd;
  put " " "track_no = track_no+1";
  cmd="WScript.Echo ""Adding track "" & track_no & "": "||trim(File)||"""";
  put " " cmd;
  cmd="playlist.AddFile("""||trim(location)||""")";
  put " " cmd;

  /* THE BELOW IS A SAS COMMAND */
  if lowcase("&add_track_only")="false" and not missing(new_location) then do;
    cmd="Set track = playlist.Tracks.Item(track_no)";
    put " " cmd;
    put " If track.Location<>"""" Then";
    cmd="If FSO.FileExists("""||trim(new_location)||""") Then";
    put " " cmd;
    put " " "Wscript.Echo ""File exists, not moving""";
    put " Else";
    put " Wscript.Echo ""Changing iTunes location""";
    /* HERE I BREAK THE CODE INTO 2 LINES, DUE TO SIZE */
    cmd="FSO.MoveFile """||trim(location)||""", _";
    put " " cmd;
    cmd=""""||trim(New_location)||"""";
    put " " cmd;
    cmd="track.Location = """||trim(New_location)||"""";
    put " " cmd;
    put " " "If (Err.Number=0 or true) Then count = count+1";
    cmd="Wscript.Echo ""Moved "" & count & "" tracks""";
    put " " cmd;
    put " End If";
    put " End If";
  end;

  put "Else";
  put " Miss = Miss+1";
  put " Wscript.Echo ""File not found!""";
  put "End If";
  put "Wscript.Echo";
  put;
  put;

  if Fim then do;
    cmd="Wscript.Echo ""Finished: "" & Count & "" files moved""";
    put cmd;
    cmd="Wscript.Echo ""Missing files: "" & miss ";
    put cmd;
    put "Wscript.StdOut.Write vbNewLine & ""Press ENTER to continue""";
    put "Do While Not WScript.StdIn.AtEndOfLine";
    put " Input = WScript.StdIn.Read(1)";
    put "Loop";
  end; /* FIM */
run;
%mend;
What encoding do you think ANSI means?
If your data is using UTF-8 then you should be using the same encoding for the VB script file.
Otherwise SAS will have to try to transcode the values before writing them to the file.
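To make the transcoding point concrete, here is a minimal sketch of writing one line through a fileref with an explicit ENCODING= value. The path is hypothetical, and wlatin1 (Windows Western, code page 1252) is an assumption about what "ANSI" means on a Western-European Windows machine:

```sas
/* Hypothetical output path; adjust to your environment. */
filename vbs "/tmp/example.vbs" encoding="wlatin1" termstr=crlf;

data _null_;
  file vbs;
  /* In a UTF-8 session, ã is held as two bytes (C3 A3).          */
  /* Writing to a wlatin1 file transcodes it to one byte (E3).    */
  put 'WScript.Echo "Objeto Sim, Objeto Não.mp3"';
run;
```

With encoding="utf-8" on the fileref instead, the bytes would pass through unchanged, and whatever reads the script would then have to interpret it as UTF-8.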
ANSI is basically the same as Western Latin.
I found that I don't have to set my encoding to ANSI, though; UTF-8 is fine.
Yes, you are right. But there is no RANK() equivalent for Unicode.
You can, however, find a character's Unicode number in Word when you insert a character (it is shown at the bottom of the character window).