Hello everyone,
The title might not be accurate, since I am not familiar with encoding, but here is my problem in simple words: I have a variable that contains a list of people's names. Some of these names are Spanish or French, so they contain characters that I believe are called "hexadecimal characters", such as an E with an accent above it or a lowercase i with an umlaut. (I don't know how to type them; some examples are shown in the attached picture.)
I want to convert all of them into plain characters, for example an E with dots into E.
I thought the COMPRESS function would be the right tool, so I first tried keeping only the alphabetic characters like this:
data test2;
  set test;
  names_translate = compress(name2, '', 'ka');
run;
Unfortunately, it does not work, and those characters remain. I also tried other modifiers, such as 'c' and 'w', but they do not give me what I want either. Is there a neat way to do this with COMPRESS, or another function that gives the desired result? The picture below shows what I have and what I want as output.
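The function you want is TRANSLATE. The characters are more likely to be "high-order ASCII" or similar: character codes above 127, outside the standard 7-bit ASCII range, whose appearance depends on the session encoding (for example, Latin-1).
This data set, which lists those characters, may help: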
data work.highorderascii;
  do i = 127 to 255;
    char = byte(i);
    output;
  end;
run;
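The listing above shows every high-order character. To see which ones actually occur in your own names, a sketch like the following records the hex code of each byte above 127. It assumes a single-byte session encoding such as WLATIN1, where é is stored as E9; the sample names and the data set name CHECK are hypothetical stand-ins for your NAME2 variable. "Hexadecimal" is only a way of writing these codes, not a kind of character. In a UTF-8 session you would instead see two bytes per accented letter (é appears as C3 A9), which is a quick way to tell which encoding you are dealing with.

```sas
/* Sketch: list the bytes above 127 in each name as hex codes.          */
/* Assumes a single-byte session encoding such as WLATIN1.              */
/* The sample names are hypothetical stand-ins for NAME2.               */
data check;
  length name2 $40 c $1 codes $200;
  do name2 = 'André', 'Zoë', 'Smith';
    codes = ' ';                          /* reset for each name       */
    do i = 1 to length(name2);
      c = substr(name2, i, 1);            /* one byte at a time        */
      if rank(c) > 127 then               /* outside 7-bit ASCII       */
        codes = catx(' ', codes, put(c, $hex2.));
    end;
    has_high = (codes ne ' ');            /* 1 if any high byte found  */
    output;
  end;
  drop c i;
run;

proc print data=check noobs;
run;
```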
Here is an example using the TRANSLATE function that may work for you:
data example;
  x = 'Andrè';
  y = translate(x,
    'AAAAAAACEEEEIIIIDNOOOOO OUUUUY Saaaaaaaceeeeiiiidnooooo ouuuuy y',
    'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ');
run;
Each character in the first long string replaces the character at the same position in the second string, which is why I show them one over the other above. The comparison is case sensitive, and I have used what I believe are the common English replacements for most of these characters: for example, Æ becomes A, ß becomes S, and characters with no letter equivalent (such as × and ÷) become blanks. If you need a different rule, it should be easy to adjust.
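To apply the same mapping to a whole data set, as in the original question, a sketch might look like this. The sample names are hypothetical, and the step assumes a single-byte session encoding such as WLATIN1. TRANSLATE works byte by byte, so in a UTF-8 session the accented letters (two bytes each) may not be replaced correctly; KTRANSLATE, the multi-byte-aware counterpart, is the function to look at in that case.

```sas
/* Hypothetical sample data standing in for the question's TEST data set */
data test;
  infile datalines truncover;
  input name2 $40.;
datalines;
André
Zoë
Éloïse
;
run;

/* Replace each accented character with its plain equivalent.            */
/* Second argument: the "to" list; third argument: the "from" list.      */
/* Assumes a single-byte session encoding such as WLATIN1.               */
data test2;
  set test;
  length names_translate $40;   /* same length as NAME2 */
  names_translate = translate(name2,
    'AAAAAAACEEEEIIIIDNOOOOO OUUUUY Saaaaaaaceeeeiiiidnooooo ouuuuy y',
    'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ');
run;

proc print data=test2 noobs;
run;
```

In a WLATIN1 session, PROC PRINT should show Andre, Zoe and Eloise in NAMES_TRANSLATE.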