Sean_OConnor
Obsidian | Level 7

Hi,

 

I'm reading data from an old database into my SAS session, but the difference in encoding between the two is causing special characters to display incorrectly. For example, Seán does not appear as Seán but as something else.

Is there a way to convert the encoding of a variable of interest so that it displays correctly? I'm trying to use the KCVT function but am having issues. The encoding of my database is CP850, but this doesn't seem to be an option, so I'm wondering whether there is a workaround.

Any help is welcome.

 

data f1(keep=Name CleanName);
set Namefile(obs=13);
CleanName=kcvt(Name ,'utf8','wtlatin1');
run;

 

 

 

4 REPLIES
Tom
Super User

How the values DISPLAY is determined by the setting of the ENCODING system option. That option is set when the SAS session starts. For the most flexibility, make sure your SAS session starts with UTF-8 encoding. If you use any single-byte encoding, you are limited to only 256 possible characters.
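To see which encoding your current session actually started with, something like the following should work. It only reports the setting; changing it has to happen at startup (for example through your configuration or the command that launches SAS), which depends on how your installation is set up.

```sas
/* Report the ENCODING system option fixed at session startup (e.g. UTF-8 or WLATIN1) */
proc options option=encoding;
run;

/* The same value written to the log through the GETOPTION function */
%put NOTE: Session encoding is %sysfunc(getoption(encoding));
```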

 

How the values are transcoded when read from the "old database" depends on what that means. 

 

Is that a connection to some external database system? What database system are you using? What type of connection? What driver are you using?

 

Did you just mean some old SAS dataset created on another computer? In that case run PROC CONTENTS on the dataset and check the ENCODING setting of the dataset. 
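A minimal sketch of that check, using the dataset name from the question (add the libref if Namefile lives in a permanent library). The ENCODING value is listed in the attributes section of the output.

```sas
/* Limit the output to the attributes table, which includes the dataset's ENCODING */
ods select attributes;
proc contents data=Namefile;
run;
```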

 

For a SAS dataset you can get away with using the ENCODING=ANY dataset option to allow you to get the data without any transcoding.  That way your attempts to use KCVT() might work.
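Applied to the data step from the question, that might look like the sketch below. It assumes that Namefile is a SAS dataset whose Name values hold raw CP850 bytes, that the session runs in UTF-8, and that pcoem850 is the SAS encoding name for code page 850 (confirm this against the encoding table in the SAS NLS documentation for your release).

```sas
data f1(keep=Name CleanName);
  /* ENCODING=ANY: read the stored bytes as-is, with no automatic transcoding */
  set Namefile(obs=13 encoding=any);
  /* UTF-8 can need more bytes per character than CP850, so leave extra room */
  length CleanName $ 200;
  /* Assumption: the source bytes are CP850 and the target is a UTF-8 session */
  CleanName = kcvt(Name, 'pcoem850', 'utf-8');
run;
```

Note that the arguments are source encoding first, then target encoding. The original attempt converted from utf8 to wlatin1, which is the opposite direction from what a CP850 source needs.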

 

What value do you see for that 4-character string you shared? Try printing the value using the $HEX format to see what is actually stored.
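One way to do that is to write each value to the log next to its hex bytes. This is a sketch: ENCODING=ANY is added so that the hex shows the bytes as stored rather than after any transcoding, and the width of 40 covers only the first 20 bytes of each value.

```sas
data _null_;
  /* Skip transcoding so $HEX shows the bytes exactly as stored */
  set Namefile(obs=13 encoding=any);
  /* Text on one line, then two hex digits per byte on the next */
  put Name= / Name= $hex40.;
run;
```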

 

If I paste those 4 characters from your posting into Notepad and save them as ANSI text or as UTF-8 text, I get these two different byte strings. The ANSI string is 4 bytes long, but the UTF-8 string is 5 bytes long.

fname=sean_ansi.txt string=5365E16E Seán
fname=sean_utf8.txt string=5365C3A16E Seán

So the byte with hexcode of E1 is converted to the two byte sequence C3A1 when saved as UTF-8.
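For comparison, code page 850 stores á as a different single byte: A0 instead of E1 (in Latin-1 and Windows-1252, A0 is a no-break space). That is why CP850 data read as if it were Latin-1 displays the wrong characters. The self-contained check below builds "Seán" from its CP850 bytes and transcodes it to UTF-8; it assumes KCVT accepts the pcoem850 encoding name in your release.

```sas
data _null_;
  length cp850 $ 4 utf8 $ 8;
  /* "Sean" with a-acute as CP850 bytes: 53 65 A0 6E */
  cp850 = input('5365A06E', $hex8.);
  /* Transcode from code page 850 to UTF-8 */
  utf8 = kcvt(cp850, 'pcoem850', 'utf-8');
  /* UTF-8 encodes a-acute as C3A1, so utf8 should start with 5365C3A16E */
  put cp850= $hex8. / utf8= $hex16.;
run;
```

Any bytes after 5365C3A16E in the second line are just blank padding (hex 20) from the $8 length.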

 

alexjordan
Fluorite | Level 6

You could try the "kcvtreu" or "kcvtx" function in SAS to handle encoding issues. As far as CP850 is concerned, you may need to specify the closest available encoding, e.g. UTF-8, as a workaround. Then check the source and target encodings carefully in the SAS documentation to ensure compatibility.


Discussion stats
  • 4 replies
  • 1259 views
  • 1 like
  • 4 in conversation