Unicode Output

Super Contributor
Posts: 358

Unicode Output

Hi All:

I have been asked to output a file (just text) in Unicode (UTF-8).

My code is:

data work02;
  file "&outfile" encoding="utf-8" notitles lrecl=400;
  set work01;
  text1 = compress(text,'"');
  put @001 text1 $utf8x400.;
run;

and it seems to run OK.

When I look at the resulting file, there seem to be only a couple of funny characters at the beginning of the first record and nothing else is different. How can I confirm that the results are in Unicode format?

Thanks in advance.


All Replies
PROC Star
Posts: 7,356

Unicode Output

There are some examples at: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000211297.htm

One easy test would be to reverse the process: import the resulting file using the same encoding (utf-8) and check that the text comes back unchanged.
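A minimal sketch of that round trip, assuming the file was written to the same "&outfile" path used in the original post. The variable names line and len are made up for illustration, and obs=5 only limits the check to the first few records.

```sas
/* Sketch: read the file back with the same encoding and inspect a few records. */
data _null_;
  infile "&outfile" encoding="utf-8" lrecl=400 truncover obs=5;
  input line $char400.;      /* keep the record as one string */
  len = lengthn(line);       /* length without trailing blanks */
  put _n_= len= line=;       /* write record number, length and text to the log */
run;
```

If the log shows the expected text, the file decodes cleanly as UTF-8. Characters that the SAS session encoding cannot represent may still come back altered, so this check is only as good as the session encoding allows.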

Of course, if the file is really composed of multiple fields, I would think that you would have to export it as multiple fields for the end user to be able to treat it accurately as Unicode.  My limited experience (1 time) dealing with that format was with a file that contained nulls between every field.

Regular Contributor
Posts: 241

Unicode Output

All the 7-bit ASCII characters are also valid UTF-8 encoded Unicode characters -- the UTF-8 encoding scheme was specifically designed to be this way. As long as your input characters are ASCII (no extended characters), your output should be the same. HTH.
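One way to see whether this applies to your data is to count the records that contain any byte above 127. The following is a rough sketch, assuming the work01 data set and text variable from the original post and an ASCII-based session encoding; found and n_nonascii are names invented here.

```sas
/* Sketch: count records in work01 whose text contains a non-ASCII byte. */
data _null_;
  set work01 end=done;
  found = 0;
  /* RANK returns the byte's position in the session collating sequence. */
  do i = 1 to length(text) while (not found);
    if rank(substr(text, i, 1)) > 127 then found = 1;
  end;
  n_nonascii + found;        /* sum statement: retained across records, starts at 0 */
  if done then put "Records containing non-ASCII bytes: " n_nonascii;
run;
```

If the count is zero, the text itself should be written byte for byte the same in UTF-8 as in plain ASCII.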

Super Contributor
Posts: 358

Unicode Output

All:

Thanks for all the suggestions....

Now - another (minor) issue.

How do I remove the trailing blanks from the output record written in the Unicode format (as noted above).  The client wants the trailing blanks removed...

If I use just "$utf8x." as the output format, the default length is 8 chars.

Anyone ???

Super User
Posts: 6,498

Re: Unicode Output

I do not think that you need to use the $utf8x format. The ENCODING option on the FILE statement should be enough.  You can use the $VARYING format to write varying lengths to the output file.

data _null_;
  file "&outfile" encoding="utf-8" lrecl=400;
  set work01;
  text1 = compress(text,'"');
  len=length(text1);
  put text1 $varying400. len;
run;

Note that the bytes at the beginning of the file are a byte order mark (BOM), which signals to SAS and other programs what encoding is being used.  For UTF-8 the BOM is three bytes (EF BB BF); for UTF-16 it is two.  Try different encodings and you should see different values in those first bytes.
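To look at those leading bytes from within SAS, something like the following sketch could be used. It assumes the same "&outfile" path; first3 is a hypothetical variable name. A UTF-8 byte order mark is the byte sequence EF BB BF, so it would appear as EFBBBF in hex.

```sas
/* Sketch: dump the first three bytes of the file in hex. */
data _null_;
  infile "&outfile" recfm=f lrecl=3 obs=1;   /* one fixed-length 3-byte record */
  input first3 $char3.;
  put first3= $hex6.;                        /* written to the log as six hex digits */
run;
```

If SAS interprets the mark itself while reading, this check may not show it; in that case a hex viewer outside SAS is the fallback.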

Solution
‎11-28-2011 02:28 PM
Super Contributor
Posts: 358

Re: Unicode Output

Sorry - but I have to answer my own question here.... (after a call to SAS).

Just adding the format modifier ":" (full colon) before the unicode output format worked.

So the format becomes  ":$utf8x400."
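Put together with the code from the original question, the fix might look like the sketch below. This is a reconstruction, not the poster's actual program: it uses data _null_ and drops the @001 pointer, which are assumptions. The colon switches PUT to modified list output, which trims leading and trailing blanks from the value before writing it.

```sas
/* Sketch: original step with the colon format modifier added. */
data _null_;
  file "&outfile" encoding="utf-8" lrecl=400;
  set work01;
  text1 = compress(text, '"');   /* remove double quotes, as in the original */
  put text1 :$utf8x400.;         /* colon modifier: value written without padding */
run;
```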

PROC Star
Posts: 7,356

Re: Unicode Output

Why sorry?  You got the answer to your question and, better yet, shared it with the rest of us.  Mark your last post as the one that had the correct answer.
