Help using Base SAS procedures

Export encoding

Occasional Contributor
Posts: 6

Hi

Is there a list of all the encodings that SAS supports? I'm asking because I need to specify one when using PROC EXPORT, namely UTF-8. Unfortunately, there are two flavours of UTF-8 files, with and without a BOM, and when I export a SAS dataset to a txt file (using encoding="utf-8") I get a file in UTF-8 without BOM. I'd like the one with BOM. Is it possible to choose between the two?


All Replies
Valued Guide
Posts: 3,208

Re: Export encoding

The byte order mark is not the UTF-8 encoding indicator, although it is often used that way. XML files, being plain text, carry an explicit encoding declaration instead.

With UTF-8 files you abandon the "one byte equals one character" dogma. That is the important change in thinking.

Many other encodings, Latin1 among them, are based on code pages and still follow that old dogma.

There is only one UTF-8 encoding, as it is generic Unicode.

---->-- ja karman --<-----
Occasional Contributor
Posts: 6

Re: Export encoding

OK, let me explain the problem then. I have a dataset and I need to export it to a txt file with UTF-8 encoding. I thought everything was fine, but then there were problems with foreign-language characters (not in the text file itself, but in the app it was imported into). So my next move was to check what had happened, and according to Notepad++ the exported file was UTF-8 w/o BOM. I converted it to UTF-8 using N++ and the problems went away. Is there anything you could advise? If there is only one UTF-8, is there probably no way to do in SAS what I did in N++?
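
For what it's worth, the check that Notepad++ does here can also be done from SAS by looking at the first three bytes of the file, which are EF BB BF when a UTF-8 BOM is present. A minimal sketch, assuming the exported file is called export.txt (a placeholder name, not one from this thread):

```sas
/* Sketch: report whether a file starts with the UTF-8 BOM (EF BB BF). */
/* "export.txt" is a placeholder path; replace it with your own file.  */
data _null_;
  infile "export.txt" recfm=n;      /* read the file as a raw byte stream */
  input first3 $char3.;             /* first three bytes of the file      */
  if first3 = 'EFBBBF'x then put 'File starts with a UTF-8 BOM.';
  else put 'No UTF-8 BOM found.';
  stop;                             /* one read is enough                 */
run;
```

The message goes to the SAS log, since the step has no FILE statement.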

Solution
‎06-12-2015 05:10 AM
Super User
Posts: 6,938

Re: Export encoding

I'd say that, since UTF-8 by definition is not dependent on byte order, SAS does the prudent thing and omits any BOM.

Since PROC EXPORT can't be made to supply the BOM (AFAIK), I would switch to a hand-written data step and write the BOM first, in an if _n_ = 1 then do; put 'EFBBBF'x @; end; block before everything else.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 6

Re: Export encoding

Thank you, that's it! But could you tell me how to show column/variable names in the first line? Data step omits them.

Super User
Posts: 6,938

Re: Export encoding

In a data step, one needs to supply the column headers for a .csv or similar file.

Just expand the if _n_ = 1 block:

if _n_ = 1 then do;
  put 'EFBBBF'x @; * BOM;
  put 'var1;var2;var3'; * Headers;
end;

Note the trailing @ at the end of the first put. It holds the output line open, so no newline is written after the BOM. The next put then writes the headers on that same line and ends it.

This also assumes that you write a file delimited by semicolons; adapt accordingly if you use another method of field separation.
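
Put together, the whole step might look like the sketch below. The input data set (SASHELP.CLASS), the output name class.txt and the three variables are placeholders chosen for illustration and do not come from the original question. The hex literal for the BOM follows the approach described above; it is worth checking the first bytes of the result, since how that literal is handled may depend on your session encoding.

```sas
/* Sketch: semicolon-delimited UTF-8 text file with BOM and header row. */
/* Input data set, output name and variable list are placeholders.      */
data _null_;
  set sashelp.class;
  file "class.txt" dsd dlm=';' encoding='utf-8' lrecl=32767;
  if _n_ = 1 then do;
    put 'EFBBBF'x @;        /* BOM, line held open by the trailing @ */
    put 'Name;Sex;Age';     /* header row, ends the first line       */
  end;
  put name sex age;         /* one delimited record per observation  */
run;
```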

Super User
Posts: 6,500

Re: Export encoding

You can have the data step write the variable names also.

%let in=sashelp.class;
%let out=class.csv;
options nofmterr;

*----------------------------------------------------------------------;
* Convert SAS dataset to CSV file ;
*----------------------------------------------------------------------;
data _null_;
  set &in;
  file "&out" dsd dlm=';' encoding=utf8 lrecl=1000000;
  * First record only: write the BOM, hold the line, then add the header row;
  if _n_ = 1 then put 'EFBBBF'x @;
  if _n_ = 1 then link names;
  * Write every variable of the input data set using list output;
  put (_all_) (:);
  return;

* Header routine: walk the variable names until the loop variable itself appears;
names:
  length __name__ $32;
  do while(1);
    call vnext(__name__);
    if upcase(__name__) eq '__NAME__' then leave;
    put __name__ @;
  end;
  put;
  return;
run;

Valued Guide
Posts: 3,208

Re: Export encoding

There is a system option that controls whether the BOM is written or not; it is called BOMFILE.

From what I have seen, the BOM is written by default when your SAS session is running in UTF-8 mode, that is, when you have started the alternate (Unicode) binaries.

It can be very surprising when you use the SAS editor and find that edited files get corrupted because the BOM was added.
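
As a rough illustration, the option can be inspected and set as sketched below. The fileref, output path and input data set are placeholders, and whether PROC EXPORT itself honours the setting was not confirmed in this thread, so the sketch sticks to a data step and should be treated as something to try rather than a guaranteed fix.

```sas
/* Show the current setting of the BOMFILE system option in the log. */
proc options option=bomfile;
run;

/* Ask SAS to write a BOM on Unicode-encoded external files.         */
/* Use "options nobomfile;" instead to suppress it.                  */
options bomfile;

/* Placeholder fileref and path. With BOMFILE in effect, the BOM is  */
/* expected at the start of this newly written UTF-8 file.           */
filename out "class_bom.txt" encoding='utf-8';

data _null_;
  set sashelp.class;
  file out dsd dlm=';';
  put name sex age;
run;
```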

---->-- ja karman --<-----
☑ This topic is SOLVED.
