TBanky
Calcite | Level 5

Hi

Is there a list of all the encodings that SAS supports? I'm asking because I need to assign a specific one when using PROC EXPORT, namely UTF-8. Unfortunately, there are two flavors of UTF-8, with and without a BOM, and when I export a SAS dataset to a text file (using encoding="utf-8") I get a file in UTF-8 without BOM. I'd like the one with the BOM. Is it possible to choose between these two encodings?

jakarman
Barite | Level 11

The byte order mark is not the UTF-8 encoding indicator, although it is often used that way. XML files carry an explicit encoding setting instead, since they are based on plain text.

With UTF-8 files you abandon the dogma that one byte equals one character. That is the important change in thinking.

Many other encodings, such as Latin1 and others built on code pages, still follow that old dogma.

There is only one UTF-8 encoding, as it is generic Unicode.

---->-- ja karman --<-----
TBanky
Calcite | Level 5

OK, let me explain the problem then. I have a dataset that I need to export to a text file with UTF-8 encoding. I thought everything was fine, but then there were problems with foreign-language characters (not in the text file itself, but in the app it was imported into). So my next move was to check what had happened, and according to Notepad++ the exported file was UTF-8 w/o BOM. I converted it to UTF-8 using Notepad++ and there were no more problems. So is there anything you could advise? Since there's only one UTF-8, is there probably no way to do in SAS what I did in Notepad++?

Kurt_Bremser
Super User (accepted solution)

I'd say that, since UTF-8 by definition is not dependent on byte order, SAS does the prudent thing and omits any BOM.

Since PROC EXPORT can't be made to supply the BOM (AFAIK), I would switch to a hand-written data step and write the BOM before everything else, in a block like this: if _n_ = 1 then do; put 'EFBBBF'x @; end;
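
For illustration, a minimal sketch of such a data step. The output path class_bom.txt, the sample table sashelp.class, and the semicolon delimiter are assumptions chosen for the example, not details from the original poster's data. Whether the three BOM bytes reach the file untouched may depend on the session encoding and on SAS's own BOM handling, so it is worth inspecting the first bytes of the result in a hex viewer.

```sas
/* Sketch only: file name, input table and delimiter are assumptions */
data _null_;
  set sashelp.class;
  file "class_bom.txt" dsd dlm=';' encoding=utf8 lrecl=32767;
  /* First row only: emit the UTF-8 BOM; the trailing @ holds the line open */
  if _n_ = 1 then put 'EFBBBF'x @;
  /* Write the data values, separated by the DLM= character */
  put name sex age height weight;
run;
```

This version writes no header line; the BOM is followed directly by the first data row.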

TBanky
Calcite | Level 5

Thank you, that's it! But could you tell me how to get the column/variable names into the first line? The data step omits them.

Kurt_Bremser
Super User

In a data step, one needs to supply the column headers for a .csv or similar file.

Just expand the if _n_ = 1 block:

if _n_ = 1 then do;
  put 'EFBBBF'x@; * BOM;
  put 'var1;var2;var3'; * Headers;
end;

Note the trailing @ at the end of the first put. It prevents a newline from being written to the output, so the next put continues on the same line and then ends it.

This also assumes that you write a file delimited by semicolons; adapt accordingly if you use another method of field separation.

Tom
Super User

You can have the data step write the variable names also.

%let in=sashelp.class;
%let out=class.csv;
options nofmterr ;
*----------------------------------------------------------------------;
* Convert SAS dataset to CSV file ;
*----------------------------------------------------------------------;
data _null_;
  set &in;
  file "&out" dsd dlm=';' encoding=utf8 lrecl=1000000 ;
  /* First row only: write the UTF-8 BOM, then the header line */
  if _n_ = 1 then put 'EFBBBF'x @;
  if _n_ = 1 then link names;
  put (_all_) (:);
  return;
names:
  /* Walk the variable names with CALL VNEXT until __name__ itself comes up */
  length __name__ $32;
  do while(1);
    call vnext(__name__);
    if upcase(__name__) eq '__NAME__' then leave;
    put __name__ @;
  end;
  put;
  return;
run;

jakarman
Barite | Level 11

There is a system option that controls whether the BOM is written or not; it is called BOMFILE.

From what I have seen, the default setting applies it when your SAS session is running in UTF-8 mode, that is, when you have started SAS with the alternate binaries.

It can be very surprising to use the SAS editor and find that edited files get corrupted because the BOM is added.
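
A hedged sketch of how the option might be combined with PROC EXPORT. The fileref out, the path class.txt, and the sample table sashelp.class are assumptions made for the example. Whether a BOM actually appears may depend on the SAS release and the session encoding, so the output file should be checked.

```sas
/* Ask SAS to write a BOM on Unicode-encoded external files */
options bomfile;

/* Fileref with an explicit UTF-8 encoding (name and path are assumptions) */
filename out "class.txt" encoding="utf-8";

/* Export a sample table as a semicolon-delimited text file */
proc export data=sashelp.class outfile=out dbms=dlm replace;
  delimiter=';';
run;
```

The counterpart setting, options nobomfile;, asks SAS to leave the BOM out.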

---->-- ja karman --<-----
