Hi
Is there a list of all possible encodings that SAS supports? I'm asking because I need to assign a specific one when using PROC EXPORT, namely UTF-8. Unfortunately, there are two flavors of UTF-8, with and without a BOM, and when I export a SAS dataset to txt (using encoding="utf-8") I get a text file in UTF-8 without BOM; I'd like the other one. Is it possible to differentiate between these two encodings?
The byte order mark is not an indicator of UTF-8 encoding, although it is often used that way. XML files, being plain text, declare their encoding explicitly instead.
With UTF-8 files you abandon the dogma that one byte equals one character. That is the important change in thinking.
Many other encodings, such as Latin1 and others based on code pages, still follow that old dogma.
There is only one UTF-8 encoding, as it is generic Unicode.
OK, let me explain the problem then. I have a dataset and I need to export it to a txt file with UTF-8 encoding. I thought everything was fine, but then there were problems with foreign-language characters (not in the text file itself, but in the app it was imported into). So my next move was to check what happened, and according to Notepad++ the exported file was UTF-8 w/o BOM. I converted it to UTF-8 using N++ and there were no problems. So is there anything you could advise? Since there's only one UTF-8, is there probably no way to do in SAS what I did in N++?
I'd say that, since UTF-8 by definition does not depend on byte order, SAS does the prudent thing and omits any BOM.
Since PROC EXPORT can't be made to supply the BOM (AFAIK), I would switch to a manually written data step and write the BOM in an if _n_ = 1 then do; put 'EFBBBF'x @; end; block before everything else.
Thank you, that's it! But could you tell me how to show column/variable names in the first line? Data step omits them.
In a data step, one needs to supply the column headers for a .csv or similar file.
Just expand the if _n_ = 1 block:
if _n_ = 1 then do;
  put 'EFBBBF'x @;        * BOM;
  put 'var1;var2;var3';   * Headers;
end;
Note the trailing @ at the end of the first put. It prevents a newline from being written to the output. The next put, which has no trailing @, then completes the first line.
This also assumes that you write a file delimited by semicolons; adapt accordingly if you use another method of field separation.
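Put together, the whole step might look like the sketch below. The dataset (sashelp.class), its variable names, and the output file name class.txt are only assumptions for illustration. It also assumes the SAS session itself runs in UTF-8, so that the hex literal for the BOM is written unchanged; in a session with another encoding the literal may get transcoded along with the data, so it is worth checking the first three bytes of the result.

```sas
/* Sketch only: dataset, variable list and file name are placeholders. */
data _null_;
  set sashelp.class;
  file "class.txt" dsd dlm=';' encoding='utf-8' lrecl=32767;
  if _n_ = 1 then do;
    put 'EFBBBF'x @;                     /* BOM, line held open by @ */
    put 'Name;Sex;Age;Height;Weight';    /* header text, ends line 1 */
  end;
  put name sex age height weight;        /* DSD/DLM supply the semicolons */
run;
```

The header is typed by hand here, so it has to be kept in sync with the variable list in the last put.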
You can have the data step write the variable names also.
%let in=sashelp.class;
%let out=class.csv;
options nofmterr ;
*----------------------------------------------------------------------;
* Convert SAS dataset to CSV file ;
*----------------------------------------------------------------------;
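data _null_;
  set &in;
  file "&out" dsd dlm=';' encoding=utf8 lrecl=1000000;
  /* First row only: write the BOM (held with @), then the header line */
  if _n_ = 1 then put 'EFBBBF'x @;
  if _n_ = 1 then link names;
  /* Write all variables of the input dataset as one delimited record */
  put (_all_) (:);
  return;
  /* Subroutine: loop over variable names until __name__ itself is reached */
names:
  length __name__ $32;
  do while(1);
    call vnext(__name__);
    if upcase(__name__) eq '__NAME__' then leave;
    put __name__ @;
  end;
  put;
  return;
run;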
There is a system option that controls whether the BOM is written; it is called BOMFILE.
From what I have seen, it is in effect by default when your SAS session runs in UTF-8 mode, that is, when you have started the alternate (Unicode) binaries.
It can be very surprising to use the SAS editor and find that edited files get corrupted because the BOM has been added.
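To see how this option is set in your own session, something like the following sketch should work. That BOMFILE can be changed with an OPTIONS statement, and that PROC EXPORT honors it, are assumptions to verify against the documentation for your release.

```sas
/* Print the current value of the BOMFILE system option to the log. */
proc options option=bomfile;
run;

/* Request a BOM when Unicode-encoded external files are created. */
options bomfile;

/* To suppress the BOM instead, the negated form would be used: */
/* options nobomfile; */
```

If the option is already on and the exported file still has no BOM, the data step approach above remains the fallback.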