DI Studio, problem with encoding when writing to CSV

Frequent Contributor
Posts: 89

DI Studio, problem with encoding when writing to CSV

In DI Studio I have a job which uses a File Writer transformation to write to an External File object, which is CSV. I need the CSV to be encoded in UTF-8.

I had hoped that I could simply set UTF-8 in the encoding option, but that has no apparent effect.

According to this SAS Support page (SAS(R) 9.2 National Language Support (NLS): Reference Guide), DI Studio uses Wlatin1 by default, and UTF-8 must be specified on the statement that writes the file (the FILE statement, which is the output counterpart of INFILE).

However, in DI Studio the External File object only exposes INFILE statement options, which I suppose are ignored when the file is written.

Do you have any advice on how I can use DI Studio to write a CSV-file with UTF-8 encoding?

Frequent Contributor
Posts: 89

Re: DI Studio, problem with encoding when writing to CSV

Is there any way for me to write CSV with UTF-8 encoding? I don't mind if I have to use user written code here. No matter what I try, the external file shows up as ANSI.

The important thing is that Norwegian characters such as æ/ø/å are each encoded as 2 bytes, as they are in UTF-8.
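A quick way to confirm the 2-byte requirement outside SAS is to inspect the raw bytes of the exported file. The following is only a minimal sketch in Python, not something DI Studio provides; the helper names `encoded_sizes` and `looks_like_utf8` and the sample data are made up for illustration.

```python
# Hypothetical helper names for illustration; not part of SAS or DI Studio.

def encoded_sizes(text: str) -> dict:
    """Return how many bytes `text` occupies in latin-1 and in UTF-8."""
    return {enc: len(text.encode(enc)) for enc in ("latin-1", "utf-8")}


def looks_like_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8.

    Pure-ASCII data also passes, so the file must contain æ/ø/å
    (or other non-ASCII characters) for the check to be meaningful.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Each of æ, ø, å is 1 byte in latin-1 and 2 bytes in UTF-8.
    print(encoded_sizes("æøå"))

    # Made-up sample rows containing Norwegian characters.
    sample = "navn;by\nBjørn;Ålesund\n"
    print(looks_like_utf8(sample.encode("utf-8")))
    print(looks_like_utf8(sample.encode("latin-1")))
```

To check a real export, read it in binary mode and pass the bytes in, for example `looks_like_utf8(open(path, "rb").read())`, where `path` is wherever your job writes the CSV. A file that an editor reports as ANSI and that contains æ/ø/å should fail this check.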

Super User
Posts: 5,256

Re: DI Studio, problem with encoding when writing to CSV

This is annoying. The External File objects are clearly designed for reading files (infiles), yet they are also the only objects that can be used for exporting files. Someone should tell SAS about this obvious gap in such a basic object.

Having said that, the only way I found is to specify the encoding in the File Writer transformation (File Statement Options), for example with the FILE statement's ENCODING= option. Not perfect: encoding should be an attribute of the file, so that it is reused, not of the process step that creates it.

Data never sleeps
Valued Guide
Posts: 3,208

Re: DI Studio, problem with encoding when writing to CSV

SAS(R) Data Integration Studio 4.5: User's Guide (Specifying NLS Support for External Files)

You could override the generated code as a workaround. A better option is user-written code, see SAS(R) Data Integration Studio 4.5: User's Guide (About User-Written Code). You can define your own transformation.

As you are Norwegian, you are presumably on latin1, but with which code page? See ISO/IEC 8859-1 - Wikipedia, the free encyclopedia.

Your installation (server side) is probably set up as Norwegian latin1. It could have been set up using UTF-8 with en/no. The WS (workspace server) service is the one you are seeing with DI. With an installation supporting UTF-8 and en/no, a WS service must be configured to choose one of them. You could, however, use multiple WS servers, each serving a different language/encoding.

Running a latin1 session will normally be the best choice, as most people expect that behavior. A UTF-8 session has some quirks.

---->-- ja karman --<-----