rtbuttram
Fluorite | Level 6

This is more of a curiosity question than a problem.

I have a table that gets created every night in our SAS Grid environment using a data step like the following:

data lib.tabl;
     length type $ 15 descript $ 9;
     /* INFILE must execute before INPUT so the comma delimiter applies from the first record on */
     infile datalines delimiter=',';
     input type $ descript $;
     datalines;
<<imagine datalines here>>
;

This code has been running for over a year with no issues.

Recently our SAS environment switched from Latin1 to UTF-8 session encoding.  I’ve noticed the table created by the above code still shows “latin1 Western (ISO)” as the Encoding scheme in PROC CONTENTS.  I would have expected the encoding to change to UTF-8 once our environment session encoding was changed.
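To see both values side by side, a minimal check could look like this (assuming the libref `lib` from the step above is assigned in the session that runs it):

```sas
/* Write the current session encoding to the log (expected: UTF-8 after the switch) */
proc options option=encoding;
run;

/* The "Encoding" line in this output is the encoding stored in the table itself */
proc contents data=lib.tabl;
run;
```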

I’ve tried to reproduce this behavior by intentionally creating a table with Latin1 encoding and then replacing its contents with a data step such as the above, but the result is always a table with UTF-8 encoding.

Does anyone have any idea why this older table remains in Latin1 after the change to our session encoding? Again, this is more a curiosity question than a problem, as I have no need to store multi-byte characters in this table.

Thanks.

Accepted Solution
rtbuttram
Fluorite | Level 6

Ok. I *think* I've got a handle on this.

Earlier I said I had tried to reproduce the behavior by intentionally creating a table in latin1 and then trying to recreate it to see if it remained latin1. That wasn't exactly what I did. I actually uploaded a latin1 dataset that had been created on a Windows host, and then attempted to recreate it in a Linux environment. That reliably results in a new table being created with UTF-8 encoding.

If I create a table in my Linux session with latin1 encoding by using the ENCODING= data set option, and then replace that table in a separate data step without specifying an encoding option, SAS recognizes the encoding of the pre-existing dataset and uses CEDA to transcode the data from UTF-8 to latin1, recreating the table in its original latin1 encoding. Interestingly, the structure of the new dataset can be entirely different from that of the pre-existing dataset. The key seems to be the encoding attribute of the pre-existing dataset, and the fact that the pre-existing dataset was created using a data representation that is CEDA compatible.
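A minimal sketch of the experiment described above. `work.enc_test` is a hypothetical table name, and the result expected in the last step comes from the behavior reported in this post, not from a documented guarantee:

```sas
/* Step 1: create a table explicitly stored as latin1 (work.enc_test is hypothetical) */
data work.enc_test(encoding='latin1');
   x = 1;
run;

/* Step 2: replace it with a different structure and no ENCODING= option */
data work.enc_test;
   length type $ 15 descript $ 9;
   type = 'abc';
   descript = 'def';
run;

/* Step 3: per the behavior reported above, the Encoding line should still show latin1 */
proc contents data=work.enc_test;
run;
```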

I found the following CEDA documentation that seems to address this phenomenon:

SAS Help Center: SAS File Processing with CEDA

Thanks to everyone for their input.

Bob

9 Replies
ballardw
Super User

If you haven't rerun the code to rebuild the table, it keeps the encoding it had when it was created. It isn't clear whether you have actually rerun that data step to recreate the table.

rtbuttram
Fluorite | Level 6

Thanks for the reply.  

The code runs every night as part of a scheduled batch job. PROC CONTENTS shows me that the table was created within the past 24 hours, and yet it remains Latin1. Could the system user that’s running the code be accessing SAS with a Latin1 session encoding?

I should probably just delete the table, in which case it would likely be created as UTF-8 on the next run, but it drives me nuts when the system behaves unexpectedly and I can’t figure out why.

ChrisNZ
Tourmaline | Level 20

> Could the system user that’s running the code be accessing SAS with a Latin1 session encoding?

That's the first thing I'd look at: What's the configuration used by this batch job?
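One low-effort check, assuming the scheduled program can be edited: add lines like these to it so that the batch log records the encoding and configuration file(s) actually in effect for that run, then compare with an interactive session.

```sas
/* Record the session encoding of this batch run in its log */
%put NOTE: Session encoding is %sysfunc(getoption(encoding));

/* List the configuration file(s) this session was started with */
proc options option=config;
run;
```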

Patrick
Opal | Level 21

From what you describe, it looks like "something" is overriding the session encoding. It could be the ENCODING= data set option or the OUTENCODING= LIBNAME option.
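For reference, these are the two forms worth searching the batch code and any autoexec files for. The path and `work.source` are placeholders, not taken from the original job:

```sas
/* LIBNAME-level override: new tables written to this library are stored as latin1 */
libname lib '/placeholder/path' outencoding='latin1';

/* Data set-level override on a single output table */
data lib.tabl(encoding='latin1');
   set work.source;   /* work.source is a hypothetical input table */
run;
```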

Ksharp
Super User
Yes, I have run into the same problem. Try this option:

data lib.tabl(encoding='utf8') ;
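Applied to the nightly step from the original question, a sketch could look like this. It forces UTF-8 on the output table so the old table's latin1 attribute is not carried over; `'utf-8'` is used as the encoding name here:

```sas
/* Force UTF-8 on the output table instead of inheriting the old table's encoding */
data lib.tabl(encoding='utf-8');
     length type $ 15 descript $ 9;
     infile datalines delimiter=',';
     input type $ descript $;
     datalines;
<<imagine datalines here>>
;
```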

Kurt_Bremser
Super User

One of the side effects of the macro I am presenting at SASGF21 is that it prevents this behavior: in our batch jobs, result tables are always physically removed (if they exist) before being written out.
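The macro itself isn't shown in this thread. As a much simpler stand-in for the same idea, PROC DATASETS can drop the old table before the nightly step recreates it (a sketch, assuming the libref `lib` and member `tabl` from the original question):

```sas
/* Delete the old table first; NOWARN suppresses the message if it doesn't exist yet */
proc datasets library=lib nolist nowarn;
   delete tabl;
quit;
```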

rtbuttram
Fluorite | Level 6
Thanks, Kurt. I’ll be sure to check out your session.

Regards.
Bob
Kurt_Bremser
Super User

It's just a 15-minute "quick tip" type session.

Since I do not have a grid available, it would be nice to know if it can be implemented there in a reasonable fashion (either by using FDELETE() or the external rm -f).
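For the FDELETE() route, a minimal DATA step sketch follows. It assumes the libref points to a single plain directory on the grid's shared file system and that the member is stored as `tabl.sas7bdat`; it has not been tried on a grid:

```sas
/* Delete lib.tabl at the file-system level if the file exists */
data _null_;
   length fref $ 8 path $ 1024;
   /* PATHNAME returns the physical directory behind the libref */
   path = cats(pathname('lib'), '/tabl.sas7bdat');
   rc = filename(fref, path);          /* blank FREF: SAS generates a fileref */
   if rc = 0 and fexist(fref) then do;
      rc = fdelete(fref);
      if rc ne 0 then put 'WARNING: could not delete ' path;
   end;
   rc = filename(fref);                /* release the generated fileref */
run;
```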
