Truncation error when transcoding euc_cn dataset

In PC SAS 9.4, with full language/encoding support installed, I'm trying to convert a dataset encoded as euc_cn to utf-8 using the equivalent of the following code:

libname inlib cvp '\\vmware-host\Shared Folders\test-input' inencoding='euc-cn' CVPMULTIPLIER=2;
libname outlib '\\vmware-host\Shared Folders\test-output' outencoding='utf-8';

data outlib.utf_data;
  set inlib.chinese_data (keep=ApplyBillID ProdType);
run;

ERROR: Some character data was lost during transcoding in the dataset
       INLIB.CHINESE_DATA. Either the data contains characters that are not
       representable in the new encoding or truncation occurred during transcoding.

It doesn't seem like the character field would be so large as not to be able to fit, and I have tried bumping up the cvpbytes and cvpmultiplier.


Has anyone experienced this before and know how to resolve it?
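Editor's note: one way to narrow this down is to compare the encoding of the running session with the encoding recorded in the dataset header. The following is a minimal sketch, assuming the `inlib` libref from the code above is already assigned. `SYSENCODING` is an automatic macro variable, and PROC CONTENTS reports the dataset encoding together with the byte length of each variable.

```sas
/* Write the encoding of the running SAS session to the log */
%put NOTE: Session encoding is &SYSENCODING;

/* Report the encoding stored in the dataset header and the
   byte lengths of its character variables */
proc contents data=inlib.chinese_data;
run;
```

If the session turns out to be running a single-byte encoding such as wlatin1, Chinese characters cannot be represented in it, which would match the first cause named in the error message rather than truncation.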


Accepted Solutions
Solution
‎07-21-2015 10:47 PM
New Contributor
Posts: 2

Re: Truncation error when transcoding euc_cn dataset

Thanks all, Jaap solved it!!

I started a new session with UTF-8 encoding, i.e. through a shortcut with the following target:

"C:\Program Files\SASHome\SASFoundation\9.4\sas.exe" -CONFIG "C:\Program Files\SASHome\SASFoundation\9.4\nls\u8\sasv9.cfg"

That config file includes the following setting:

-ENCODING UTF-8

I ran the code from my original post. It now processes without error, does not produce the warning about different encodings, and the dataset viewer is also able to render the Chinese characters.
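Editor's note: to confirm that a session started this way really is running in UTF-8, and that the converted dataset was written in UTF-8, a check along the following lines may help. This is a sketch only and assumes the `outlib` libref and the `utf_data` dataset name from the original post.

```sas
/* Print the value of the ENCODING system option to the log */
proc options option=encoding;
run;

/* Inspect the header of the converted dataset; the Encoding
   field in the output should report a UTF-8 encoding */
proc contents data=outlib.utf_data;
run;
```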



All Replies
Respected Advisor
Posts: 4,131

Re: Truncation error when transcoding euc_cn dataset

Your code as such looks correct to me, so I'm not really sure what's missing. I recently came across a comparable issue, and there the resolution was found in: SAS(R) 9.4 National Language Support (NLS): Reference Guide, Fourth Edition

Valued Guide
Posts: 3,208

Re: Truncation error when transcoding euc_cn dataset

OK, the EUC encoding is a more classic DBCS (2-byte) approach organized in planes. SAS knows about that: http://support.sas.com/resources/papers/Multilingual_Computing_with_SAS_94.pdf

However, when your SAS session is running the classic English single-byte SAS loadables, you cannot use those things. Everything goes back to a single byte, but with UTF-8 you are working with 1-4 bytes per character. You must have those multiple bytes supported by your loadables.

There is a u8 directory in the loadables when UTF-8 support has been installed. Use that program to start your SAS session. It should be rather easy to find on Windows desktops running Base SAS. Mainframes (EBCDIC) are not supported with UTF-8. http://support.sas.com/resources/papers/AddingAdditionalSASWorkspaceServerstoSupportMultipleEncoding...

When you have SAS/ACCESS and DB clients installed, review those as well. The encoding translation is also set there. It would be confusing if that one were setting all chars to latin1 and causing the trouble there.

---->-- ja karman --<-----
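Editor's note: the point about bytes per character can be illustrated with a small sketch. It assumes a UTF-8 session and a program file saved as UTF-8; the variable names are illustrative only. `LENGTH` counts bytes while `KLENGTH` counts characters, so the two differ for multi-byte text.

```sas
/* Assumes a UTF-8 session and that this program is saved as UTF-8 */
data _null_;
  length text $ 12;
  text  = '中文';            /* two Chinese characters */
  bytes = length(text);      /* LENGTH counts bytes */
  chars = klength(text);     /* KLENGTH counts characters */
  put bytes= chars=;
run;
```

Common Chinese characters take 2 bytes each in EUC-CN and 3 bytes each in UTF-8, which is why a column sized for EUC-CN data can overflow after transcoding unless its byte length is expanded.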
Super User
Posts: 9,865

Re: Truncation error when transcoding euc_cn dataset

I have had the same experience as you. I once tried to transport a SAS dataset between local SAS (WLATIN1) and SAS University Edition (UTF-8).

But I failed. The workaround I used was exporting the dataset to a UTF-8 CSV file, then importing this CSV file again in UE. I know it is awkward.

Hope somebody can share a good idea.

Xia Keshan
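Editor's note: the CSV round trip described above might look roughly like the following. This is a sketch under stated assumptions: the paths, the fileref names and the dataset `work.mydata` are hypothetical, and the two halves run in two different sessions.

```sas
/* Step 1, in the source (e.g. WLATIN1) session:
   write the table as a UTF-8 encoded CSV file */
filename csvout 'C:\temp\mydata.csv' encoding='utf-8';   /* hypothetical path */

proc export data=work.mydata outfile=csvout dbms=csv replace;
run;

/* Step 2, in the UTF-8 session:
   read the CSV file back into a SAS dataset */
filename csvin '/folders/myfolders/mydata.csv' encoding='utf-8';  /* hypothetical path */

proc import datafile=csvin out=work.mydata dbms=csv replace;
  guessingrows=max;   /* scan all rows before choosing column types and lengths */
run;
```

A CSV round trip loses formats, labels and exact variable lengths, so it is best treated as a fallback when a direct copy between encodings is not available.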

Valued Guide
Posts: 3,208

Re: Truncation error when transcoding euc_cn dataset

Xia, when you have SAS latin-1 and utf8 on the same computer there is no problem in sharing and converting data.
For some mad reason the importing of latin-1 data into utf8 is blocked by SAS. With that unwanted, unneeded limitation the conversion is made impossible. That situation is different from the one where both are on the same system.
With SAS/CONNECT that same limitation of not being able to connect to another encoding (latin1/utf8) used to exist, but this has changed: http://support.sas.com/documentation/cdl/en/connref/67933/HTML/default/viewer.htm#n1smv71b6303yen17k...

Encoding Compatibility between SAS/CONNECT Client and Server Sessions  

"Beginning with version 9.4, SAS/CONNECT supports connections between the client and the server in which one session is using UTF-8 and the other is using non-UTF-8. However, if one session's encoding is not compatible with the other session's encoding, then SAS will issue a WARNING stating that data might not have been transmitted correctly"

---->-- ja karman --<-----
Super User
Posts: 9,865

Re: Truncation error when transcoding euc_cn dataset

Jaap,

Sorry. By local SAS I mean PC (standalone) SAS; it has nothing to do with SAS/CONNECT.

I can make a UTF-8 SAS dataset by setting -encoding utf-8 in the SAS config file when I start a new SAS session.

But that also can't convert a table from wlatin1 into utf-8. As you said, one session can only have one encoding, so whatever encoding your SAS session is using, you CAN NOT convert a table from one encoding into another. So I have to turn the SAS table into a CSV file. I don't know if I understand it right, but it worked.


Valued Guide
Posts: 3,208

Re: Truncation error when transcoding euc_cn dataset

Xia, starting a utf8 session on Windows is not achieved by changing only the sas-config.
See the folder below: cd %SASROOT%; cd nls; cd u8. There are many more there. The trick is switching the config file in SASROOT, or having an alternate script file that does the switch between those locations. I am running both encodings on my laptop at the same time right now.

Within the utf8 Windows session I am getting:

WARNING: Display of UTF8 encoded data is not fully supported by the SAS Display Manager System.

NOTE: This SAS session is using a registry in WORK. All changes will be lost at the end of


Copying sashelp.class to a shared location (latin1) results in a shared latin1 SAS dataset.

Copying sashelp.prdsale in the utf8 session requires the noclone option and results in a utf8-encoded SAS dataset. (details)
Nicely, the sashelp datasets are still latin1 in the utf8 SAS session. There are no notes or warnings when opening either of those two datasets.

Opening the utf8 prdsale dataset (shared test location) in the latin1 SAS session gives the following in the log:
NOTE: Data file TEST.PRDSALE.DATA is in a format that is native to another host, or the file
      encoding does not match the session encoding. Cross Environment Data Access will be used,
      which might require additional CPU resources and might reduce performance.

---->-- ja karman --<-----
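Editor's note: the copy with the noclone option described above could be sketched as follows. It is meant to run in the UTF-8 session; the libref `test` matches the name in the log note, while the path is hypothetical.

```sas
/* Run in the UTF-8 session */
libname test 'C:\shared\test';   /* hypothetical path to the shared location */

/* NOCLONE stops PROC COPY from carrying over the source dataset's
   encoding, so the copy is written in the session encoding */
proc copy in=sashelp out=test noclone;
  select prdsale;
run;
```

Running PROC CONTENTS on `test.prdsale` afterwards is one way to check which encoding the copy ended up with.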
Respected Advisor
Posts: 4,131

Re: Truncation error when transcoding euc_cn dataset

Adam,

Would it be possible for you to attach a small sample of your data where you get the error? That would help us replicate and eventually resolve what you're observing.

I just can't believe that there isn't a direct way to get what you need.

Thanks,

Patrick
