UTF‑8 Transcoding for SAS 9 Libraries: A Practical, Space‑Efficient Approach Using CVPMULTIPLIER

Started ‎04-02-2026 by
Modified ‎04-02-2026 by
Migrating SAS 9 datasets to SAS Viya often requires converting data from WLATIN1 (or another single‑byte encoding) to UTF‑8. This step can be confusing, and if not handled carefully, it may lead to truncated character data or unnecessarily large datasets.
 
This post explains why transcoding requires attention and introduces a SAS program that automatically determines the minimum CVPMULTIPLIER needed to safely convert SAS 9 datasets to UTF‑8 without wasting disk space.
 
Why Transcoding Matters

Many SAS 9 environments, particularly on Windows, use WLATIN1 (Windows code page 1252), a single‑byte encoding limited to 256 code points.
SAS Viya uses UTF‑8, a variable‑length encoding that supports all Unicode characters.
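
Before converting anything, it can help to confirm which encoding the SAS 9 session and the existing datasets actually use. The following is a minimal sketch, assuming the datasets sit in the sample source directory used later in this article; the libref CHK_SRC is a hypothetical name chosen only for illustration.

```sas
/* Show the encoding of the current SAS session */
%put NOTE: Session encoding is &sysencoding;

/* CHK_SRC is a hypothetical libref used only for this check */
libname chk_src "C:\temp2\sasdata" access=readonly;

/* List the encoding recorded in each dataset header */
proc sql;
    select memname, encoding
    from dictionary.tables
    where libname = 'CHK_SRC' and memtype = 'DATA';
quit;
```

Datasets that already report a UTF-8 encoding do not need transcoding.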

 

How many bytes do characters use?

Examples

Data         WLATIN1 bytes   UTF-8 bytes   Notes
1234         4               4             ASCII only, no expansion
ABCDE        5               5             ASCII only, no expansion
Café_2026    9               10            é expands from 1 to 2 bytes
abc©®        5               7             © and ® each expand from 1 to 2 bytes
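
The byte counts in the examples above can be checked from a SAS 9 session. This is a sketch only, assuming the session encoding is WLATIN1 so that LENGTH counts WLATIN1 bytes; it uses the KCVT function to transcode each value to UTF-8 in memory. The dataset and variable names are made up for the illustration.

```sas
/* Sketch: assumes a SAS session running with ENCODING=WLATIN1 */
data byte_check;
    /* utf8 is three times as wide as txt so no transcoded value is truncated */
    length txt $20 utf8 $60;
    do txt = '1234', 'ABCDE', 'Café_2026', 'abc©®';
        wlatin1_bytes = length(txt);                 /* bytes in the session encoding */
        utf8 = kcvt(trim(txt), 'wlatin1', 'utf-8');  /* same text as UTF-8 bytes */
        utf8_bytes = length(utf8);                   /* bytes after transcoding */
        output;
    end;
    drop utf8;  /* raw UTF-8 bytes would not display correctly in a WLATIN1 session */
run;

proc print data=byte_check noobs;
run;
```

Any row where utf8_bytes exceeds wlatin1_bytes is a value that would no longer fit in a column sized exactly for the WLATIN1 data.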

 

Why this causes issues

SAS character variables have fixed lengths.
If a column is $10 in WLATIN1, SAS must ensure it can hold the expanded UTF‑8 version.

If the column contains extended ASCII characters and the length is not increased, SAS Viya may show:

  • truncated values
  • missing characters
  • transcoding errors

This is where CVP (Character Variable Padding) comes in.

Using CVP to Increase Column Widths

CVP is an engine that you name on the LIBNAME statement. When the library is read through it, every character column's width is multiplied by the value of CVPMULTIPLIER.

Example:

  • Original column: $10
  • CVPMULTIPLIER = 1.5
    → New width = $15

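A multiplier of 2.0 guarantees enough space when every extended character expands to two bytes, as é, © and ® do. A few WLATIN1 characters, such as € and curly quotation marks, take three bytes in UTF‑8, so a column filled with them needs 3.0.
Either way, applying a large multiplier to every column is often unnecessary and can significantly increase dataset size.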
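
To see the effect of a multiplier before copying anything, the same directory can be assigned twice, once with the default engine and once through CVP, and the column lengths compared. This is a sketch under assumptions: RAW and PADDED are hypothetical librefs, the path is the sample source directory from this article, and the padded lengths are expected (not guaranteed here) to show up in DICTIONARY.COLUMNS the same way they do in PROC CONTENTS.

```sas
/* RAW and PADDED are hypothetical librefs pointing at the same directory */
libname raw    "C:\temp2\sasdata" access=readonly;
libname padded cvp "C:\temp2\sasdata" cvpmultiplier=1.5;

/* Compare stored and padded lengths of every character column */
proc sql;
    select r.memname,
           r.name,
           r.length as stored_length,
           p.length as padded_length
    from dictionary.columns as r
         inner join dictionary.columns as p
         on r.memname = p.memname and r.name = p.name
    where r.libname = 'RAW'
      and p.libname = 'PADDED'
      and r.type = 'char';
quit;
```

With CVPMULTIPLIER=1.5, a $10 column should be reported with a padded length of 15. Numeric columns are not affected.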

 

A Smarter Approach: Automatically Finding the Minimum CVPMULTIPLIER

To avoid blindly doubling dataset sizes, I created a macro that:

  1. Attempts to copy all datasets with CVPMULTIPLIER = 1 (no expansion)
    • Works if the data contains only ASCII characters.
  2. Retries only the datasets that failed, using CVPMULTIPLIER = 1.5
  3. If needed, retries again with CVPMULTIPLIER = 2.0
  4. Continues increasing the multiplier in steps of 0.5 (2.5 → 3.0 → 3.5 → 4.0)
    until every dataset has transcoded to UTF‑8 or the 4.0 limit is reached
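
The stepping in the list above relies on one small trick: a macro %DO loop accepts only integer bounds, so the program loops over integers and halves them with %SYSEVALF. A minimal sketch of just that part, using a hypothetical macro name:

```sas
/* show_multipliers is a hypothetical name; it only prints the sequence */
%macro show_multipliers;
    %do j = 2 %to 8;
        /* %sysevalf does floating-point arithmetic, so 3/2 yields 1.5 */
        %put CVPMULTIPLIER=%sysevalf(&j / 2);
    %end;
%mend show_multipliers;

%show_multipliers
```

The log shows the values 1, 1.5, 2, 2.5, 3, 3.5 and 4, which are the multipliers the full program tries in turn.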

Why this approach is better

  • Avoids unnecessary dataset bloat
  • Ensures each dataset gets just enough expansion
  • Reduces disk usage and CAS memory footprint

This is especially useful for large SAS 9 libraries being prepared for Viya migration.

How to Run the Process

Copy the full SAS program in the final section of this article to your SAS 9 environment and configure the source (WLATIN1) and target (UTF‑8) directories in the lines shown below:

 

/****************** USER CONFIGURATION  *******************/
%let sourceDir=%str(C:\temp2\sasdata);
%let targetDir=%str(C:\temp2\sasdatautf8);
/****************** END USER CONFIGURATION  *******************/
/*=======================================================================*/

Then run the SAS program.
It will automatically:

  • detect which datasets need expansion
  • apply the appropriate CVPMULTIPLIER
  • produce UTF‑8 datasets in the target directory

If the purpose of the transcoding is to move to SAS Viya, copy the target directory over to SAS Viya.
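
After the run, one way to see how much expansion each dataset ended up with is to compare total character-column widths between the two directories. This is a sketch, not part of the program: CHK_SRC and CHK_TGT are hypothetical librefs, and the ratio is approximate because padded lengths are whole numbers of bytes.

```sas
/* Hypothetical librefs; neither is read through CVP here */
libname chk_src "C:\temp2\sasdata"     access=readonly;
libname chk_tgt "C:\temp2\sasdatautf8" access=readonly;

/* Ratio of total character width in the target to that in the source */
proc sql;
    select t.memname,
           sum(s.length) as source_char_bytes,
           sum(t.length) as target_char_bytes,
           sum(t.length) / sum(s.length) as approx_multiplier format=5.2
    from dictionary.columns as s
         inner join dictionary.columns as t
         on s.memname = t.memname and upcase(s.name) = upcase(t.name)
    where s.libname = 'CHK_SRC'
      and t.libname = 'CHK_TGT'
      and s.type = 'char'
      and t.type = 'char'
    group by t.memname;
quit;
```

A value near 1 means the dataset was copied without expansion; larger values point to the datasets that contain extended characters.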

 

Important: Disk Space Requirements

The process creates new UTF‑8 copies of your SAS 9 datasets.

This means you need at least as much free disk space as the size of your SAS 9 library, and in some cases more.

Example:

  • If your SAS 9 library is 50 GB,
    plan for 50–75 GB of free space during conversion.

This is temporary but necessary.
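
To size the free space needed, the current footprint of the source library can be read from DICTIONARY.TABLES. A minimal sketch, assuming the sample source directory from this article and a hypothetical libref CHK_SRC; it counts dataset files only, not indexes or catalogs.

```sas
/* CHK_SRC is a hypothetical libref used only for this estimate */
libname chk_src "C:\temp2\sasdata" access=readonly;

/* Number of datasets and their combined file size in GB */
proc sql;
    select count(*) as datasets,
           sum(filesize) / (1024 * 1024 * 1024) as size_gb format=10.2
    from dictionary.tables
    where libname = 'CHK_SRC' and memtype = 'DATA';
quit;
```

Multiplying size_gb by 1 to 1.5 gives the planning range used in the example above.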

Questions or Clarifications?

If you have any follow-up questions, please feel free to reply here. I’m happy to help.

 


 

SAS Program for Transcoding your datasets to UTF-8

/*
AUTHOR: VG
CREATED: 03/11/2026
PURPOSE: Transcode datasets to UTF-8 from the source library to the target library using the minimum CVP multiplier.
USER CONFIG:
Source Directory: Path to the source where the SAS 9 datasets in WLATIN1 (or another non-UTF-8 encoding) are saved
Target Directory: Path to the target where you want the UTF-8 copies of the datasets from the source directory
Users must update these 2 paths before executing the program.

USE CASES:
If the target directory is empty, all the datasets from the source directory will be copied to the target directory.
If the target directory has some but not all datasets and the ones present are all in UTF-8, only the remaining datasets from the source directory will be copied.
If the target directory has some but not all datasets and the ones present are NOT all in UTF-8, the remaining datasets will be copied,
and the ones in the target directory that are not in UTF-8 will be copied again.

TESTING:
Tested with both small and large (10 million records and 11 columns) datasets containing extended ASCII characters.

ERRORS in the LOG:
Errors such as the ones shown below can occur in the log because the process starts with the minimum possible CVPMULTIPLIER and increases it only after a failure.
This iterative approach ensures that we do not over-allocate space for the dataset in SAS Viya.

These errors can be ignored as long as the dataset is copied successfully at a later, higher multiplier.
If they still appear at the final multiplier (4), the dataset was not copied and needs to be investigated.

ERROR: Some character data was lost during transcoding in the dataset SDS_OUT.LARGETESTDATA. Either
       the data contains characters that are not representable in the new encoding or truncation
       occurred during transcoding.
ERROR: File SDS_OUT.LARGETESTDATA.DATA has not been saved because copy could not be completed.
*/

/*============================================================================*/
/****************** USER CONFIGURATION  *******************/
%let sourceDir=%str(C:\temp2\sasdata);
%let targetDir=%str(C:\temp2\sasdatautf8);
/****************** END USER CONFIGURATION  *******************/
/*============================================================================*/

/* libref assignments */
%let sourceLib=SDSCVP;
%let targetLib=SDS_OUT;
%let sourceLib=%upcase(&sourceLib);
%let targetLib=%upcase(&targetLib);
options mprint;
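/* copy_ds_to_utf8 (below) steps cvpmultiplier through 1, 1.5, 2, 2.5, 3, 3.5 and 4 and, at each step, copies the datasets not yet in the target in UTF-8 */
/* print_tabs_in_tgt lists the datasets in the target library with their encoding */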
%macro print_tabs_in_tgt;
	%put NOTE: Final UTF-8 datasets in the target library:;
	title 'Final UTF-8 datasets in the target library';
	proc sql;
	    select memname, encoding
	    from sashelp.vtable
	    where upcase(libname)="&targetLib";
	quit;
	title '';
%mend print_tabs_in_tgt;
%macro copy_ds_to_utf8;
	libname &targetLib "&targetDir" outencoding='utf-8';
	%do j = 2 %to 8;              /* represents 1.0 to 4.0 in steps of 0.5 */
    	%let i = %sysevalf(&j/2);
		%put ==============================================;
		%put Note: New CVPMultiplier=&i.;
		%put ==============================================;
	    libname &sourceLib cvp "&sourceDir" cvpmultiplier=&i;

/*		proc copy in=&sourceLib out=&targetLib noclone;*/
/*		run;*/

		%let dsnotcopied=;
		proc sql noprint;
			select memname into :dsnotcopied separated by ' ' 
			from sashelp.vtable 
			where memtype='DATA' and libname="&sourceLib"
			and memname not in
			(select memname from sashelp.vtable 
			where memtype='DATA' and libname="&targetLib" and find(upcase(encoding), 'UTF') > 0);
		quit;

		/* %length avoids evaluating dataset names (e.g. OR, AND) as operators */
		%if %length(&dsnotcopied) = 0 %then %do;
			%put ==============================================;
			%put Note: all datasets are copied in UTF-8. exiting macro.;
			%put ==============================================;
			%print_tabs_in_tgt
			%RETURN;
		%end;
		%else %do;
			%put ==============================================;
			%put Note: Datasets remain to be copied in UTF-8. &=dsnotcopied. Copying with CVPMultiplier=&i..;
			%put ==============================================;

			proc copy in=&sourceLib out=&targetLib noclone;
				select &dsnotcopied;
			run;
		%end;
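	%end;

	/* Reached only when datasets were still pending at the last step (CVPMULTIPLIER=4): list what is in the target */
	%print_tabs_in_tgt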

%mend copy_ds_to_utf8;

%copy_ds_to_utf8 /* execute the macro */

 

 

Last update:
‎04-02-2026 01:11 PM