In running SAS 9.4 M5 (LIN X64) I am seeing a different behavior than I remember in earlier 9.4 maintenance releases with respect to use of the CVP (character variable padding) engine. The way I remember it, it used to be that if the CVP engine was specified and the CVPMULTIPLIER was not, character variables would be expanded by a factor of 1.5 on reading the data, so that character data could be transcoded, for example, from latin1 to utf8. To create a data set with these larger utf8 character strings, one could PROC COPY with the NOCLONE option or use a DATA step.
libname lat CVP "/path";
libname utf8 "/different/path";
proc copy in=lat out=utf8 noclone;
run;
-or-
data utf8.onedset;
set lat.onedset;
run;
However, I am now running SAS 9.4 M5 and I am seeing something a little different.
In the doc it says this:
If you explicitly specify the CVP engine but do not specify either the CVPMULTIPLIER= option or the CVPBYTES= option,
then SAS uses CVPMULTIPLIER=AUTO(0) to increase the lengths.
AUTO(0) sets the value of the CVP engine based on the encoding of the SAS session and input data set.
However, I can't find anything in the doc to tell me exactly how the value is determined (also, I think that this should say "the value of the CVPMULTIPLIER" rather than "the value of the CVP engine"). That would seem to me to be an important thing to know, since as far as I can tell, it represents a change in the behavior of the CVP engine, although in my environment the 1.5 multiplier is still used. But I would like to know in which situations some other value would be used automatically.
I checked in the What's New section of the NLS Guide, but didn't find any mention of this. Perhaps it was in an earlier maintenance release and I didn't see it. Does anyone know?
Also, in my searching I see that there is a documented macro, %COPY_TO_NEW_ENCODING. I'm not sure whether it is also new in Maintenance 5. It appears to be able to replace the use of the CVP engine by examining the data and increasing the length of only those variables that need it, based on the actual values. Resource use is possibly high either way. Does anyone out there have experience with this macro? Any pros or cons you can mention?
Thanks!
Oh, and if this is the wrong forum for this question, please feel free to suggest a different one. I did not see one dedicated to NLS questions.
Donna Dutton
> However, I am now running SAS 9.4 M5 and I am seeing something a little different.
What do you see? Is it just the doc that's different?
One way to know what the default is would be to create a field with around half of the characters single byte and half double-byte, like AAAÈÈÈ.
This should use 6 bytes in latin1 and 9 bytes in UTF-8. See how the truncation happens.
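As a rough check of that byte arithmetic, here is a minimal sketch. It assumes a UTF-8 SAS session and a program file saved as UTF-8; the variable names are made up for illustration. LENGTH counts bytes and KLENGTH counts characters, so in such a session the two values should differ for this string.

```sas
/* Sketch only: assumes the SAS session encoding is UTF-8 */
data _null_;
   length s $9;          /* 3 single-byte + 3 double-byte characters need 9 bytes in UTF-8 */
   s = 'AAAÈÈÈ';
   bytes = length(s);    /* LENGTH counts bytes in the session encoding */
   chars = klength(s);   /* KLENGTH counts characters */
   put bytes= chars=;    /* written to the SAS log */
run;
```

A latin1 variable of length 6 that holds this value needs exactly 6 x 1.5 = 9 bytes after transcoding, so it sits right at the limit of a 1.5 multiplier.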
If no one answers here, I guess your best bet is then to contact Tech Support. Please report back here; I am curious now.
Note that the page you linked to contains:
By default, the CVP engine uses a multiplier of 1.5 times the variable length.
In my LIN x64 UTF8 environment, I see the same behavior as before: a multiplier of 1.5 is used. So apparently AUTO(0) is 1.5 in this environment. But I wonder in which environments and under what situations it would be different from that. Of course it can be controlled by specifying an amount in the option, but I'm curious too. I guess it's a question for Tech Support.
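One way to see which multiplier is in effect is to compare the variable lengths that PROC CONTENTS reports with and without the CVP engine. The sketch below reuses the placeholder path and the data set name onedset from the original post; the librefs latc and lat25 and the value 2.5 are only illustrative assumptions.

```sas
/* "/path" is the placeholder location from the original post */
libname lat   "/path";                        /* default engine: lengths as stored */
libname latc  cvp "/path";                    /* CVP engine, no multiplier specified */
libname lat25 cvp "/path" cvpmultiplier=2.5;  /* CVP engine, explicit multiplier */

/* Compare the Len column for character variables across the three listings */
proc contents data=lat.onedset;   run;
proc contents data=latc.onedset;  run;
proc contents data=lat25.onedset; run;
```

Dividing a character variable's length in the second listing by its length in the first should give the multiplier that the default setting resolved to in that environment, subject to rounding.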