Hi all,
I simply want to use Box-Cox to transform a single variable. However, most of the code I've seen requires 2 variables (dependent & independent) and does not save the transformed variable. The "Univariate Box-Cox" seems most relevant, but I can't understand how to run it. This is the SAS example below:
title 'Univariate Box-Cox';
data x;
   call streaminit(17);
   z = 0;
   do i = 1 to 500;
      y = rand('lognormal');
      output;
   end;
run;
proc transreg maxiter=0 nozeroconstant;
   model BoxCox(y) = identity(z);
   output;
run;
proc univariate noprint;
   histogram y ty;
run;
I know that "x" is going to be my dataset and "z" is the constant (0), but I can't figure out exactly what to do with the variable I want transformed (y). I don't understand how putting my variable in place of "y" in the "y = rand('lognormal')" statement is supposed to work: my original values would just be overwritten by rand('lognormal'). I ran the code below and got values that are exactly the same as if I had just made up a variable in the "y = rand('lognormal')" portion.
Data newdataset;
set olddataset;
call streaminit(17);
z = 0;
do i = 1 to 500;
myvariable = rand('lognormal');
output;
end;
run;
proc transreg Data = newdataset maxiter=0 nozeroconstant;
model BoxCox(myvariable) = identity(z);
output;
run;
I would greatly appreciate any guidance!
Best,
Joe
Hello @Reeza ,
I think the example from @jmguzman was based on:
SAS 9.4 / Viya 3.5
SAS/STAT 15.2 User's Guide
The TRANSREG Procedure
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_transreg_details02.htm
See the section that starts with this:
The next example shows how to find a Box-Cox transformation without an independent variable. This seeks to normalize the univariate histogram. This example generates 500 random observations from a lognormal distribution. In addition, a constant variable z is created that is all zero. This is because PROC TRANSREG requires some independent variable to be specified, even if it is constant. Two options are specified in the PROC TRANSREG statement. MAXITER=0 is specified because the Box-Cox transformation is performed before any iterations are begun. No iterations are needed since no other work is required. The NOZEROCONSTANT a-option (which can be abbreviated NOZ) is specified so that PROC TRANSREG does not print any warnings when it encounters the constant independent variable. The MODEL statement asks for a Box-Cox transformation of y and an IDENTITY transformation (which does nothing) of the constant variable z. Finally, PROC UNIVARIATE is run to show a histogram of the original variable y, and the Box-Cox transformation, Ty. The following statements fit the univariate Box-Cox model and produce Figure 18:
Cheers,
Koen
I wrote that example. Sorry if you were confused. As @Reeza said, use your own data, not my sample data. Oftentimes in examples, SAS developers use real data. Sometimes, as I did this time, it is convenient to make artificial and deliberately contrived data that clearly show what the analysis is doing. I don't recall how many years ago I added this capability to TRANSREG, but it is good to hear that someone wants to use it.
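For readers following along, here is a minimal sketch of what "use your own data" might look like. It assumes the poster's placeholder names (olddataset, myvariable) and an output data set name (transformed) chosen here purely for illustration. The DO loop and the RAND call from the documentation example are dropped, because they exist only to generate sample data; in the earlier attempt they overwrote myvariable with random values and wrote 500 rows for every input row. Only the constant z is added to the existing data.

```sas
/* Sketch only: olddataset, myvariable, and transformed are placeholder names. */
data newdataset;
   set olddataset;
   z = 0;   /* constant "independent" variable that PROC TRANSREG requires */
run;

proc transreg data=newdataset maxiter=0 nozeroconstant;
   model BoxCox(myvariable) = identity(z);
   output out=transformed;   /* saves the original and the transformed variable */
run;

/* As with y and Ty in the documentation example, the transformed */
/* variable should appear with a T prefix (here, Tmyvariable).    */
proc univariate data=transformed noprint;
   histogram myvariable Tmyvariable;
run;
```

The OUT= option on the OUTPUT statement names the data set that holds the transformed variable, which addresses the original concern about saving it.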
Hello @WarrenKuhfeld ,
No worries at all -- I'm glad this code exists. I can't seem to figure out how to run the PROC TRANSREG "Univariate Box-Cox" example. Does the code below look correct? I keep getting an error.
Thank you so much!
Hi @Reeza
IDENTITY not IDENTIFY.
Great catch! Now I get the error below. I think it's because of missing values? What can I do to exclude missing values?
Do you have negative values or more generally nonpositive? See PARAMETER=.
Also see NOMISS.
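A hedged sketch of how those two suggestions might be combined, again using the placeholder names newdataset, myvariable, and transformed. The shift of 1 is an assumption for illustration only: PARAMETER= supplies a constant that is added to the variable before the Box-Cox transformation, so pick a value that makes every shifted value positive for your data. NOMISS goes on the PROC TRANSREG statement and excludes observations with missing values from the analysis. Check the TRANSREG documentation for the exact behavior of both options before relying on this.

```sas
/* Sketch only: parameter=1 is an illustrative shift, not a recommendation. */
proc transreg data=newdataset maxiter=0 nozeroconstant nomiss;
   model BoxCox(myvariable / parameter=1) = identity(z);
   output out=transformed;
run;
```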
There are some things you need to understand about TRANSREG. I designed it for linear models with nonlinear (sometimes iteratively derived) transformations. However, I later made it do other things that seemed important that no one else was doing. Hence, I added BOXCOX. Univariate BOXCOX was even more of a stretch, but someone needed to do it, so I made it work (albeit with a somewhat clunky specification). Similar to BOXCOX, I added smoothing splines even though they did not fit well with the rest of the procedure. Decades ago, the old sasware ballot had getting the smoothing splines from GPLOT into an output data set. I knew GPLOT would never do it, so I put it in TRANSREG. The documentation has lots of details that can help you understand what it does. In some ways it is like a Swiss Army knife—lots of functionality but not always maximally elegant for a specific tool.
Hi @WarrenKuhfeld ,
Thank you for trailblazing! I'll take a look at the links you provided and read through the documentation to help me troubleshoot.
Have a great night!
calling @Rick_SAS
There are several papers about the univariate Box-Cox transformation. See
Coleman, 2004, "A Fast, High-Precision Implementation of the Univariate One-Parameter Box-Cox Transformation Using the Golden Section Search in SAS/IML", NESUG proceedings.
LaLonde, 2012, "Transforming Variables for Normality and Linearity–When, How, Why and Why Not's", SAS Global Forum proceedings.
KSharp: You have asked me to write about the Box-Cox transformation several times. What is it you find confusing? What do you want me to say about this topic?
Yes, and in a comment to that article you wrote (September 27, 2017)
Rick,
Can you write a blog about Box-Cox Transformation?
So, I am asking you what you want to know that is not explained in the TRANSREG doc or in the papers that I've cited?