jmguzman
Fluorite | Level 6

Hi all,

 

I simply want to use Box-Cox to transform a single variable. However, most of the code I've seen requires 2 variables (dependent & independent) and does not save the transformed variable. The "Univariate Box-Cox" seems most relevant, but I can't understand how to run it. This is the SAS example below:

 

title 'Univariate Box-Cox';

data x;
   call streaminit(17);
   z = 0;
   do i = 1 to 500;
      y = rand('lognormal');
      output;
   end;
run;

proc transreg maxiter=0 nozeroconstant;
   model BoxCox(y) = identity(z);
   output;
run;

proc univariate noprint;
   histogram y ty;
run;

 

I know that "x" is going to be my dataset and "z" is the constant (0), but I can't figure out exactly what to do with the variable I want transformed (y). I don't understand how putting my variable in the "y" position of the "y = rand('lognormal')" statement is supposed to work: my original values will just be overwritten by rand('lognormal'). I ran the code below and got exactly the same values as if I had simply made up a variable with "y = rand('lognormal')".

Data newdataset;
set olddataset;

call streaminit(17);
z = 0;
do i = 1 to 500;
myvariable = rand('lognormal');
output;
end;
run;

proc transreg Data = newdataset maxiter=0 nozeroconstant;
model BoxCox(myvariable) = identity(z);
output;
run;

 

I would greatly appreciate any guidance!

 

Best,

Joe

 

Reeza
Super User
Where did you find that example? SAS examples usually have comments or some explanation around them.

The first step, which creates the dataset X, only generates sample data for the example. You would replace that with your own data source and start at the transreg step.

proc transreg data=olddataset nozeroconstant;
model boxcox(myVariable) = identify(z);
output out=want;
run;

Check the want data set to see what you get.
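
One way to check the output is sketched below. It assumes the TRANSREG step ran without errors and wrote a data set named want. The name Tmyvariable is an assumption based on the documentation example, where the transformed copy of y is called Ty; confirm the actual name in the PROC CONTENTS listing before using it.

```sas
/* List the variables that PROC TRANSREG wrote to the output data set. */
proc contents data=want;
run;

/* Look at the first few rows. */
proc print data=want(obs=10);
run;

/* Compare the original and transformed distributions.            */
/* Tmyvariable is an assumed name (T prefix + original name);     */
/* replace it with whatever PROC CONTENTS reports.                */
proc univariate data=want noprint;
   histogram myvariable Tmyvariable;
run;
```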
sbxkoenk
SAS Super FREQ

Hello @Reeza ,

 

I think the example from @jmguzman was based on:

SAS 9.4 / Viya 3.5
SAS/STAT 15.2 User's Guide
The TRANSREG Procedure

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_transreg_details02.htm

 

See the section that starts with this:

The next example shows how to find a Box-Cox transformation without an independent variable. This seeks to normalize the univariate histogram. This example generates 500 random observations from a lognormal distribution. In addition, a constant variable z is created that is all zero. This is because PROC TRANSREG requires some independent variable to be specified, even if it is constant. Two options are specified in the PROC TRANSREG statement. MAXITER=0 is specified because the Box-Cox transformation is performed before any iterations are begun. No iterations are needed since no other work is required. The NOZEROCONSTANT a-option (which can be abbreviated NOZ) is specified so that PROC TRANSREG does not print any warnings when it encounters the constant independent variable. The MODEL statement asks for a Box-Cox transformation of y and an IDENTITY transformation (which does nothing) of the constant variable z. Finally, PROC UNIVARIATE is run to show a histogram of the original variable y, and the Box-Cox transformation, Ty. The following statements fit the univariate Box-Cox model and produce Figure 18:

 

Cheers,

Koen

WarrenKuhfeld
Ammonite | Level 13

I wrote that example. Sorry if you were confused. As @Reeza said, use your own data, not my sample data. Oftentimes in examples, SAS developers use real data. Sometimes, as I did this time, it is convenient to make artificial and deliberately contrived data that clearly show what the analysis is doing. I don't recall how many years ago I added this capability to TRANSREG, but it is good to hear that someone wants to use it.

jmguzman
Fluorite | Level 6

Hello @WarrenKuhfeld ,

 

No worries at all -- I'm glad this code exists. I can't seem to figure out how to run the PROC TRANSREG "Univariate Box-Cox" example. Does the code below look correct? I keep getting an error.

 

proc transreg data=olddataset maxiter=0 nozeroconstant;
     model boxcox(myvariable) = identify(z);
     output out=.want;
     run;

 

Thank you so much!

jmguzman
Fluorite | Level 6

Hi @Reeza 

 

I appreciate the help! However, I ran the code and got the error below: 
 
ERROR: The transformation identify is not valid.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WANT may be incomplete. When this step was stopped there were 0 observations and 0 variables.
WarrenKuhfeld
Ammonite | Level 13

IDENTITY not IDENTIFY.
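
With the transformation name corrected, the step might look like the sketch below. It assumes olddataset does not already contain the constant variable z, so a DATA step adds it first. The data set name have_z is hypothetical; olddataset, myvariable, and want are the names used in the posts above.

```sas
/* Add the constant "independent" variable that the univariate */
/* Box-Cox specification requires.                              */
data have_z;              /* have_z is a hypothetical name */
   set olddataset;
   z = 0;
run;

/* Univariate Box-Cox: no iterations needed, and NOZEROCONSTANT */
/* suppresses warnings about the constant regressor.            */
proc transreg data=have_z maxiter=0 nozeroconstant;
   model boxcox(myvariable) = identity(z);
   output out=want;
run;
```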

jmguzman
Fluorite | Level 6

 

Great catch! Now I get the error below. I think it's because of missing values? What can I do to exclude missing values?

 

ERROR: 114 invalid values were encountered while attempting to transform variable myvariable.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 239 observations read from the data set olddataset.
WARNING: The data set WANT may be incomplete. When this step was stopped there were 0 observations and 8 variables.
WARNING: Data set WANT was not replaced because this step was stopped.
WarrenKuhfeld
Ammonite | Level 13

There are some things you need to understand about TRANSREG. I designed it for linear models with nonlinear (sometimes iteratively derived) transformations. However, I later made it do other things that seemed important that no one else was doing. Hence, I added BOXCOX. Univariate BOXCOX was even more of a stretch, but someone needed to do it, so I made it work (albeit with a somewhat clunky specification). Similar to BOXCOX, I added smoothing splines even though they did not fit well with the rest of the procedure. Decades ago, the old SASware Ballot included a request to get the smoothing splines from GPLOT into an output data set. I knew GPLOT would never do it, so I put it in TRANSREG. The documentation has lots of details that can help you understand what it does. In some ways it is like a Swiss Army knife: lots of functionality, but not always maximally elegant for a specific task.
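
As for the "invalid values" error above: the Box-Cox transformation is defined only for positive values, so one possible cause (an assumption, not confirmed in this thread) is that myvariable contains zeros, negative values, or missing values. The sketch below first counts such values and then reruns the analysis on the positive observations only. The data set name positive_only is hypothetical. The TRANSREG documentation also describes a PARAMETER= option for shifting the variable before transforming, which may be preferable to dropping observations.

```sas
/* Count the values Box-Cox cannot use. In SAS a missing value   */
/* compares as smaller than any number, so it is counted apart.  */
proc sql;
   select sum(missing(myvariable)) as n_missing,
          sum(myvariable <= 0 and not missing(myvariable)) as n_nonpositive
   from olddataset;
quit;

/* Keep only strictly positive values; this also drops missing   */
/* values. Add the constant regressor z at the same time.        */
data positive_only;       /* positive_only is a hypothetical name */
   set olddataset;
   where myvariable > 0;
   z = 0;
run;

proc transreg data=positive_only maxiter=0 nozeroconstant;
   model boxcox(myvariable) = identity(z);
   output out=want;
run;
```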

 

jmguzman
Fluorite | Level 6

Hi @WarrenKuhfeld ,

 

Thank you for trailblazing! I'll take a look at the links you provided and read through the documentation to help me troubleshoot.

 

Have a great night!

Rick_SAS
SAS Super FREQ

There are several papers about the univariate Box-Cox transformation. See

 

Coleman, 2004, "A Fast, High-Precision Implementation of the Univariate One-Parameter Box-Cox Transformation Using the Golden Section Search in SAS/IML", NESUG proceedings


LaLonde, 2012, "Transforming Variables for Normality and Linearity–When, How, Why and Why Not's", SAS Global Forum proceedings.

 

KSharp: You have asked me to write about the Box-Cox transformation several times. What is it you find confusing? What do you want me to say about this topic?

Ksharp
Super User
Rick,
Sorry, I don't remember asking you to write about the Box-Cox transformation several times.
Actually, I don't need Box-Cox in my real work. I recall you wrote a blog post about the Fisher transformation, arctanh(r).

https://blogs.sas.com/content/iml/2017/09/20/fishers-transformation-correlation.html

Therefore, I think you know something about this topic.
Rick_SAS
SAS Super FREQ

Yes, and in a comment on that article you wrote (September 27, 2017):
Rick,
Can you write a blog about Box-Cox Transformation?

So, I am asking you what you want to know that is not explained in the TRANSREG doc or in the papers that I've cited?
