## how to get best correlation among different transformations of a variable

I want to apply different transformations to the independent variables (X variables) and find which transformation gives the best correlation with the dependent variable (Y variable).

Suppose var1 is the first variable. Then take:

- log(var1)
- exp(var1)
- sqrt(var1)
- square(var1)
- var1 (without transformation)

Then find the best correlation among these with the Y variable. Repeat this for var2, var3, and so on.

Here age, ed, and income are the X variables and ln_totalspent is the Y variable (in the attached dataset).
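The procedure described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the dataset name `spend`, the output dataset `trans`, the derived variable names (`log_age`, `exp_age`, and so on) and the macro name `best_corr` are all hypothetical, and the guards for non-positive or very large values are an added precaution, not part of the original request. The `RANK` option of PROC CORR orders the coefficients by absolute value, so the first column of each table is the strongest candidate.

```sas
/* Hypothetical input dataset: spend (contains age, ed, income, ln_totalspent) */
data trans;
    set spend;
    array x{3}  age ed income;
    array lg{3} log_age  log_ed  log_income;
    array ex{3} exp_age  exp_ed  exp_income;
    array rt{3} sqrt_age sqrt_ed sqrt_income;
    array sq{3} sq_age   sq_ed   sq_income;
    do i = 1 to 3;
        if x{i} > 0   then lg{i} = log(x{i});   /* log needs a positive argument   */
        if x{i} < 700 then ex{i} = exp(x{i});   /* guard against overflow in exp   */
        if x{i} >= 0  then rt{i} = sqrt(x{i});  /* sqrt needs a non-negative value */
        sq{i} = x{i}**2;
    end;
    drop i;
run;

/* One correlation table per X variable, ordered by absolute correlation */
%macro best_corr(x);
    proc corr data=trans rank;
        var &x log_&x exp_&x sqrt_&x sq_&x;
        with ln_totalspent;
    run;
%mend best_corr;

%best_corr(age)
%best_corr(ed)
%best_corr(income)
```

Pearson correlation is the relevant measure here: a rank-based measure such as Spearman's does not change under monotone transformations, so it cannot distinguish between log, sqrt, and the untransformed variable.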

## Re: how to get best correlation among different transformations of a variable

Correlation analysis is done first, then logistic regression is used for predictive analysis. The model predicts high spenders based on age and income, with 95% confidence limits.

``````
DATA S;
SET S;
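/* Represent ln_totalspent as a binary variable:
   ln_totalspent > 5.5 is treated as a high spender.
   IFN returns a numeric value (IFC returns character). A comparison never
   evaluates to missing, so a missing ln_totalspent is handled explicitly. */
IF MISSING(ln_totalspent) THEN HighSpender = .;
ELSE HighSpender = IFN(ln_totalspent > 5.5, 1, 0);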
Run;

/* Standardize the predictors to mean 0 and standard deviation 1 */
proc standard data=S mean=0 std=1 out=zCrRisk;
var age ed income;
run;

/* Correlations among the predictors */
proc corr data=zCrRisk ;
var age ed income;
run;

/* Principal component analysis */

proc princomp data=zCrRisk out=princout;
var age ed income;
run;

proc factor data = zCrRisk method = principal rotate = quartimin score
mineigen=1 nfactors=2  residuals eigenvectors out=factout outstat=fact;
var age ed income;
run;

/*Predictive modeling based on key factors */

/* Split the factor-score dataset (factout) into training and testing datasets for modeling */
proc surveyselect data=factout samprate=0.60 seed=201 out=jk outall
method=srs noprint;
run;

data training testing;
set jk;
if selected = 1 then output training;
else output testing;
drop selected;
run;

ods graphics on;
proc fastclus data=Factout maxc=2 maxiter=10 out=clus;
var Factor1 Factor2 ;
run;

proc freq data=clus;
tables cluster*HighSpender;
run;

proc candisc data=clus ncan = 2 out=can;
class cluster;
var Factor1 Factor2;
title3 'Canonical Discriminant Analysis of Clusters';
run;

proc sgplot data= can;
scatter y=Can2 x=Can1 / group=cluster;
title3 'Plot of clusters';
run;

dm "odsresults; clear";

proc discrim data=Training anova all distance outstat = dis method = normal pool=yes testdata = Testing TESTout = LDA_Out
crossvalidate outcross = cross1 mahalanobis posterr;
priors equal ;
class HighSpender;
var Factor1 Factor2 ;
run;

ods graphics on;
/* Fit the stepwise logistic model on the training data */
proc logistic data=Training outest=betas covout plots(only)=(roc(id=obs) effect);
model HighSpender(event = '1') = age ed income
/ selection=stepwise
slentry=0.3
slstay=0.35
details
lackfit;
output out=pred p=phat lower=lcl upper=ucl
predprob=(individual crossvalidate);
run;
ods graphics off;
``````
## Re: how to get best correlation among different transformations of a variable

Look at the BOXCOX transformation of PROC TRANSREG. Repeat for each Y variable.
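As a rough illustration of that suggestion, a Box-Cox call in PROC TRANSREG might look like the sketch below. It assumes the data are in the dataset `S` used in the earlier reply, that `ln_totalspent` is strictly positive (Box-Cox requires a positive response), and the lambda grid of -2 to 2 is an arbitrary choice for illustration. Note that BOXCOX in PROC TRANSREG applies to the dependent variable, so the X variables enter untransformed through `identity`.

```sas
/* Sketch: search a grid of Box-Cox lambdas for the dependent variable */
proc transreg data=S;
    model boxcox(ln_totalspent / lambda=-2 to 2 by 0.25)
          = identity(age ed income);
run;
```

The output reports the lambda with the highest log likelihood along with a confidence interval, which indicates whether a convenient value such as 0 (log), 0.5 (square root), or 1 (no transformation) is plausible.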

--
Paige Miller
