
how to get best correlation among different transformations of a variable

Frequent Learner
Posts: 1


Hi Guys,

I want to apply several transformations to each independent variable (X variable) and find which version has the best correlation with the dependent variable (Y variable).

Suppose var1 is the first variable. Then take:

-log(var1)

-exp(var1)

-sqrt(var1)

-square(var1), and

-var1 (without transformation).

Then find which of these has the best correlation with the Y variable. Repeat this for var2, var3, and so on.

 

In the attached dataset, age, ed, and income are the X variables and ln_totalspent is the Y variable.

Please help me automate this.

Contributor
Posts: 33

Re: how to get best correlation among different transformations of a variable

Correlation analysis is run first, and logistic regression is then used for the predictive analysis. The model predicts high spenders from age and income, with 95% confidence limits.

 


DATA S;
SET S;
/* Recode ln_totalspent as a binary flag:
   values above 5.5 are treated as High Spender (1), otherwise 0.
   IFN returns a numeric result; IFC would return a character string. */
HighSpender = IFN(ln_totalspent > 5.5, 1, 0, .);
Run;

/* standardize the predictors to mean 0 and standard deviation 1 */
 proc standard data=S mean=0 std=1 out=zCrRisk; 
 var age ed income; 
 run; 

/* correlations among the standardized predictors */
 proc corr data=zCrRisk ; 
 var age ed income; 
 run;

/* principal component analysis */

 proc princomp data=zCrRisk out=princout; 
 var age ed income; 
 run;

 proc factor data = zCrRisk method = principal rotate = quartimin score  
 mineigen=1 nfactors=2  residuals eigenvectors out=factout outstat=fact; 
 	var age ed income; 
run;

/*Predictive modeling based on key factors */

/* Split the scored dataset (factout) into training and testing datasets for modeling */
proc surveyselect data=factout samprate=0.60 seed=201 out=jk outall  
 method=srs noprint; 
 run; 
 
 data training testing; 
 set jk; 
 if selected = 1 then output training; 
 else output testing; 
 drop selected; 
 run;


ods graphics on; 
proc fastclus data=Factout maxc=2 maxiter=10 out=clus; 
    var Factor1 Factor2 ; 
run; 
 
proc freq data=clus; 
    tables cluster*HighSpender; 
run; 
 
proc candisc data=clus ncan = 2 out=can; 
    class cluster; 
    var Factor1 Factor2; 
    title3 'Canonical Discriminant Analysis of Clusters'; 
run; 
 

proc sgplot data= can; 
    scatter y=Can2 x=Can1 / group=cluster; 
    title3 'Plot of clusters'; 
run; 
 
dm "odsresults; clear"; 


proc discrim data=Training anova all distance outstat = dis method = normal pool=yes testdata = Testing TESTout = LDA_Out  
 crossvalidate outcross = cross1 mahalanobis posterr; 
 	priors equal ; 
 	class HighSpender; 
 	var Factor1 Factor2 ; 
run;

ods graphics on; 
proc logistic data=TESTING outest=betas covout plots(only)=(roc(id=obs) effect); 
  model HighSpender(event = '1') = age ed income 
 					/ selection=stepwise 
                      slentry=0.3 
                      slstay=0.35 
                      details 
                      lackfit; 
  output out=pred p=phat lower=lcl upper=ucl 
         predprob=(individual crossvalidate); 
run; 
ods graphics off; 
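
For the comparison the original question describes, a minimal sketch might look like the following. It assumes the input dataset is named S, that age, ed, and income are numeric with no underscores in their names, and that "best" means the largest absolute Pearson correlation with ln_totalspent. The dataset names xforms, corr_out, corr_long, and best and the variable prefixes are hypothetical, chosen only for illustration.

```sas
/* Sketch only: dataset names and variable prefixes are assumptions. */
data xforms;
  set S;
  array x{3}  age ed income;
  array lg{3} log_age  log_ed  log_income;
  array ex{3} exp_age  exp_ed  exp_income;
  array rt{3} sqrt_age sqrt_ed sqrt_income;
  array s2{3} sq_age   sq_ed   sq_income;
  do i = 1 to dim(x);
    if x{i} > 0  then lg{i} = log(x{i});       /* log is undefined for values <= 0 */
    if x{i} >= 0 then rt{i} = sqrt(x{i});      /* sqrt is undefined for negatives  */
    if . < x{i} < 700 then ex{i} = exp(x{i});  /* guard against numeric overflow   */
    if not missing(x{i}) then s2{i} = x{i}**2;
  end;
  drop i;
run;

/* Correlate every version of every X with Y; keep the results in a dataset. */
proc corr data=xforms outp=corr_out noprint;
  var age    log_age    exp_age    sqrt_age    sq_age
      ed     log_ed     exp_ed     sqrt_ed     sq_ed
      income log_income exp_income sqrt_income sq_income;
  with ln_totalspent;
run;

/* Reshape the single CORR row into one row per transformed variable. */
data corr_long;
  set corr_out(where=(_type_ = 'CORR'));
  array c{*} _numeric_;               /* the VAR variables from PROC CORR */
  length transform base $32;
  do j = 1 to dim(c);
    transform = vname(c{j});
    base      = scan(transform, -1, '_');  /* original variable name */
    r         = c{j};
    abs_r     = abs(r);
    output;
  end;
  keep base transform r abs_r;
run;

/* Keep the transformation with the largest |r| for each original variable. */
proc sort data=corr_long;
  by base descending abs_r;
run;

data best;
  set corr_long;
  by base;
  if first.base;
run;

proc print data=best noobs;
run;
```

Rows where a transformation is undefined are left missing, so the correlations for different transformations may be based on different numbers of observations; that is worth checking before comparing them.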
Respected Advisor
Posts: 2,658

Re: how to get best correlation among different transformations of a variable

Look at the BOXCOX transformation of PROC TRANSREG. Repeat for each Y variable.

--
Paige Miller
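
A minimal sketch of the Box-Cox suggestion, assuming the dataset is named S and that ln_totalspent is strictly positive (Box-Cox requires a positive response). The lambda grid is an arbitrary choice for illustration. Note that the BOXCOX transformation in PROC TRANSREG applies to the dependent variable, so this searches over power transformations of Y rather than of the X variables.

```sas
/* Sketch only: dataset name S and the lambda grid are assumptions. */
proc transreg data=S;
  model boxcox(ln_totalspent / lambda=-2 to 2 by 0.25)
        = identity(age ed income);
run;
```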