Hi Guys,
I want to try different transformations of the independent variables (X variables) and find which one correlates best with the dependent variable (Y variable).
Suppose var1 is the first variable. Then take:
- log(var1)
- exp(var1)
- sqrt(var1)
- square(var1) and
- var1 (without transformation).
Then find which of these has the best correlation with the Y variable. Repeat this for var2, var3, ...
Here age, ed, and income are the X variables and ln_totalspent is the Y variable (in the attached dataset).
Please help me automate this.
Initially a correlation analysis is done, then logistic regression is used for predictive analysis. The model will predict high spenders based on age and income, with 95% confidence limits.
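DATA S;
SET S;
/* Represent ln_totalspent as a binary flag:
   HighSpender = 1 if ln_totalspent > 5.5, 0 otherwise, missing if ln_totalspent is missing.
   IFN returns a numeric value; IFC would return a character string. */
IF MISSING(ln_totalspent) THEN HighSpender = .;
ELSE HighSpender = IFN(ln_totalspent > 5.5, 1, 0);
RUN;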
/* standardize the X variables to mean 0 and standard deviation 1 */
proc standard data=S mean=0 std=1 out=zCrRisk;
var age ed income;
run;
/* correlation among the X variables */
proc corr data=zCrRisk ;
var age ed income;
run;
/* principal component analysis */
proc princomp data=zCrRisk out=princout;
var age ed income;
run;
proc factor data = zCrRisk method = principal rotate = quartimin score
mineigen=1 nfactors=2 residuals eigenvectors out=factout outstat=fact;
var age ed income;
run;
/*Predictive modeling based on key factors */
/* Splitting the factor-score dataset (factout) into training and testing datasets for modeling */
proc surveyselect data=factout samprate=0.60 seed=201 out=jk outall
method=srs noprint;
run;
data training testing;
set jk;
if selected = 1 then output training;
else output testing;
drop selected;
run;
ods graphics on;
proc fastclus data=Factout maxc=2 maxiter=10 out=clus;
var Factor1 Factor2 ;
run;
proc freq data=clus;
tables cluster*HighSpender;
run;
proc candisc data=clus ncan = 2 out=can;
class cluster;
var Factor1 Factor2;
title3 'Canonical Discriminant Analysis of Clusters';
run;
proc sgplot data= can;
scatter y=Can2 x=Can1 / group=cluster;
title3 'Plot of clusters';
run;
dm "odsresults; clear";
proc discrim data=Training anova all distance outstat = dis method = normal pool=yes testdata = Testing TESTout = LDA_Out
crossvalidate outcross = cross1 mahalanobis posterr;
priors equal ;
class HighSpender;
var Factor1 Factor2 ;
run;
ods graphics on;
/* fit the model on the training split; the testing split is held out for evaluation */
proc logistic data=Training outest=betas covout plots(only)=(roc(id=obs) effect);
model HighSpender(event = '1') = age ed income
/ selection=stepwise
slentry=0.3
slstay=0.35
details
lackfit;
output out=pred p=phat lower=lcl upper=ucl
predprob=(individual crossvalidate);
run;
ods graphics off;
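One possible way to automate the transformation comparison asked about above, as a minimal sketch rather than a tested solution: build every candidate transformation in a DATA step with arrays, correlate all of them with ln_totalspent in a single PROC CORR call, and keep the strongest one per X variable. The dataset names trans, corr_out, corr_rank and best, the prefixed variable names (log_age, sq_income, ...) and the overflow guard of 700 are assumptions made for illustration. It also assumes none of the original X names contains an underscore, because the base name is recovered with SCAN.

```sas
/* Build candidate transformations of each X variable (names are illustrative) */
data trans;
  set S;
  array x{*}     age      ed      income;
  array xlog{3}  log_age  log_ed  log_income;
  array xexp{3}  exp_age  exp_ed  exp_income;
  array xsqrt{3} sqrt_age sqrt_ed sqrt_income;
  array xsq{3}   sq_age   sq_ed   sq_income;
  do i = 1 to dim(x);
    if x{i} > 0       then xlog{i}  = log(x{i});   /* log needs positive values   */
    if x{i} >= 0      then xsqrt{i} = sqrt(x{i});  /* sqrt needs non-negative     */
    if . < x{i} < 700 then xexp{i}  = exp(x{i});   /* guard against overflow      */
    if not missing(x{i}) then xsq{i} = x{i}**2;
  end;
  drop i;
run;

/* Correlate every candidate with Y and capture the table as a dataset */
ods output PearsonCorr=corr_out;
proc corr data=trans pearson nosimple;
  var  ln_totalspent;
  with age    log_age    exp_age    sqrt_age    sq_age
       ed     log_ed     exp_ed     sqrt_ed     sq_ed
       income log_income exp_income sqrt_income sq_income;
run;

/* Rank by absolute correlation within each original X variable */
data corr_rank;
  set corr_out;
  length base $32;
  base  = scan(Variable, -1, '_');   /* 'log_age' -> 'age', 'age' -> 'age' */
  abs_r = abs(ln_totalspent);
run;

proc sort data=corr_rank;
  by base descending abs_r;
run;

/* Keep the best candidate per X variable */
data best;
  set corr_rank;
  by base;
  if first.base;
run;

proc print data=best;
  var base Variable ln_totalspent;
run;
```

Values outside a transformation's domain are left missing, so each correlation is computed only on the observations where that transformation is defined. If income is recorded in raw currency units, exp(income) will mostly fall outside the guard and be missing, so rescaling it first may be worth considering.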
Look at the BOXCOX transformation of PROC TRANSREG. Repeat for each Y variable.
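As a rough sketch of that suggestion, assuming the dataset S from the question and a strictly positive ln_totalspent (Box-Cox is only defined for positive values), the call might look like this. The lambda grid of -2 to 2 by 0.25 is an arbitrary choice for illustration.

```sas
/* Box-Cox search for the dependent variable; predictors enter untransformed */
proc transreg data=S;
  model boxcox(ln_totalspent / lambda=-2 to 2 by 0.25)
        = identity(age ed income);
run;
```

The output lists the log likelihood for each lambda on the grid and marks the best one. Note that BOXCOX in PROC TRANSREG transforms the variable on the left-hand side of the MODEL statement, so the step is repeated once per dependent variable of interest.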