Hi Guys,
I want to try different transformations of each independent variable (X variable) and find which one correlates best with the dependent variable (Y variable).
Suppose var1 is the first variable. Then take:
- log(var1)
- exp(var1)
- sqrt(var1)
- square(var1), and
- var1 (no transformation).
Then find which of these has the best correlation with the Y variable. Repeat this for var2, var3, and so on.
In the attached dataset, age, ed, and income are the X variables and ln_totalspent is the Y variable.
Please help me automate this.
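One way the steps above could be automated is sketched below. It is only a sketch under stated assumptions: the macro name `best_transform`, the prefixes `log_`, `exp_`, `sqrt_`, `sq_`, and the data sets `_trans`, `_corr`, `corr_long`, and `best_transform` are all made up for illustration, and "best" is taken to mean the largest absolute Pearson correlation with Y. Values for which a transformation is undefined (log of a non-positive number, sqrt of a negative number) or would overflow (exp of a large number) are left missing.

```sas
/* Hypothetical macro: all names and prefixes here are illustrative */
%macro best_transform(data=, y=, xvars=);
   %local i v n;
   %let n = %sysfunc(countw(&xvars, %str( )));

   /* Step 1: add the four transformed copies of every X variable */
   data _trans;
      set &data;
      %do i = 1 %to &n;
         %let v = %scan(&xvars, &i, %str( ));
         if &v > 0           then log_&v  = log(&v);   /* log needs x > 0 */
         if . < &v < 700     then exp_&v  = exp(&v);   /* skip values that would overflow */
         if &v >= 0          then sqrt_&v = sqrt(&v);  /* sqrt needs x >= 0 */
         if not missing(&v)  then sq_&v   = &v**2;
      %end;
   run;

   /* Step 2: Pearson correlation of every candidate with Y */
   proc corr data=_trans outp=_corr noprint;
      with &y;
      var
      %do i = 1 %to &n;
         %let v = %scan(&xvars, &i, %str( ));
         &v log_&v exp_&v sqrt_&v sq_&v
      %end;
      ;
   run;

   /* Step 3: turn the single CORR row into one row per (variable, transform) */
   data corr_long;
      set _corr(where=(_type_ = 'CORR'));
      length xvar $32 transform $8;
      %do i = 1 %to &n;
         %let v = %scan(&xvars, &i, %str( ));
         xvar = "&v";
         transform = 'none';   r = &v;      abs_r = abs(r); output;
         transform = 'log';    r = log_&v;  abs_r = abs(r); output;
         transform = 'exp';    r = exp_&v;  abs_r = abs(r); output;
         transform = 'sqrt';   r = sqrt_&v; abs_r = abs(r); output;
         transform = 'square'; r = sq_&v;   abs_r = abs(r); output;
      %end;
      keep xvar transform r abs_r;
   run;

   /* Step 4: keep the transform with the largest |r| for each X variable */
   proc sort data=corr_long;
      by xvar descending abs_r;
   run;

   data best_transform;
      set corr_long;
      by xvar;
      if first.xvar;
   run;

   proc print data=best_transform noobs;
   run;
%mend best_transform;

%best_transform(data=S, y=ln_totalspent, xvars=age ed income);
```

The full ranking stays in `corr_long`, so the runner-up transformations can be inspected as well. Note that the correlations are computed only on the rows where a given transformation is defined, so the candidates for one variable may be based on different numbers of observations.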
Initially, correlation analysis is done; then logistic regression is used for predictive analysis. The model will predict high spenders based on age and income with 95% confidence limits.
DATA S;
SET S;
/* Represent ln_totalspent as a binary variable:
if ln_totalspent > 5.5, the customer is considered a High Spender.
IFN returns a numeric value (IFC would return a character string). */
HighSpender = IFN(ln_totalspent > 5.5, 1, 0);
RUN;
/*data normalized*/
proc standard data=S mean=0 std=1 out=zCrRisk;
var age ed income;
run;
/* correlations among the standardized X variables */
proc corr data=zCrRisk ;
var age ed income;
run;
/* principal component analysis */
proc princomp data=zCrRisk out=princout;
var age ed income;
run;
proc factor data = zCrRisk method = principal rotate = quartimin score
mineigen=1 nfactors=2 residuals eigenvectors out=factout outstat=fact;
var age ed income;
run;
/*Predictive modeling based on key factors */
/* Splitting the factor-score dataset (factout) into training and testing datasets for modeling */
proc surveyselect data=factout samprate=0.60 seed=201 out=jk outall
method=srs noprint;
run;
data training testing;
set jk;
if selected = 1 then output training;
else output testing;
drop selected;
run;
ods graphics on;
proc fastclus data=Factout maxc=2 maxiter=10 out=clus;
var Factor1 Factor2 ;
run;
proc freq data=clus;
tables cluster*HighSpender;
run;
proc candisc data=clus ncan = 2 out=can;
class cluster;
var Factor1 Factor2;
title3 'Canonical Discriminant Analysis of Clusters';
run;
proc sgplot data= can;
scatter y=Can2 x=Can1 / group=cluster;
title3 'Plot of clusters';
run;
dm "odsresults; clear";
proc discrim data=Training anova all distance outstat = dis method = normal pool=yes testdata = Testing TESTout = LDA_Out
crossvalidate outcross = cross1 mahalanobis posterr;
priors equal ;
class HighSpender;
var Factor1 Factor2 ;
run;
ods graphics on;
/* Fit the stepwise logistic model on the training partition */
proc logistic data=TRAINING outest=betas covout plots(only)=(roc(id=obs) effect);
model HighSpender(event = '1') = age ed income
/ selection=stepwise
slentry=0.3
slstay=0.35
details
lackfit;
output out=pred p=phat lower=lcl upper=ucl
predprob=(individual crossvalidate);
run;
ods graphics off;
Look at the BOXCOX transformation of PROC TRANSREG. Repeat for each Y variable.
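As a rough illustration of that suggestion, a Box-Cox search in PROC TRANSREG might look like the sketch below. It assumes the data set `S` and the variable names from the question, and the lambda grid (-2 to 2 in steps of 0.25) is an arbitrary choice for the example. Box-Cox transforms the dependent variable and requires it to be strictly positive, so this only applies if every value of ln_totalspent is greater than zero.

```sas
/* Sketch: search a grid of Box-Cox powers for the dependent variable. */
/* The lambda grid is an assumption chosen for illustration.           */
proc transreg data=S;
   model boxcox(ln_totalspent / lambda=-2 to 2 by 0.25)
         = identity(age ed income);
run;
```

The output lists the log likelihood for each lambda on the grid and marks the best one; a lambda near 1 suggests that no transformation of the dependent variable is needed.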