Arjun_C
Calcite | Level 5

Hi Guys,

I want to apply several transformations to each independent variable (X variable) and find which transformation has the best correlation with the dependent variable (Y variable).

Suppose var1 is the first variable. Then take:

- log(var1)
- exp(var1)
- sqrt(var1)
- square(var1), and
- var1 (without transformation).

Then find which of these has the best correlation with the Y variable, and repeat this for var2, var3, and so on.
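For a single variable, the steps above could look roughly like the sketch below. It assumes the data set is called S (the name used in the reply further down, not confirmed by the original poster), and it uses age as var1. LOG and SQRT need age > 0, and EXP can overflow for large inputs, so this is a sketch under those assumptions rather than a finished program. The names trans_age, age_log, age_exp, age_sqrt and age_sq are hypothetical.

```sas
/* Hypothetical sketch: the data set name S is an assumption,
   and age must be > 0 because LOG and SQRT need positive input */
data trans_age;
   set S;
   age_log  = log(age);    /* natural log                        */
   age_exp  = exp(age);    /* can overflow for very large values */
   age_sqrt = sqrt(age);   /* square root                        */
   age_sq   = age**2;      /* square                             */
   keep ln_totalspent age age_log age_exp age_sqrt age_sq;
run;

/* One row of Pearson correlations: ln_totalspent versus each version of age */
proc corr data=trans_age nosimple;
   var age age_log age_exp age_sqrt age_sq;
   with ln_totalspent;
run;
```

The WITH statement limits the output to correlations between ln_totalspent and the listed VAR variables, so the five candidates can be compared in a single row.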

 

In the attached dataset, age, ed and income are the X variables and ln_totalspent is the Y variable.

Please help me automate this.
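One possible way to automate the whole search is to stack every (variable, transformation) pair into a long data set, run PROC CORR with a BY statement, and keep the largest absolute correlation per variable. This is a sketch under several assumptions that are not in the question: the data set is named S, "best" means the largest absolute Pearson correlation, and only rows with a positive X value are used so that every transformation is computed on the same rows. The names long, corrs, corrs2, best, varname, transform and xval are hypothetical.

```sas
/* Hypothetical sketch: data set S, positive X values and
   "best = largest |r|" are all assumptions */
%let xvars = age ed income;

data long;
   set S;
   array x{*} &xvars;
   length varname $32 transform $8;
   do i = 1 to dim(x);
      varname = vname(x{i});
      if x{i} > 0 then do;   /* skip missing and non-positive values */
         transform = 'none';   xval = x{i};       output;
         transform = 'log';    xval = log(x{i});  output;
         transform = 'sqrt';   xval = sqrt(x{i}); output;
         transform = 'square'; xval = x{i}**2;    output;
         /* guard EXP against overflow for large values */
         transform = 'exp';
         if x{i} < 700 then xval = exp(x{i});
         else xval = .;
         output;
      end;
   end;
   keep varname transform xval ln_totalspent;
run;

proc sort data=long;
   by varname transform;
run;

/* One correlation matrix per (variable, transformation) pair */
proc corr data=long outp=corrs noprint;
   by varname transform;
   var xval ln_totalspent;
run;

/* Keep the correlation between xval and ln_totalspent */
data corrs2;
   set corrs;
   where _type_ = 'CORR' and upcase(_name_) = 'LN_TOTALSPENT';
   r     = xval;
   abs_r = abs(r);
   keep varname transform r abs_r;
run;

/* Best transformation for each X variable */
proc sort data=corrs2;
   by varname descending abs_r;
run;

data best;
   set corrs2;
   by varname;
   if first.varname;
run;

proc print data=best noobs;
run;
```

Adding a new transformation only requires another OUTPUT line inside the DO group, and adding a new X variable only requires extending the xvars macro variable.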

emrancaan
Obsidian | Level 7

First, a correlation analysis is done; then logistic regression is used for predictive analysis. The model predicts high spenders from age and income, with 95% confidence limits.

 


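DATA S;
SET S;
/* Represent ln_totalspent as a binary flag:
   ln_totalspent > 5.5 is considered a High Spender (1), otherwise 0.
   IFN (not IFC) is used because HighSpender must be numeric;
   rows with a missing ln_totalspent stay missing. */
HighSpender = IFN(missing(ln_totalspent), ., ln_totalspent > 5.5);
Run;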

/*data normalized*/
 proc standard data=S mean=0 std=1 out=zCrRisk; 
 var age ed income; 
 run; 

/*finding correlation*/
 proc corr data=zCrRisk ; 
 var age ed income; 
 run;

/*principal component analysis*/

 proc princomp data=zCrRisk out=princout; 
 var age ed income; 
 run;

 proc factor data = zCrRisk method = principal rotate = quartimin score  
 mineigen=1 nfactors=2  residuals eigenvectors out=factout outstat=fact; 
 	var age ed income; 
run;

/*Predictive modeling based on key factors */

/*Splitting the Factout dataset into Training (60%) and Testing (40%) datasets for modeling*/
proc surveyselect data=factout samprate=0.60 seed=201 out=jk outall  
 method=srs noprint; 
 run; 
 
 data training testing; 
 set jk; 
 if selected = 1 then output training; 
 else output testing; 
 drop selected; 
 run;


ods graphics on; 
proc fastclus data=Factout maxc=2 maxiter=10 out=clus; 
    var Factor1 Factor2 ; 
run; 
 
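proc freq data=clus; 
    tables cluster*HighSpender; 
run; 
 
/* With only 2 clusters (maxc=2), CANDISC can produce at most
   one canonical variable (number of classes - 1), so Can2 does not exist */
proc candisc data=clus ncan=1 out=can; 
    class cluster; 
    var Factor1 Factor2; 
    title3 'Canonical Discriminant Analysis of Clusters'; 
run; 
 

/* Plot the clusters on the two factor scores instead of Can1 vs Can2 */
proc sgplot data=can; 
    scatter y=Factor2 x=Factor1 / group=cluster; 
    title3 'Plot of clusters'; 
run; 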
 
dm "odsresults; clear"; 


proc discrim data=Training anova all distance outstat = dis method = normal pool=yes testdata = Testing TESTout = LDA_Out  
 crossvalidate outcross = cross1 mahalanobis posterr; 
 	priors equal ; 
 	class HighSpender; 
 	var Factor1 Factor2 ; 
run;

ods graphics on; 
/* Fit the model on the Training data and score the held-out Testing data */
proc logistic data=TRAINING outest=betas covout plots(only)=(roc(id=obs) effect); 
  model HighSpender(event = '1') = age ed income 
 					/ selection=stepwise 
                      slentry=0.3 
                      slstay=0.35 
                      details 
                      lackfit; 
  output out=pred p=phat lower=lcl upper=ucl 
         predprob=(individual crossvalidate); 
  score data=TESTING out=test_scored; 
run; 
ods graphics off; 
PaigeMiller
Diamond | Level 26

Look at the BOXCOX transformation of PROC TRANSREG. Repeat for each Y variable.
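A rough sketch of this suggestion, under one reading of it: the Box-Cox family transforms the variable inside BOXCOX(), and the powers 0, 0.5, 1 and 2 correspond (up to a linear rescaling) to log, square root, no transformation and square. That covers most of the list in the question, although exp is not part of the Box-Cox family. The sketch puts each X variable inside BOXCOX() with ln_totalspent on the right-hand side, so the chosen lambda is the power of that X variable that best fits a linear relationship with ln_totalspent. The data set name S, the lambda grid, and the macro name bc are assumptions, and Box-Cox requires positive values of the transformed variable.

```sas
/* Hypothetical sketch: data set S, the lambda grid and the macro
   name bc are assumptions; each X variable must be positive */
%macro bc(xvar);
   proc transreg data=S details;
      /* lambda = 0 is log, 0.5 is sqrt, 1 is no transformation, 2 is square */
      model boxcox(&xvar / lambda=-2 to 2 by 0.5) = identity(ln_totalspent);
      title "Box-Cox search for &xvar";
   run;
%mend bc;

ods graphics on;
%bc(age)
%bc(ed)
%bc(income)
title;
```

Each call prints the lambda that TRANSREG selects for that variable, which can then be rounded to the nearest convenient power.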

--
Paige Miller
