Hi Miguel,

Thank you for your reply! I had been waiting for someone to answer.

Given a set of census data (400 interval variables) and financial data (another 400 interval variables), how do I find the correlations among these variables? Would correlation help as a first screening step, or should I go straight to decision tree / GBM models? I can calculate Spearman and Hoeffding coefficients and also the VIF, but all of that comes later, once I run the model. How do I start off with the initial screening?

It would be very good if I could screen them using the Pearson correlation statistic, but that would give me a matrix with 400 rows and 400 columns.

Also, I did variable clustering. I select the one variable with the lowest 1-R^2 in each cluster. Would that work as well?

I don't know if I am thinking about this in the right way. Any help would be appreciated.

Thanks,
Minal
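
P.S. To make the Pearson screening idea concrete, here is a rough sketch of one way to avoid reading a full 400 x 400 matrix: keep only the upper triangle and list the pairs whose absolute correlation passes a cutoff. It assumes Python with NumPy; the function name `high_correlation_pairs`, the 0.8 cutoff, and the synthetic data are purely illustrative assumptions, not my real variables or results.

```python
import numpy as np


def high_correlation_pairs(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for each pair with |Pearson r| >= threshold.

    X is an (n_samples, n_variables) array; names labels its columns.
    The 0.8 default is an arbitrary illustrative cutoff.
    """
    corr = np.corrcoef(X, rowvar=False)              # p x p correlation matrix
    i_idx, j_idx = np.triu_indices_from(corr, k=1)   # each pair once, no diagonal
    r = corr[i_idx, j_idx]
    keep = np.abs(r) >= threshold                    # NaN (constant column) is dropped
    pairs = [
        (names[i], names[j], float(v))
        for i, j, v in zip(i_idx[keep], j_idx[keep], r[keep])
    ]
    # Strongest relationships first
    return sorted(pairs, key=lambda t: -abs(t[2]))


# Synthetic stand-in data: 5 variables, one of them nearly a copy of another.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=500)
names = [f"var{k}" for k in range(5)]

pairs = high_correlation_pairs(X, names, threshold=0.8)
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.3f}")
```

With 400 variables this checks 79,800 pairs but reports only the ones above the cutoff, so the output stays readable. Pearson only picks up linear relationships, so this would be a first pass rather than a replacement for the Spearman or Hoeffding checks.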
