About Shashank7

Shashank7 · ‎01-31-2018

Hi Rick, I have a situation similar to the example above, except that my dataset has more than 50 variables, all of them categorical. In each iteration I need to use one independent variable together with the dependent variable, and I think a macro would be the right way to automate this step. If so, could you please help me with how to go about it? I'm having a hard time with this and would appreciate your thoughts. Thanks. 🙂

Shashank7 · ‎01-17-2018

Hi Reeza, I created these two variables before doing anything else and then checked their skewness using PROC UNIVARIATE (histogram). Both of the newly created variables are extremely skewed. I tried to transform them by taking the log, square root, etc., but the skewness is still there (it decreased, but the variables are still not normal). Should I simply go ahead and cap these new variables (along with all the other variables), treat missing values, and do FA? If not, how should I transform these 2 variables to minimize their skewness before capping them and doing FA? 🙂

Shashank7 · ‎01-17-2018

Hi, I am trying to perform k-means cluster analysis on a dataset with 20 variables and 9000 observations. I want to create 2 new variables (Usage and Payment Ratio) from 4 of the 20 variables in the dataset (Balance, Limit, Payment, Minimum Payment). Ex: Usage = (Balance / Limit) and Payment_Ratio = (Payment / Minimum Payment). The reason is that I would then use 2 variables in my analysis instead of the original 4. Now, should I create these new variables at the very start, or should I first cap the outliers, remove/impute missing values, and only then create them? So far I first cleaned the data (with the original variables) and then created the two variables. I then capped the outliers and treated missing values again for these 2 variables. After doing this, the 2 new variables are skewed. I tried taking the log, square root, etc., but the variables are still not normal, i.e. there is still skewness in them. Should I go ahead and do Factor Analysis with the new skewed variables? If not, how should I transform these 2 variables to minimize their skewness? Can anyone please suggest a way to go about this? Thanks.
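To make the two derived variables concrete, here is a minimal sketch of the arithmetic, written in Python rather than SAS purely for illustration. The records, the helper names `safe_ratio` and `skewness`, and the choice of `log1p` as the transform are all assumptions made for this example and do not come from the actual dataset.

```python
import math

def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator

def skewness(values):
    """Population skewness (third standardized moment), ignoring missing values."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical records: (balance, limit, payment, minimum_payment)
records = [
    (500.0, 1000.0, 50.0, 50.0),
    (200.0, 2000.0, 60.0, 50.0),
    (900.0, 1000.0, 75.0, 50.0),
    (300.0, 3000.0, 100.0, 50.0),
    (1200.0, 4000.0, 150.0, 50.0),
    (100.0, 1000.0, 250.0, 50.0),
    (700.0, 1400.0, 500.0, 50.0),
    (50.0, 5000.0, 5000.0, 50.0),
    (400.0, 1600.0, 40.0, 0.0),  # zero minimum payment: ratio left missing
]

# Usage = Balance / Limit, Payment_Ratio = Payment / Minimum Payment
usage = [safe_ratio(b, lim) for b, lim, p, m in records]
payment_ratio = [safe_ratio(p, m) for b, lim, p, m in records]

# One candidate transform: log(1 + x), which is defined at x = 0
log_payment_ratio = [None if r is None else math.log1p(r) for r in payment_ratio]

print("skewness, raw ratio:  ", skewness(payment_ratio))
print("skewness, log1p ratio:", skewness(log_payment_ratio))
```

The guard in `safe_ratio` is there because a zero or missing Limit or Minimum Payment makes the ratio undefined, so the new variable can pick up missing values of its own even when the original four have already been cleaned.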


Re: how to do a loop(do) in SAS

Re: New Variable creation: After or before original data cleaning?

New Variable creation: After or before original data cleaning?
