About Rgangane

DougWielenga · 10-29-2018

Thanks Doug, do you also standardize/normalize flag variables before using them in PROC FACTOR?

There are scenarios where you might consider standardizing/normalizing variables and scenarios where you might not. You just need to think about how the interpretations differ and decide which one makes more sense for your research question. It is easy enough to run Factor Analysis both ways: once with the covariance matrix as the input, which gives a solution based on the raw data, and once with the correlation matrix as the input, which gives a solution based on the standardized/normalized data.

The same question about whether to standardize/normalize comes up in Principal Components Analysis (PCA) and predictive modeling. In the end, methods which attempt to explain as much variability as possible will be more influenced by variables with a larger amount of variability. It doesn't inherently make sense to weight a variable more heavily just because you altered the measurement units (e.g. from miles to inches), but neither does it make sense to normalize variables which all have (theoretically) the same scale (e.g. survey questions which measure strength of response).

In the case of a survey where people are expressing their strength of agreement/satisfaction on some scale, it is natural that some questions will have greater variability than others, and this could happen for a variety of reasons:

* some questions were poorly worded
* some areas were much more problematic than others
* the focus group itself has certain biases which might differ from the population

If my survey is trying to assess which factors are most important to the focus group, standardizing/normalizing the strength of agreement/satisfaction variables makes no sense, because you are trying to identify what factors matter. Normalizing in this situation effectively weights every question equally regardless of how little variability the question represented.
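The covariance-versus-correlation choice described above can be sketched outside SAS. The following is a minimal NumPy illustration, not PROC FACTOR itself: the data are simulated, the variable names (`distance_miles`, `other_score`) and the helper `leading_weights` are hypothetical, and the leading eigenvector of the input matrix stands in for the first principal component.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated (hypothetical) data: a distance in miles and a correlated score.
distance_miles = rng.normal(10.0, 2.0, n)
other_score = 0.5 * distance_miles + rng.normal(0.0, 1.0, n)


def leading_weights(data, standardize):
    """Absolute weights of the first principal component.

    standardize=False uses the covariance matrix (raw data);
    standardize=True uses the correlation matrix (standardized data).
    """
    if standardize:
        matrix = np.corrcoef(data, rowvar=False)
    else:
        matrix = np.cov(data, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last column
    # of the eigenvector matrix belongs to the largest eigenvalue.
    _, eigenvectors = np.linalg.eigh(matrix)
    return np.abs(eigenvectors[:, -1])


in_miles = np.column_stack([distance_miles, other_score])
# Same information, different units: 1 mile = 63,360 inches.
in_inches = np.column_stack([distance_miles * 63360.0, other_score])

cov_miles = leading_weights(in_miles, standardize=False)
cov_inches = leading_weights(in_inches, standardize=False)
corr_miles = leading_weights(in_miles, standardize=True)
corr_inches = leading_weights(in_inches, standardize=True)

print("covariance, miles  :", cov_miles)
print("covariance, inches :", cov_inches)
print("correlation, miles :", corr_miles)
print("correlation, inches:", corr_inches)
```

With the covariance matrix, converting miles to inches hands almost all of the first component's weight to the distance variable, while the correlation-based weights do not change with the units. In SAS, as far as I know, PROC FACTOR analyzes the correlation matrix by default and the COVARIANCE option requests the covariance matrix instead.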
On the other hand, if my goal is to create an overall metric that I plan to evaluate over time, failing to standardize makes the resulting scores less comparable, since each question is potentially providing a different amount of influence on each solution. In many cases, the survey instrument itself might change in response to reviewing results where certain questions had very little variability.

If my data are not naturally on the same scale (e.g. cost of a car in $, horsepower of a car, mileage of a car), then standardizing typically will make more sense, but you also need to consider whether your data represent the full range of values you want to consider or only a narrow band of the population. If certain variables vary over only a tiny portion of their range in the population of interest while other variables span a large proportion of it, standardizing makes the narrowly observed variables much more important, giving them the same weight as variables whose data are much more representative of the whole population.

Neither approach is wrong; they just need to be interpreted differently, and considering which type of interpretation is of greater interest should help you decide whether or not to standardize and how to prepare your data as a result.

Hope this helps!
Doug
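The restricted-range caution in the reply above can also be sketched numerically. This is a minimal illustration under stated assumptions: both variables are simulated, the names `wide` and `narrow` and the helper `variance_share` are hypothetical, and "weight" is approximated by each variable's share of the total variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical sample: `wide` covers most of its population range,
# while `narrow` was only observed in a small band around 50.
wide = rng.normal(50.0, 10.0, n)
narrow = rng.uniform(49.0, 51.0, n)
raw = np.column_stack([narrow, wide])

# Standardize each column to mean 0 and sample standard deviation 1.
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)


def variance_share(data):
    """Fraction of the total variance contributed by each column."""
    variances = data.var(axis=0, ddof=1)
    return variances / variances.sum()


raw_share = variance_share(raw)
standardized_share = variance_share(standardized)

print("share of variance, raw data     (narrow, wide):", raw_share)
print("share of variance, standardized (narrow, wide):", standardized_share)
```

On the raw scale the narrowly observed variable contributes almost nothing to the total variance; after standardizing it contributes exactly as much as the variable that spans the population, which is the inflation the reply warns about.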


Re: Clustering binary data with Enterprise Miner
