About rtaylor

rtaylor · ‎08-05-2016

thanks.

rtaylor · ‎08-05-2016

A variable in a dataset is of type numeric, but shows a value 'C' when viewed or printed. Format is best12. SAS Output: 10 21 27 12 C 10 16 What does it mean? Is it missing? How do I exclude such values?

rtaylor · ‎03-11-2015

Thank you Dr. Wicklin, I indeed meant the last interpretation. The goal is to find observations that are similar to each other, considering redundancy among variables as well importance from a substantive perspective. From a data-reduction view point, distance using first 4-5 scores could be used, which would be equivalent to assigning zero weight to the rest of the variable. Proportion of variance seems more justifiable. In the end, the two may result in fairly similar values given that the first five factors explain over 80% of variance. I have not seen any examples of weighting. Are there concrete examples that you could point me to.

rtaylor · ‎03-10-2015

Thanks Xia, I have already computed Mahalanobis distance - as the Euclidean distance with principal components. My questions pertains to the use of the 'weight' option in 'var' statement in PROC DISTANCE. Particularly, I am trying to find out if it is possible and reasonable to 'weight' the first principal component more heavily.

rtaylor · ‎03-07-2015

Background: I need to compute Mahalanobis distance for a dataset with about 200 observations and 17 variables. The goal is to identify 10 closest observations for each row. Variables have very different scales, and there are no missing values. Variables are of un-equal 'importance'. As recommended in: 30662 - Mahalanobis distance: from each observation to the mean, from each observation to a specific observation, between all possible pairs, I used Proc PRINCOMP with std option and used prin1:prin17 for computing Euclidean distance. As expected first three components accounted for most of the variance. In particular, the first two were correlated highly with the key variables. It seems that the weight option in the var statement could be used to assign greater importance to the first three components. However, there does not seem to be a rational basis for picking reasonable weight values. Any feedback, suggestions will be greatly appreciated. Thanks. RT

Online Status	Offline
Date Last Visited	‎08-05-2016 09:14 AM

Re: Numeric column in a dataset has "C" as a value

Numeric column in a dataset has "C" as a value

Re: Using weight variable in Proc Distance

Re: Using weight variable in Proc Distance

Using weight variable in Proc Distance

Re: Numeric column in a dataset has "C" as a value

Numeric column in a dataset has "C" as a value

Re: Using weight variable in Proc Distance

Re: Using weight variable in Proc Distance

Using weight variable in Proc Distance