Hello Everyone,
Can anyone help me to find out how to calculate Mahalanobis distance with two samples?
What do you mean by "two samples"? Two different data sets? One data set and two different variables? One data set, one variable with two groups indicated by a different variable?
Perhaps you should include a small sample of the data that you currently have as working data step code so we don't have to guess.
Thank you for your response. I have two datasets: CIREN and CISS. We are trying to find out how similar the CIREN data are to the CISS data. CISS is a random sample of the whole country's database. CIREN covers a particular kind of crash. The variables we are using are age, weight, gender, height, ISS score, and AIS score. I know how to calculate the Mahalanobis distance within one dataset. I am trying to find out what the equation would be when we have two datasets.
You need to append both datasets CIREN and CISS.
Do not forget to add an extra column (named "SourceDS", $15) with two distinct values: "row from CIREN" and "row from CISS".
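A minimal sketch of that append step, assuming both datasets sit in the WORK library and share the same variable names. The output name MyData matches the PROC CANDISC call in this reply; the flag variable inCiren is a hypothetical name used only for illustration.

```sas
/* Stack CIREN and CISS and tag each row with its source dataset. */
data MyData;
    length SourceDS $15;          /* declare before SET so the length is fixed */
    set CIREN(in=inCiren) CISS;   /* inCiren = 1 when the row comes from CIREN */
    if inCiren then SourceDS = "row from CIREN";
    else SourceDS = "row from CISS";
run;
```

A quick `proc freq data=MyData; tables SourceDS; run;` should then show one row count per source.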
Then run a canonical discriminant analysis to profile CIREN versus CISS (as you would to analyze heterogeneity between clusters and homogeneity within clusters).
proc candisc data=MyData out=outcan distance anova;
class SourceDS;
/* VAR takes numeric variables only; recode gender (e.g. 0/1) if it is stored as character */
var Age weight gender height ISS_score AIS_score;
run;
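For reference, the DISTANCE option asks PROC CANDISC to print the squared Mahalanobis distance between the class means, computed with the pooled within-class covariance matrix. A sketch of that two-group form follows; the notation is chosen here for illustration (subscript 1 for CIREN, subscript 2 for CISS, with sample sizes n, mean vectors x-bar, and covariance matrices S over the six analysis variables).

```latex
D^2 = (\bar{x}_1 - \bar{x}_2)^{\top} \, S_p^{-1} \, (\bar{x}_1 - \bar{x}_2),
\qquad
S_p = \frac{(n_1 - 1)\, S_1 + (n_2 - 1)\, S_2}{n_1 + n_2 - 2}
```

A small value of D^2 suggests the two group centroids are close relative to the within-group spread. Pooling assumes the two groups have roughly similar covariance structure.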
BR, Koen
For Mahalanobis distance, see here :
Koen
Thank you so much