Posted 12-12-2014 06:08 AM
(1156 views)

Suppose I have data sets that look like this:

data A;

input x y;

datalines;

23 65

34 13

32 54

43 32

65 21

64 34

;

run;

data B;

input x y;

datalines;

23 43

17 84

;

run;

Data set A has 6 observations and data set B has 2 observations. X and Y are variables in both data sets. I would like to

I) form the subsets of A obtained by taking the 6 observations 2 at a time (6!/(4!2!) = 15 distinct subsets).

II) calculate the Mahalanobis distance between the 2 observations in B and each of the 15 subsets of A in terms of X and Y.

III) select the subset of A with the smallest distance; call this subset MINDIST.

6 REPLIES

The Mahalanobis distance assumes a center and covariance matrix for the computation. Please show how you want to compute those parameters.

Since this process is reminiscent of robust regression methods, you might want to see whether SAS/IML (or PROC ROBUSTREG) already contains the algorithm that you want. See the description of the MCD and MVE algorithms.

If you decide to proceed "by hand," the relevant functions you will need are

Thanks for the prompt reply Rick.

Given two covariate vectors **xi** and **xj**, the desired formula for the Mahalanobis distance is md = {[**xi** - **xj**]^T S^(-1) [**xi** - **xj**]}^(1/2), where S is the covariance matrix of **X** and S^(1/2) denotes its Cholesky factor. In the above example xi=x; xj=y.

Does PROC ROBUSTREG allow computing the distance for variables that are not continuous, since that requires scoring? I would appreciate a demonstration of how to output the subset with the minimum distance.
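For reference, the same formula can be written with the Cholesky factor made explicit. This is only a restatement under the usual definition; the symbol $L$ (the lower-triangular Cholesky factor of $S$) is an assumed name that does not appear in the post.

```latex
d(\mathbf{x}_i, \mathbf{x}_j)
  = \sqrt{(\mathbf{x}_i - \mathbf{x}_j)^{T}\, S^{-1}\, (\mathbf{x}_i - \mathbf{x}_j)}
  = \bigl\lVert L^{-1} (\mathbf{x}_i - \mathbf{x}_j) \bigr\rVert_2 ,
\qquad S = L L^{T}.
```

The second equality follows from $S^{-1} = L^{-T} L^{-1}$, which is why a Cholesky factor is often used in place of an explicit inverse.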

Thanks,

Jack

Can you explain more about Step 3? Each subset of A contains two observations. For concreteness, let the first subset be the points

23 65

34 13

I compute the Mahalanobis distances (MD) to each point in B. I get

MD to B[1,]:

1.34

1.54

MD to B[2,]:

0.99

3.80

Then what? What do you consider to be "the distance between the two observations in B and the subset of A"?

Here's some code:

proc iml;

use A; read all var _num_ into Z; close;

cov = cov(Z);

/* let m be first subset = first 2 obs */

m = Z[1:2,];

use B;

do data;

read next var _num_ into center;

md = mahalanobis(m, center, cov); /* distance from each pt of the subset m to the current pt of B */

print md;

end;

close B;

quit;
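The code above handles only the first subset. A minimal sketch of how the same computation might be repeated for all 15 subsets is shown below, assuming the data sets A and B from the question and a covariance matrix estimated from all of A. The names `idx`, `ZB`, and `S` are illustrative. The sketch only prints the individual distances; it does not define a single "distance between B and a subset", since that is still the open question.

```sas
proc iml;
use A; read all var {x y} into Z;  close A;
use B; read all var {x y} into ZB; close B;

S   = cov(Z);                /* assumption: covariance estimated from all of A */
idx = allcomb(nrow(Z), 2);   /* each row holds the row numbers of one 2-obs subset of A */

do k = 1 to nrow(idx);
   m = Z[idx[k,], ];         /* the two observations in the k-th subset */
   do i = 1 to nrow(ZB);
      /* MD from each point of the k-th subset to the i-th point of B */
      md = mahalanobis(m, ZB[i,], S);
      print k i md;
   end;
end;
quit;
```

Once a rule for combining the distances into one number per subset is chosen, the PRINT statement could be replaced by code that stores that number and keeps the subset with the smallest value.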

Thanks Rick!

In step 2, I will compute the Mahalanobis distance between each subset of A and the data set B.

In step 3, I want to identify which of the 15 distinct subsets of A has the minimum Mahalanobis distance to B. Essentially, the subset of A with the minimum Mahalanobis distance to B is the closest match to B. In other words, I am looking to select the subset of A that best matches B.

The aim of the whole exercise is to enumerate all distinct subsets of A whose size equals the number of observations in B, and then determine which of those subsets is most similar (closest) to B based on Mahalanobis distance. In the example given, data set A has 6 observations and data set B has 2 observations, so there are 6 choose 2 = 15 distinct subsets of A. I want to identify and select the subset among these 15 that is most similar to B, measuring similarity by Mahalanobis distance. This is similar to identifying the subset of A that is the best match for B.

*Alternatively, I can take each observation from B and find the observation in A that most closely matches it using Mahalanobis distance.*

Thanks
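The alternative described above (matching each observation in B to its nearest observation in A) is well defined and might be sketched in SAS/IML as follows. This is only an illustration under stated assumptions: the covariance matrix is estimated from A, the same observation of A may be matched to more than one observation of B, and the names `ZA`, `ZB`, `match`, and `minMD` are hypothetical.

```sas
proc iml;
use A; read all var {x y} into ZA; close A;
use B; read all var {x y} into ZB; close B;

S = cov(ZA);                         /* assumption: covariance estimated from A */
match = j(nrow(ZB), 1, .);           /* row number in A of the closest observation */
minMD = j(nrow(ZB), 1, .);           /* the corresponding Mahalanobis distance */

do i = 1 to nrow(ZB);
   md = mahalanobis(ZA, ZB[i,], S);  /* MD from every row of A to the i-th row of B */
   minMD[i] = min(md);
   match[i] = md[>:<];               /* index of the smallest distance */
end;

print match minMD;
quit;
```

If matches must be distinct, the matched row of A would need to be excluded before processing the next observation of B, which this sketch does not do.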
