Thanks, Rick. You are right. It seems the issue comes from my using the mean of each subgroup in HAVE1 to form the distance to the mean of HAVE2. With standardization, the mean of each subgroup is essentially zero, with unit variance.
The way to avoid this is to use the observations in the subgroups instead of the means, and likewise the observations in HAVE2 instead of its mean. That is, form the distance between the data in HAVE1 (each of the subgroups) and the HAVE2 data set.
The link below does that using MATLAB, which is what I wanted to do, except that in my case I will be forming the distance between several subgroups and one data set.
Thanks J
http://people.revoledu.com/kardi/tutorial/Similarity/MahalanobisDistance.html
Ksharp, awesome!! Thanks. You are indeed a genius.
I am very grateful for your help and patience.
Thanks
Jack
Hi,
I thought about it again. You are right. It is all about the covariance matrix used in the Mahalanobis distance, which is totally different between Matlab and SAS. Even the COV() function is a little different between Matlab and SAS. If you want to stick with SAS's function, see the following code; it will lead you to a result similar to Matlab's.
Data have1;
Input obs reps id $ x y;
Datalines ;
1 1 00-01 5 3
2 1 00-02 4 6
3 1 00-03 6 4
4 1 00-06 4 6
5 2 00-02 4 6
6 2 00-03 6 4
7 2 00-04 7 5
8 2 00-05 9 2
9 3 00-01 5 3
10 3 00-02 4 6
11 3 00-05 9 2
12 3 00-10 7 4
13 4 00-03 6 4
14 4 00-05 9 2
15 4 00-09 7 3
16 4 00-10 7 4
17 5 00-03 6 4
18 5 00-04 7 5
19 5 00-07 5 8
20 5 00-09 7 3
21 6 00-02 4 6
22 6 00-03 6 4
23 6 00-05 9 2
24 6 00-10 7 4
;
run;
Data have2;
Input obs reps id $ x y;
Datalines ;
25 7 00-01 5 3
26 7 00-08 1 8
27 7 00-09 7 3
28 7 00-10 7 4
;
Run;
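proc iml;
/* read the grouping variable and the coordinates from HAVE1 */
use have1 nobs nobs;
read all var {reps};
read all var {x y} into data;
close;
/* HAVE2 is the reference data set */
use have2;
read all var {x y} into B;
close;
center=B[:,];   /* column means of HAVE2 */
cB=cov(B);
/* first and last row of each run of equal REPS values;
   assumes the rows of each group are contiguous */
start_end=t(loc(t(reps)^={.}||remove(reps,nobs)))||
          t(loc(t(reps)^=remove(reps,1)||{.}));
want=j(nrow(start_end),2,.);
want[,1]=reps[start_end[,1]];
do i=1 to nrow(start_end);
   idx=start_end[i,1]:start_end[i,2];
   A=data[idx,];
   mA=A[:,];    /* column means of the current group */
   cA=cov(A);
   /* pooled covariance, weighted by the number of rows in each data set */
   pooled=(nrow(A)/(nrow(A)+nrow(B)))#cA +
          (nrow(B)/(nrow(A)+nrow(B)))#cB ;
   want[i,2]=mahalanobis(mA,center,pooled);
end;
create want from want[c={reps distance}];
append from want;
close;
quit;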
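For readers following along, the quantity the loop above computes for each group can be written out as below. This is only a restatement of the code, not an authoritative definition: the symbols are notation introduced here (A is one REPS group of HAVE1, B is HAVE2, n_A and n_B are their row counts, C_A and C_B are the matrices returned by COV, and the x-bars are row vectors of column means).

```latex
\[
C_p = \frac{n_A}{n_A+n_B}\,C_A + \frac{n_B}{n_A+n_B}\,C_B,
\qquad
d(A,B) = \sqrt{(\bar{x}_A-\bar{x}_B)\,C_p^{-1}\,(\bar{x}_A-\bar{x}_B)^{\top}}
\]
```

Note that the SAS/IML COV function divides by n-1, so C_A and C_B here are the usual sample covariance matrices.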
Thanks. I prefer to stick with the SAS function because I am more comfortable with SAS. Moreover, with the SAS function it is easier to do repetitions to obtain a sampling distribution. That is, it is easier to write a program that will compute such a distance many times. The way we will have a third column containing the repl
That is, we have three variables: sampleID, replicate, and distance.
I am really very grateful for your patience and assistance. I have been trying to figure this out for over 3 years now. Thanks again
J
Sample data?
data have1;
input id $ Sampleid reps x y;
datalines;
01 1 1 2 2
02 1 1 2 5
03 1 1 6 5
04 1 1 7 3
05 1 1 4 7
06 1 1 6 4
07 1 1 5 3
08 1 1 4 6
09 1 1 2 5
10 1 1 1 3
21 1 2 2 2
22 1 2 2 5
23 1 2 6 5
24 1 2 7 3
25 1 2 4 7
26 1 2 6 4
27 1 2 5 3
28 1 2 4 6
29 1 2 2 5
30 1 2 1 3
61 1 3 2 2
62 1 3 2 5
63 1 3 6 5
64 1 3 7 3
65 1 3 4 7
66 1 3 6 4
67 1 3 5 3
68 1 3 4 6
69 1 3 2 5
70 1 3 1 3
91 2 4 2 2
92 2 4 2 5
93 2 4 6 5
94 2 4 7 3
95 2 4 4 7
96 2 4 6 4
97 2 4 5 3
98 2 4 4 6
99 2 4 2 5
100 2 4 1 3
31 2 1 2 2
32 2 1 2 5
33 2 1 6 5
34 2 1 7 3
35 2 1 4 7
36 2 1 6 4
37 2 1 5 3
38 2 1 4 6
39 2 1 2 5
40 2 1 1 3
41 2 2 2 2
42 2 2 2 5
43 2 2 6 5
44 2 2 7 3
45 2 2 4 7
46 2 2 6 4
47 2 2 5 3
48 2 2 4 6
49 2 2 2 5
50 2 2 1 3
71 2 3 2 2
72 2 3 2 5
73 2 3 6 5
74 2 3 7 3
75 2 3 4 7
76 2 3 6 4
77 2 3 5 3
78 2 3 4 6
79 2 3 2 5
80 2 3 1 3
41 2 4 2 2
42 2 4 2 5
43 2 4 6 5
44 2 4 7 3
45 2 4 4 7
46 2 4 6 4
47 2 4 5 3
48 2 4 4 6
49 2 4 2 5
50 2 4 1 3
;
run;
data have2;
input id $ x y;
datalines;
11 6 5
12 7 4
13 8 7
14 5 6
15 5 4
;
run;
/*Result will look like this:*/
sampleid reps distance
1 1 1.404
1 2 1.404
1 3 1.404
1 4 1.404
2 1 1.404
2 2 1.404
2 3 1.404
2 4 1.404
Each sample will have observations 1 to N; within each sample there are reps = 1 to k, and each rep within each sample will have its own distance from HAVE2. It is essentially the same as the program you have written so far; the only difference is that the program you have written so far handles a single sample (sampleid=1). With real data the distances are expected to differ. They are the same in this example because the samples are the same.
Thanks.
OK. Your data is not readable, so I used my original data.
Code using SAS's functions:
Data have1;
Input Sampleid reps id $ x y;
Datalines ;
1 1 00-01 5 3
1 1 00-02 4 6
1 1 00-03 6 4
1 1 00-06 4 6
1 2 00-02 4 6
1 2 00-03 6 4
1 2 00-04 7 5
1 2 00-05 9 2
1 3 00-01 5 3
1 3 00-02 4 6
1 3 00-05 9 2
1 3 00-10 7 4
1 4 00-03 6 4
1 4 00-05 9 2
1 4 00-09 7 3
1 4 00-10 7 4
1 5 00-03 6 4
1 5 00-04 7 5
1 5 00-07 5 8
1 5 00-09 7 3
1 6 00-02 4 6
1 6 00-03 6 4
1 6 00-05 9 2
1 6 00-10 7 4
2 1 00-01 5 3
2 1 00-02 4 6
2 1 00-03 6 4
2 1 00-06 4 6
2 2 00-02 4 6
2 2 00-03 6 4
2 2 00-04 7 5
2 2 00-05 9 2
;
run;
Data have2;
Input obs reps id $ x y;
Datalines ;
25 7 00-01 5 3
26 7 00-08 1 8
27 7 00-09 7 3
28 7 00-10 7 4
;
Run;
proc iml;
use have1 nobs nobs;
read all var {Sampleid reps};
read all var {x y} into data;
close;
use have2;
read all var {x y} into B;
close;
/* one key per Sampleid/reps combination */
group=catx(' ',Sampleid,reps);
center=B[:,];   /* column means of HAVE2 */
cB=cov(B);
/* first and last row of each run of equal keys;
   assumes the rows of each group are contiguous */
start_end=t(loc(t(group)^={' '}||remove(group,nobs)))||
          t(loc(t(group)^=remove(group,1)||{' '}));
want=j(nrow(start_end),3,.);
want[,1]=Sampleid[start_end[,1]];
want[,2]=reps[start_end[,1]];
do i=1 to nrow(start_end);
   idx=start_end[i,1]:start_end[i,2];
   A=data[idx,];
   mA=A[:,];
   cA=cov(A);
   /* pooled covariance, weighted by the number of rows in each data set */
   pooled=(nrow(A)/(nrow(A)+nrow(B)))#cA +
          (nrow(B)/(nrow(A)+nrow(B)))#cB ;
   want[i,3]=mahalanobis(mA,center,pooled);
end;
create want from want[c={Sampleid reps distance}];
append from want;
close;
quit;
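The START_END step in the program above can be hard to read at first. As a rough illustration only (in Python rather than IML, with made-up keys; `block_bounds` is a hypothetical helper name, not part of SAS), the idea is to find the first and last row of every run of equal consecutive Sampleid/reps keys:

```python
from itertools import groupby

# Hypothetical (sampleid, reps) keys, four observations per block,
# ordered so that each block is contiguous.
keys = [(s, r) for (s, r) in [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
        for _ in range(4)]

def block_bounds(keys):
    """Return (key, start, end) for each run of equal consecutive keys.

    Positions are 1-based and inclusive, like the IML row indices.
    """
    bounds = []
    pos = 1
    for key, run in groupby(keys):
        n = len(list(run))
        bounds.append((key, pos, pos + n - 1))
        pos += n
    return bounds

bounds = block_bounds(keys)
for key, start, end in bounds:
    print(key, start, end)
```

Because runs are detected by comparing neighbouring rows, a key that appears in two separate places is treated as two blocks, so it is safest to sort the data by Sampleid and reps first.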
Code that reproduces Matlab's function:
proc iml;
use have1 nobs nobs;
read all var {Sampleid reps};
read all var {x y} into data;
close;
use have2;
read all var {x y} into B;
close;
/* covariance with divisor n instead of n-1 */
start new_cov(x);
   Xc=x-x[:,];               /* center each column */
   c=Xc`*Xc/nrow(x);
   return (c);
finish;
/* distance between the means of A and B under the pooled covariance */
start new_mahalanobis(A,B);
   xDiff=A[:,]-B[:,];
   cA=new_cov(A);
   cB=new_cov(B);
   pC=(nrow(A)/(nrow(A)+nrow(B)))#cA +
      (nrow(B)/(nrow(A)+nrow(B)))#cB ;
   d=sqrt(xDiff*inv(pC)*xDiff`);
   return (d);
finish;
group=catx(' ',Sampleid,reps);
/* first and last row of each run of equal keys */
start_end=t(loc(t(group)^={' '}||remove(group,nobs)))||
          t(loc(t(group)^=remove(group,1)||{' '}));
want=j(nrow(start_end),3,.);
want[,1]=Sampleid[start_end[,1]];
want[,2]=reps[start_end[,1]];
do i=1 to nrow(start_end);
   idx=start_end[i,1]:start_end[i,2];
   A=data[idx,];
   want[i,3]=new_mahalanobis(A,B);
end;
create want from want[c={Sampleid reps distance}];
append from want;
close;
quit;
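To see how much the covariance divisor alone matters, here is a small pure-Python sketch (not SAS, and not the code used in this thread). It takes the first reps group from HAVE1 and the four HAVE2 rows shown earlier and computes the pooled distance twice, once with divisor n-1 (as COV does) and once with divisor n (as new_cov does). The function names and the `ddof` argument are illustrative assumptions.

```python
import math

def mean_vec(rows):
    """Column means of a list of (x, y) rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def cov2(rows, ddof):
    """2x2 covariance matrix; ddof=1 divides by n-1, ddof=0 divides by n."""
    n = len(rows)
    m = mean_vec(rows)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        for i in range(2):
            for j in range(2):
                c[i][j] += (r[i] - m[i]) * (r[j] - m[j])
    return [[c[i][j] / (n - ddof) for j in range(2)] for i in range(2)]

def pooled_distance(a, b, ddof):
    """Distance between the means of a and b under the pooled covariance."""
    na, nb = len(a), len(b)
    ca, cb = cov2(a, ddof), cov2(b, ddof)
    wa, wb = na / (na + nb), nb / (na + nb)
    p = [[wa * ca[i][j] + wb * cb[i][j] for j in range(2)] for i in range(2)]
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    inv = [[p[1][1] / det, -p[0][1] / det],
           [-p[1][0] / det, p[0][0] / det]]
    ma, mb = mean_vec(a), mean_vec(b)
    d = [ma[0] - mb[0], ma[1] - mb[1]]
    q = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.sqrt(q)

A = [(5, 3), (4, 6), (6, 4), (4, 6)]   # HAVE1, reps=1
B = [(5, 3), (1, 8), (7, 3), (7, 4)]   # HAVE2

d_n_minus_1 = pooled_distance(A, B, ddof=1)   # divisor n-1
d_n = pooled_distance(A, B, ddof=0)           # divisor n
print(d_n_minus_1, d_n)
```

With four rows in each data set, the divisor-n covariance is 3/4 of the divisor-(n-1) one, so the second distance is larger by a factor of sqrt(4/3). With unequal group sizes the relationship is not a single constant factor.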
You are a savior!!! Thank you sooo much!!! Unbelievable. The speed with which you write these SAS programs to execute these complex tasks is remarkable and fascinating.
Thanks again for your assistance and patience.
J