Re: Distance Computation

SWEETSAS · Posted 11-01-2018 02:16 AM

The following SAS code works fine, but after computing the Mahalanobis distance (MD) between each of the replicates in HAVE1 and the data set HAVE2, I want the variable INDEX to be part of the final data set. That is, I think the last few lines of code should look like this:

create want from want[c={index reps distance}];
append from want;

Any help will be appreciated. Thanks for your help.

Jack

data HAVE1;

input index reps ID $ X Y;

datalines;


1	1	00-01	5	3
1	1	00-02	4	6
1	1	00-03	6	4
1	1	00-06	4	6
1	2	00-02	4	6
1	2	00-03	6	4
1	2	00-04	7	5
1	2	00-05	9	2
1	3	00-01	5	3
1	3	00-02	4	6
1	3	00-05	9	2
1	3	00-10	7	4
1	4	00-03	6	4
1	4	00-05	9	2
1	4	00-09	7	3
1	4	00-10	7	4
1	5	00-03	6	4
1	5	00-04	7	5
1	5	00-07	5	8
1	5	00-09	7	3
1	6	00-02	4	6
1	6	00-03	6	4
1	6	00-05	9	2
1	6	00-10	7	4
2	1	00-01	5	3
2	1	00-02	4	6
2	1	00-03	6	4
2	1	00-06	4	6
2	2	00-02	4	6
2	2	00-03	6	4
2	2	00-04	7	5
2	2	00-05	9	2
2	3	00-01	5	3
2	3	00-02	4	6
2	3	00-05	9	2
2	3	00-10	7	4
2	4	00-03	6	4
2	4	00-05	9	2
2	4	00-09	7	3
2	4	00-10	7	4
2	5	00-03	6	4
2	5	00-04	7	5
2	5	00-07	5	8
2	5	00-09	7	3
2	6	00-02	4	6
2	6	00-03	6	4
2	6	00-05	9	2
2	6	00-10	7	4
3	1	00-01	5	3
3	1	00-02	4	6
3	1	00-03	6	4
3	1	00-06	4	6
3	2	00-02	4	6
3	2	00-03	6	4
3	2	00-04	7	5
3	2	00-05	9	2
3	3	00-01	5	3
3	3	00-02	4	6
3	3	00-05	9	2
3	3	00-10	7	4
3	4	00-03	6	4
3	4	00-05	9	2
3	4	00-09	7	3
3	4	00-10	7	4
3	5	00-03	6	4
3	5	00-04	7	5
3	5	00-07	5	8
3	5	00-09	7	3
3	6	00-02	4	6
3	6	00-03	6	4
3	6	00-05	9	2
3	6	00-10	7	4

;

run;

data have2;

input index reps ID $ X Y;

datalines;

10	6	00-02	4	6
10	6	00-03	6	4
10	6	00-05	9	2
10	6	00-10	7	4

;

run;

proc iml;
use have1 nobs nobs;
read all var {reps};
read all var {x y} into data;
close;
use have2;
read all var {x y} into B;
close;

  
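/* covariance matrix of the columns of x, using divisor n rather than n-1 */
start new_cov(x);
 Xc=x-x[:,];            /* center each column at its mean */
 c=Xc`*Xc/nrow(x);
 return (c);
finish;

/* Mahalanobis distance between the mean vectors of A and B,
   using a covariance matrix pooled in proportion to the row counts */
start new_mahalanobis(A,B);
 xDiff=A[:,]-B[:,];     /* difference of the column means */
 cA=new_cov(A);
 cB=new_cov(B);
 pC=(nrow(A)/(nrow(A)+nrow(B)))#cA +
    (nrow(B)/(nrow(A)+nrow(B)))#cB ;
 d=sqrt(xDiff*inv(pC)*xDiff`);
 return (d);
finish;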



start_end=t(loc(t(reps)^={.}||remove(reps,nobs)))||
          t(loc(t(reps)^=remove(reps,1)||{.}));
want=j(nrow(start_end),2,.);
 
want[,1]=reps[start_end[,1]]; 
do i=1 to nrow(start_end);
 idx=start_end[i,1]:start_end[i,2];
 A=data[idx,];
 want[i,2]=new_mahalanobis(A,B);
end;

create want from want[c={reps distance}];
append from want;
close;
quit;

IanWakeling · Posted 11-01-2018 06:10 AM

Syntax like c = { reps distance index } is only setting the column names of the data set that is created by IML. You will also need to declare the matrix 'want' to have a third column to accommodate the index data and write the elements of this third column from within the loop.

SWEETSAS · Posted 11-01-2018 06:49 AM

Thanks, IanWakeling!

I need help with declaring matrix "WANT" to have a third column to accommodate the INDEX variable and writing the third column from within the loop.

IanWakeling · Posted 11-01-2018 07:23 AM

For the declaration with 3 columns:

want=j(nrow(start_end),3,.);

then in the loop there needs to be a statement like:

want[i , 3] = ....

but I am not actually sure about the right hand side as I don't understand where the index data comes from.

SWEETSAS · Posted 11-01-2018 07:53 AM

Thanks! The INDEX data is a column in the original data set; in the example data, INDEX is the first column. Observations are nested within replicates, and replicates are nested within INDEX. Since the INDEX values are unique and the replicates are also unique, I am thinking one could concatenate INDEX and REPS and later separate the two variables.
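
One possible way to carry INDEX through without concatenating and re-splitting it is sketched below. It is untested and rests on a few assumptions: HAVE1 and HAVE2 exist as posted (with ID read as character), HAVE1 is sorted by INDEX and then REPS, and there are at least two groups. The names grp, change, first and last are made up for this illustration. INDEX and REPS are read into a two-column matrix, a new group starts wherever either column changes from the previous row, and the first row of each group supplies the INDEX and REPS values for the output.

```sas
proc iml;
/* grouping variables and coordinates from HAVE1 */
use have1;
read all var {index reps} into grp;   /* column 1 = INDEX, column 2 = REPS */
read all var {x y} into data;
close have1;

/* reference group from HAVE2 */
use have2;
read all var {x y} into B;
close have2;

/* modules as posted earlier in the thread */
start new_cov(x);
   Xc = x - x[:,];
   return ( Xc`*Xc/nrow(x) );
finish;

start new_mahalanobis(A,B);
   xDiff = A[:,] - B[:,];
   pC = (nrow(A)/(nrow(A)+nrow(B)))#new_cov(A) +
        (nrow(B)/(nrow(A)+nrow(B)))#new_cov(B);
   return ( sqrt(xDiff*inv(pC)*xDiff`) );
finish;

/* change[k] = 1 when row k+1 differs from row k in INDEX or REPS */
n = nrow(grp);
change = (grp[2:n,1] ^= grp[1:(n-1),1]) | (grp[2:n,2] ^= grp[1:(n-1),2]);
first = 1 // (t(loc(change)) + 1);    /* first row of each group */
last  = t(loc(change)) // n;          /* last row of each group  */

/* one output row per group: INDEX, REPS, distance */
want = j(nrow(first), 3, .);
do i = 1 to nrow(first);
   idx = first[i]:last[i];
   want[i,1] = grp[first[i],1];
   want[i,2] = grp[first[i],2];
   want[i,3] = new_mahalanobis(data[idx,], B);
end;

create want from want[colname={"index" "reps" "distance"}];
append from want;
close want;
quit;
```

With the example data this should give 18 rows, one for each INDEX-REPS combination. Checking both columns for a change, rather than REPS alone, also covers the case where the last replicate of one INDEX and the first replicate of the next happen to share the same REPS value.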
