Solved: Re: distances between observations within groups

ciro · Posted 01-26-2023 05:12 AM

Hi,

I would like to calculate distances between all observations within groups.

The result should be a dataset in which for each observation there are as many records as are the observations within the group, along with the distances.

My best attempt to realize this is as follows:

ods select none;
proc modeclus data=have k=4 Neighbor ;
by group;
var x1 x2 x3 x4;
id id;
ods output Neighbor=Neighbor;
run;

data want(drop=id_old);
set Neighbor(rename=(id=id_old));
by group;
retain id;
if not missing(id_old) then id=id_old;
run;

But the main problem with this solution seems to be that I cannot choose a distance metric. Or can I?

On the other side with proc distance I can choose among many metrics, but the result is a matrix, as follows:

proc distance data=have out=dist method=euclid SHAPE=SQUARE;
by group;
copy id;
var interval(x1 x2 x3 x4);
run;

I know I can match the dataset with itself and then calculate any distance but I wonder if there are other solutions.

I would appreciate suggestions on the possibility to choose a distance metric from proc modeclus and/or how the resulting dataset of proc distance has to be manipulated to obtain the shape I need.

And/or any other solution.

Than you very much in advance.

sbxkoenk · Posted 01-28-2023 04:50 PM

Hello,

You can PROC COMPARE your output dataset

with "work.dist_trp_match" to check whether your program is right.

proc sort data=sashelp.class out=class; by sex name; run;

data class;
 set class;
 retain id2;
 by sex name; 
 if first.sex then do; id2=0; end;
 id2=id2+1;
run;

proc distance data=class out=dist method=euclid SHAPE=SQUARE;
 by sex;
 copy name id2;
 var interval(age height weight);
run;

data dist_trp(drop=dist1-dist10);
 set dist;
 by sex;
 array dist{10} dist1-dist10;
 do Person_withinSameSex = 1 to dim(dist);
  distance_euclid=dist(Person_withinSameSex);
  output;
 end;
run;

PROC SQL noprint;
 create table dist_trp_match as
 select aa.* , bb.name as MatchName
 from   dist_trp(where=(distance_euclid is not missing))    as aa
      , dist(where=(Name is not missing) keep=sex name id2) as bb
 where aa.sex = bb.sex AND aa.Person_withinSameSex=bb.id2 ;
QUIT;
/* end of program */

Koen

View solution in original post

sbxkoenk · Posted 01-26-2023 12:33 PM

Here's one solution :

proc sort data=sashelp.class out=class; by sex name; run;

proc distance data=class out=dist method=euclid SHAPE=SQUARE;
 by sex;
 copy name;
 var interval(age height weight);
run;

data dist_trp(drop=dist1-dist10);
 set dist;
 by sex;
 array dist{10} dist1-dist10;
 do Person_withinSameSex = 1 to dim(dist);
  distance_euclid=dist(Person_withinSameSex);
  output;
 end;
run;
/* end of program */

Koen

sbxkoenk · Posted 01-26-2023 12:38 PM

On top of the previous reply just above ...

The MODECLUS procedure clusters observations in a SAS data set by using any of several algorithms based on nonparametric density estimates. No Euclidean distance metric is possible.

Here are the methods MODECLUS can use :

SAS/STAT 15.2 User's Guide
The MODECLUS Procedure
Clustering Methods
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_modeclus_details02.htm

Koen

ciro · Posted 01-27-2023 07:21 AM

thank you very much.

the only thing missing is that I need to know which is mached to any id, but I can obtain that

with a little coding like this.

is it correct?

proc sort data=sashelp.class out=class; by sex name; run;

proc distance data=class out=dist method=euclid SHAPE=SQUARE;
by sex;
copy name;
var interval(age height weight);
run;

data dist;
set dist;
by sex;
retain id2;
if first.sex then id2=1;
else id2+1;
run;

data dist_trp(drop=dist1-dist10);
*data dist_trp;
set dist;
by sex;
array dist{10} dist1-dist10;
do Person_withinSameSex = 1 to dim(dist);
distance_euclid=dist(Person_withinSameSex);
output;
end;
run;

proc sql;
create table dist_trp as select a.*,b.name as name_matched
from dist_trp as a
left join
dist as b
on a.sex=b.sex and a.Person_withinSameSex=b.id2
order by sex,id2,Person_withinSameSex;
quit;

sbxkoenk · Posted 01-28-2023 04:50 PM

Hello,

You can PROC COMPARE your output dataset

with "work.dist_trp_match" to check whether your program is right.

proc sort data=sashelp.class out=class; by sex name; run;

data class;
 set class;
 retain id2;
 by sex name; 
 if first.sex then do; id2=0; end;
 id2=id2+1;
run;

proc distance data=class out=dist method=euclid SHAPE=SQUARE;
 by sex;
 copy name id2;
 var interval(age height weight);
run;

data dist_trp(drop=dist1-dist10);
 set dist;
 by sex;
 array dist{10} dist1-dist10;
 do Person_withinSameSex = 1 to dim(dist);
  distance_euclid=dist(Person_withinSameSex);
  output;
 end;
run;

PROC SQL noprint;
 create table dist_trp_match as
 select aa.* , bb.name as MatchName
 from   dist_trp(where=(distance_euclid is not missing))    as aa
      , dist(where=(Name is not missing) keep=sex name id2) as bb
 where aa.sex = bb.sex AND aa.Person_withinSameSex=bb.id2 ;
QUIT;
/* end of program */

Koen

ciro · Posted 01-30-2023 11:48 AM

Thank You Koen,

I marked my second message as solution by mistake, since I wanted to mark yours as solution. apologies.

I don't know how to undo the marking. I wrote to sas to ask the unmarking.

Let's see if they respond.