Re: Create new table using a old table

deleted_user · Posted 10-23-2010 03:04 PM

Hi Guys,
I am new to SAS programming, I am stuck with the following problem for some time, I will really appricate if you can give me some pointers for solving this.

I have a table named coordinates
ID X Y
-- -- --
S1 1 2
S2 1 3
S3 2 3
S4 5 6

I need to make one another table out of this table such that it should contain the following
ID_S ID_D DIST
------ ------ ------
S1 S2 sqrt((1-1)^2 + (3-2)^2)
S1 S3 ...
S1 S4 ...
S2 S1 ...
S2 S3 ...
S2 S4 ...
S3 S1 ...
S3 S2 ...
S3 S4 ...
S4 S1 ...
S4 S2 ...
S4 S3 ...

Regards
HunT Message was edited by: HunT_GuY

Cynthia_sas · Posted 10-23-2010 04:37 PM

Hi:
It looks like you are making the new table out of all possible combinations of the values for the ID variable from the original COORDINATES table??? You could do this with PROC SQL. When you do a join (or a self-join) like this in PROC SQL, you have to know how you are going to name the new variables (such as ID_S and ID_D) that you want to create. Generally, when you are creating all possible combinations of values (such as values for ID), this kind of join is called a Cartesian product. In your case, you do NOT seem to want ID values of S1 combined with S1 or ID values of S2 combined with S2, etc -- so you would have to use a WHERE clause in your PROC SQL to prevent this combination of IDs from being output to the new table.

What I do NOT understand is
1) X and Y variables are shown in the coordinates file -- but they are not shown as variables in the new table. What happens to X and Y in the creation of the new table?
2) You show a new variable called DIST with ... for each observation in the new table after the first observation. Does this mean that DIST is a fixed value and uses the same formula for every observation? Or, do you mean to imply that the X and Y values from ID_S and ID_D are used in the calculation of DIST?

For example, if you had data like this (which shows the possible combinations of S1 and S2 -AND- also shows the X and Y values for ID_S and ID_D) all with different column names,
[pre]
id_s s_x s_y id_d d_x d_y
S1 1 2 S2 1 3
S1 1 2 S3 2 3
S1 1 2 S4 5 6
S2 1 3 S1 1 2
S2 1 3 S3 2 3
S2 1 3 S4 5 6
[/pre]

...would the various X and Y values be used in the SQRT-based formula for every observation???

Can you explain a bit more about how the DIST column gets calculated for each observation and what your formula represents???

cynthia

deleted_user · Posted 10-24-2010 04:17 AM

Hi cynthia,
Thanks for the reply, basically I am trying to find the distance between two points, here id_s -> represents the source point, id_d -> represents the destination point and dist -> represents the distance between these two points. Using the formula distance between x1, y1 and x2,y2 = sqrt ((x2-x1)^2 + (y2-y1)^2). Therefore the values of x,y will be used for every observation for calculation of distance. I hope I was able to clarity your concern.
Thanks
HunT

ArtC · Posted 10-24-2010 09:31 PM

Using the DATA step here is a quick program that is very scalable; But first as an aside this,Euclidean distance, is only one of the simpler measures of distance available in SAS (search for 'proximity measures' and look into PROC DISTANCE). Some of the cluster analysis procedures can also calculate this as a distance matrix. Depends on your usage.

[pre]data coords;

input id $2. x y;

datalines;
S1 1 2
S2 1 3
S3 2 3
S4 5 6
run;

* Build distances;
data dist(keep= idx idy x1 y1 x2 y2 dist);
set coords nobs=nobs end=eof;

array xx {100} _temporary_;
array yy {100} _temporary_;
array idlist {100} $2 _temporary_;

* Fill the arrays;
xx(_n_) = x;
yy(_n_) = y;
idlist(_n_) = id;

* write the obs from the array;
if eof then do i = 1 to nobs-1;
do j= i+1 to nobs;
* Assign values to each variable;
idx = idlist(i);
idy = idlist(j);
x1 = xx(i);
y1 = yy(i);
x2 = xx(j);
y2 = yy(j);
dist=sqrt((x1-x2)**2 + (y1-y2)**2);
output dist;
end;
end;
run;

proc print data=dist;
run;
[/pre]

deleted_user · Posted 10-25-2010 02:12 AM

Thanks ArtC. 🙂

chang_y_chung_hotmail_com · Posted 10-25-2010 02:52 PM

If you have sas/stat licensed, then the calculation takes only a few lines. Reshaping from the matrix to an edgelist is left as homework.
[pre]
data one;
input id $ x y @@;
cards;
S1 1 2 S2 1 3 S3 2 3 S4 5 6
;
run;

proc distance data=one out=dist shape=square method=euclid;
var ratio(x y);
id id;
run;

/* check */
proc print data=dist noobs;
format s: f6.4;
run;
/* on lst
id S1 S2 S3 S4

S1 0.0000 1.0000 1.4142 5.6569
S2 1.0000 0.0000 1.0000 5.0000
S3 1.4142 1.0000 0.0000 4.2426
S4 5.6569 5.0000 4.2426 0.0000
*/
[/pre]

If you don't have the luxury of having sas/stat licensed, then sql takes only a few lines more:
[pre]
data one;
input id $ x y @@;
cards;
S1 1 2 S2 1 3 S3 2 3 S4 5 6
;
run;

proc sql;
create table dist as
select d1.id as id1, d2.id as id2
, sqrt((d1.x-d2.x)**2 + (d1.y-d2.y)**2) as dist
from one as d1, one as d2
where d1.id not = d2.id
order by d1.id, d2.id;

/* check */
select id1, id2, putn(dist,"6.4") as dist from dist;
quit;
/* on lst
id1 id2 dist
--------------------------
S1 S2 1.0000
S1 S3 1.4142
S1 S4 5.6569
S2 S1 1.0000
S2 S3 1.0000
S2 S4 5.0000
S3 S1 1.4142
S3 S2 1.0000
S3 S4 4.2426
S4 S1 5.6569
S4 S2 5.0000
S4 S3 4.2426
*/
[/pre]

Create new table using a old table