turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Create new table using a old table

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-23-2010 03:04 PM

Hi Guys,

I am new to SAS programming, I am stuck with the following problem for some time, I will really appricate if you can give me some pointers for solving this.

I have a table named coordinates

ID X Y

-- -- --

S1 1 2

S2 1 3

S3 2 3

S4 5 6

I need to make one another table out of this table such that it should contain the following

ID_S ID_D DIST

------ ------ ------

S1 S2 sqrt((1-1)^2 + (3-2)^2)

S1 S3 ...

S1 S4 ...

S2 S1 ...

S2 S3 ...

S2 S4 ...

S3 S1 ...

S3 S2 ...

S3 S4 ...

S4 S1 ...

S4 S2 ...

S4 S3 ...

Regards

HunT Message was edited by: HunT_GuY

I am new to SAS programming, I am stuck with the following problem for some time, I will really appricate if you can give me some pointers for solving this.

I have a table named coordinates

ID X Y

-- -- --

S1 1 2

S2 1 3

S3 2 3

S4 5 6

I need to make one another table out of this table such that it should contain the following

ID_S ID_D DIST

------ ------ ------

S1 S2 sqrt((1-1)^2 + (3-2)^2)

S1 S3 ...

S1 S4 ...

S2 S1 ...

S2 S3 ...

S2 S4 ...

S3 S1 ...

S3 S2 ...

S3 S4 ...

S4 S1 ...

S4 S2 ...

S4 S3 ...

Regards

HunT Message was edited by: HunT_GuY

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

10-23-2010 04:37 PM

Hi:

It looks like you are making the new table out of all possible combinations of the values for the ID variable from the original COORDINATES table??? You could do this with PROC SQL. When you do a join (or a self-join) like this in PROC SQL, you have to know how you are going to name the new variables (such as ID_S and ID_D) that you want to create. Generally, when you are creating all possible combinations of values (such as values for ID), this kind of join is called a Cartesian product. In your case, you do NOT seem to want ID values of S1 combined with S1 or ID values of S2 combined with S2, etc -- so you would have to use a WHERE clause in your PROC SQL to prevent this combination of IDs from being output to the new table.

What I do NOT understand is

1) X and Y variables are shown in the coordinates file -- but they are not shown as variables in the new table. What happens to X and Y in the creation of the new table?

2) You show a new variable called DIST with ... for each observation in the new table after the first observation. Does this mean that DIST is a fixed value and uses the same formula for every observation? Or, do you mean to imply that the X and Y values from ID_S and ID_D are used in the calculation of DIST?

For example, if you had data like this (which shows the possible combinations of S1 and S2 -AND- also shows the X and Y values for ID_S and ID_D) all with different column names,

[pre]

id_s s_x s_y id_d d_x d_y

S1 1 2 S2 1 3

S1 1 2 S3 2 3

S1 1 2 S4 5 6

S2 1 3 S1 1 2

S2 1 3 S3 2 3

S2 1 3 S4 5 6

[/pre]

...would the various X and Y values be used in the SQRT-based formula for every observation???

Can you explain a bit more about how the DIST column gets calculated for each observation and what your formula represents???

cynthia

It looks like you are making the new table out of all possible combinations of the values for the ID variable from the original COORDINATES table??? You could do this with PROC SQL. When you do a join (or a self-join) like this in PROC SQL, you have to know how you are going to name the new variables (such as ID_S and ID_D) that you want to create. Generally, when you are creating all possible combinations of values (such as values for ID), this kind of join is called a Cartesian product. In your case, you do NOT seem to want ID values of S1 combined with S1 or ID values of S2 combined with S2, etc -- so you would have to use a WHERE clause in your PROC SQL to prevent this combination of IDs from being output to the new table.

What I do NOT understand is

1) X and Y variables are shown in the coordinates file -- but they are not shown as variables in the new table. What happens to X and Y in the creation of the new table?

2) You show a new variable called DIST with ... for each observation in the new table after the first observation. Does this mean that DIST is a fixed value and uses the same formula for every observation? Or, do you mean to imply that the X and Y values from ID_S and ID_D are used in the calculation of DIST?

For example, if you had data like this (which shows the possible combinations of S1 and S2 -AND- also shows the X and Y values for ID_S and ID_D) all with different column names,

[pre]

id_s s_x s_y id_d d_x d_y

S1 1 2 S2 1 3

S1 1 2 S3 2 3

S1 1 2 S4 5 6

S2 1 3 S1 1 2

S2 1 3 S3 2 3

S2 1 3 S4 5 6

[/pre]

...would the various X and Y values be used in the SQRT-based formula for every observation???

Can you explain a bit more about how the DIST column gets calculated for each observation and what your formula represents???

cynthia

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Cynthia_sas

10-24-2010 04:17 AM

Hi cynthia,

Thanks for the reply, basically I am trying to find the distance between two points, here id_s -> represents the source point, id_d -> represents the destination point and dist -> represents the distance between these two points. Using the formula distance between x1, y1 and x2,y2 = sqrt ((x2-x1)^2 + (y2-y1)^2). Therefore the values of x,y will be used for every observation for calculation of distance. I hope I was able to clarity your concern.

Thanks

HunT

Thanks for the reply, basically I am trying to find the distance between two points, here id_s -> represents the source point, id_d -> represents the destination point and dist -> represents the distance between these two points. Using the formula distance between x1, y1 and x2,y2 = sqrt ((x2-x1)^2 + (y2-y1)^2). Therefore the values of x,y will be used for every observation for calculation of distance. I hope I was able to clarity your concern.

Thanks

HunT

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

10-24-2010 09:31 PM

Using the DATA step here is a quick program that is very scalable; But first as an aside this,Euclidean distance, is only one of the simpler measures of distance available in SAS (search for 'proximity measures' and look into PROC DISTANCE). Some of the cluster analysis procedures can also calculate this as a distance matrix. Depends on your usage.

[pre]data coords;

input id $2. x y;

datalines;

S1 1 2

S2 1 3

S3 2 3

S4 5 6

run;

* Build distances;

data dist(keep= idx idy x1 y1 x2 y2 dist);

set coords nobs=nobs end=eof;

array xx {100} _temporary_;

array yy {100} _temporary_;

array idlist {100} $2 _temporary_;

* Fill the arrays;

xx(_n_) = x;

yy(_n_) = y;

idlist(_n_) = id;

* write the obs from the array;

if eof then do i = 1 to nobs-1;

do j= i+1 to nobs;

* Assign values to each variable;

idx = idlist(i);

idy = idlist(j);

x1 = xx(i);

y1 = yy(i);

x2 = xx(j);

y2 = yy(j);

dist=sqrt((x1-x2)**2 + (y1-y2)**2);

output dist;

end;

end;

run;

proc print data=dist;

run;

[/pre]

[pre]data coords;

input id $2. x y;

datalines;

S1 1 2

S2 1 3

S3 2 3

S4 5 6

run;

* Build distances;

data dist(keep= idx idy x1 y1 x2 y2 dist);

set coords nobs=nobs end=eof;

array xx {100} _temporary_;

array yy {100} _temporary_;

array idlist {100} $2 _temporary_;

* Fill the arrays;

xx(_n_) = x;

yy(_n_) = y;

idlist(_n_) = id;

* write the obs from the array;

if eof then do i = 1 to nobs-1;

do j= i+1 to nobs;

* Assign values to each variable;

idx = idlist(i);

idy = idlist(j);

x1 = xx(i);

y1 = yy(i);

x2 = xx(j);

y2 = yy(j);

dist=sqrt((x1-x2)**2 + (y1-y2)**2);

output dist;

end;

end;

run;

proc print data=dist;

run;

[/pre]

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ArtC

10-25-2010 02:12 AM

Thanks ArtC. :-)

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

10-25-2010 02:52 PM

If you have sas/stat licensed, then the calculation takes only a few lines. Reshaping from the matrix to an edgelist is left as homework.

[pre]

data one;

input id $ x y @@;

cards;

S1 1 2 S2 1 3 S3 2 3 S4 5 6

;

run;

proc distance data=one out=dist shape=square method=euclid;

var ratio(x y);

id id;

run;

/* check */

proc print data=dist noobs;

format s: f6.4;

run;

/* on lst

id S1 S2 S3 S4

S1 0.0000 1.0000 1.4142 5.6569

S2 1.0000 0.0000 1.0000 5.0000

S3 1.4142 1.0000 0.0000 4.2426

S4 5.6569 5.0000 4.2426 0.0000

*/

[/pre]

If you don't have the luxury of having sas/stat licensed, then sql takes only a few lines more:

[pre]

data one;

input id $ x y @@;

cards;

S1 1 2 S2 1 3 S3 2 3 S4 5 6

;

run;

proc sql;

create table dist as

select d1.id as id1, d2.id as id2

, sqrt((d1.x-d2.x)**2 + (d1.y-d2.y)**2) as dist

from one as d1, one as d2

where d1.id not = d2.id

order by d1.id, d2.id;

/* check */

select id1, id2, putn(dist,"6.4") as dist from dist;

quit;

/* on lst

id1 id2 dist

--------------------------

S1 S2 1.0000

S1 S3 1.4142

S1 S4 5.6569

S2 S1 1.0000

S2 S3 1.0000

S2 S4 5.0000

S3 S1 1.4142

S3 S2 1.0000

S3 S4 4.2426

S4 S1 5.6569

S4 S2 5.0000

S4 S3 4.2426

*/

[/pre]

[pre]

data one;

input id $ x y @@;

cards;

S1 1 2 S2 1 3 S3 2 3 S4 5 6

;

run;

proc distance data=one out=dist shape=square method=euclid;

var ratio(x y);

id id;

run;

/* check */

proc print data=dist noobs;

format s: f6.4;

run;

/* on lst

id S1 S2 S3 S4

S1 0.0000 1.0000 1.4142 5.6569

S2 1.0000 0.0000 1.0000 5.0000

S3 1.4142 1.0000 0.0000 4.2426

S4 5.6569 5.0000 4.2426 0.0000

*/

[/pre]

If you don't have the luxury of having sas/stat licensed, then sql takes only a few lines more:

[pre]

data one;

input id $ x y @@;

cards;

S1 1 2 S2 1 3 S3 2 3 S4 5 6

;

run;

proc sql;

create table dist as

select d1.id as id1, d2.id as id2

, sqrt((d1.x-d2.x)**2 + (d1.y-d2.y)**2) as dist

from one as d1, one as d2

where d1.id not = d2.id

order by d1.id, d2.id;

/* check */

select id1, id2, putn(dist,"6.4") as dist from dist;

quit;

/* on lst

id1 id2 dist

--------------------------

S1 S2 1.0000

S1 S3 1.4142

S1 S4 5.6569

S2 S1 1.0000

S2 S3 1.0000

S2 S4 5.0000

S3 S1 1.4142

S3 S2 1.0000

S3 S4 4.2426

S4 S1 5.6569

S4 S2 5.0000

S4 S3 4.2426

*/

[/pre]