Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 01-31-2008 12:59 PM
(1551 views)

I am fairly new and novice to SAS, but I have been asked to complete this task and could use some help...

I have 50 txt files. Each one is a matrix of distances between points. So, for example, I have 5 points. I have 1 through 5 on the top and the side, then the distances between the points. When 2 and 2 line up, I get a 0.

What I need to do, is to find the shortest distance between the points and not include the zero that occurs at 2,2.... BUT there might be a zero that occurs at say 3,2 because these points could be right on top of each other.

Lastly, as mentioned before, I have 50 txt files, each one will have a different number of points in the file some 5, others 150. And I need to find the shortest distance between each point.

I thank you greatly in advance for your help!!!

I have 50 txt files. Each one is a matrix of distances between points. So, for example, I have 5 points. I have 1 through 5 on the top and the side, then the distances between the points. When 2 and 2 line up, I get a 0.

What I need to do, is to find the shortest distance between the points and not include the zero that occurs at 2,2.... BUT there might be a zero that occurs at say 3,2 because these points could be right on top of each other.

Lastly, as mentioned before, I have 50 txt files, each one will have a different number of points in the file some 5, others 150. And I need to find the shortest distance between each point.

I thank you greatly in advance for your help!!!

11 REPLIES 11

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

What SAS products do you have installed?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I am checking, but I have Base SAS 9.1.3 for sure. The other products I have, are likely mostly stats based, products, I am an intern at a federal government research facility.

Is there something specific I would need? Message was edited by: ccorwin

Is there something specific I would need? Message was edited by: ccorwin

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

assuming the matrix is not the data shape, but the data comes like:

row_number, column_number, distance

this might do for one file [pre]

data fileN ;

length row col dist 8 filename $100 ;

infile "fileN.txt" end=eof filename=filename dsd; /* assuming csv data */

input row col dist @ ;

retain min_dist m_row m_col ;

drop min_dist m_row m_col ;

if row NE col and dist LT min_dist then do;

min_dist= dist ;

m_row = row;

m_col = col;

end;

if eof then put filename= min_dist= 'at row=' m_row ' col=' m_col ;

run;

[/pre]

good luck

PeterC

row_number, column_number, distance

this might do for one file [pre]

data fileN ;

length row col dist 8 filename $100 ;

infile "fileN.txt" end=eof filename=filename dsd; /* assuming csv data */

input row col dist @ ;

retain min_dist m_row m_col ;

drop min_dist m_row m_col ;

if row NE col and dist LT min_dist then do;

min_dist= dist ;

m_row = row;

m_col = col;

end;

if eof then put filename= min_dist= 'at row=' m_row ' col=' m_col ;

run;

[/pre]

good luck

PeterC

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Sadly, the data was exported in the matrix form,

., 1, 2, 3, 4, 5

1, 0, data, data, data, data

2, data, 0, data, data, data

3, data, data, 0, data, data

4, data, data, data, 0, data

5, data, data, data, data, 0

., 1, 2, 3, 4, 5

1, 0, data, data, data, data

2, data, 0, data, data, data

3, data, data, 0, data, data

4, data, data, data, 0, data

5, data, data, data, data, 0

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

What exactly do you mean by "find the shortest distance between the points"? do you mean, within each file, to find just one pair of points with the shortest distance between them? Or do you mean to find the closest point to each point within the file? Or are you trying to find the shortest route that includes all the points in each file?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Good question, sorry I was not clear on that, I am looking for the smallest value.

For example...

., 1, 2, 3,

1, 0, 4.3, 5.5

2, 4.3, 0, 0

3, 5.5, 0, 0

So in this case, I want to be able to extract, the number 4.3 (the shortest distance between point 1 and anyother point, 0 (the shortest distance between point 2 and any other point) and 0 again (the shortest distance between point 3 and any other point).

So basically, I want the minimum number in each column or each row, either one works in the format of the data. BUT I need to ignore the zero's that occur at the 1,1 and 2,2 etc.

I hope that helps.

Lastly, here is a list of the products I have available to me...

BASE, STAT, GRAPH, ETS, OR, IML, LAB, QC, Enterprise Guide, Enterprise

Reporter, Access to PC File Formats, Access to ODBC, Access to Oracle,

CONNECT, SHARE, ASSIST, INSIGHT, FSP, Online Tutor. Message was edited by: ccorwin

For example...

., 1, 2, 3,

1, 0, 4.3, 5.5

2, 4.3, 0, 0

3, 5.5, 0, 0

So in this case, I want to be able to extract, the number 4.3 (the shortest distance between point 1 and anyother point, 0 (the shortest distance between point 2 and any other point) and 0 again (the shortest distance between point 3 and any other point).

So basically, I want the minimum number in each column or each row, either one works in the format of the data. BUT I need to ignore the zero's that occur at the 1,1 and 2,2 etc.

I hope that helps.

Lastly, here is a list of the products I have available to me...

BASE, STAT, GRAPH, ETS, OR, IML, LAB, QC, Enterprise Guide, Enterprise

Reporter, Access to PC File Formats, Access to ODBC, Access to Oracle,

CONNECT, SHARE, ASSIST, INSIGHT, FSP, Online Tutor. Message was edited by: ccorwin

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

that is almost solved by the earlier code, first you would need to load the data to similar form, like... [pre]

data dataN( keep= row col dist filename ) ;

length filename filen $100 ;

infile "fileN" truncover filename =filen dsd ;

input col @ ; *load buffer and ignore first "column" ;

filename = filen ;

do until( col =.) ; * get final column number ;

co = col ;

input col @ ;

end;

input ; * release column definition row;

* expect square matrix, so there will be "co=" more rows ;

do ro = 1 to co ;

input row @ ; * row number ;

if row ne ro then do;

put 'DATA Problem: ' row= 'found, but expected ' ro ;

stop ;

end;

do col = 1 to co ;

input dist @ ;

* now*could* look for min distance ;

output ;

end;

input ; * release the row ;

end;

stop;

run;[/pre]

Having loaded data, the max distance can be delivered like:[pre]

proc sql ;

create table resN as

select a.filename, a.row, a.col, a.dist

from dataN a

join ( select min( dist) as min_d

from dataN

where row ne col

) b

on dist = min_d

;

quit;

[/pre] the payback for the added complexity is that where multiple intersections share shortest distance, you have all to choose from, where as the previous approach provided the result nearest the bottom-right corner.

I don't know of a SAS optimisation routine to deal with this in a more polished manner. Is there one ?

PeterC

data dataN( keep= row col dist filename ) ;

length filename filen $100 ;

infile "fileN" truncover filename =filen dsd ;

input col @ ; *load buffer and ignore first "column" ;

filename = filen ;

do until( col =.) ; * get final column number ;

co = col ;

input col @ ;

end;

input ; * release column definition row;

* expect square matrix, so there will be "co=" more rows ;

do ro = 1 to co ;

input row @ ; * row number ;

if row ne ro then do;

put 'DATA Problem: ' row= 'found, but expected ' ro ;

stop ;

end;

do col = 1 to co ;

input dist @ ;

* now

output ;

end;

input ; * release the row ;

end;

stop;

run;[/pre]

Having loaded data, the max distance can be delivered like:[pre]

proc sql ;

create table resN as

select a.filename, a.row, a.col, a.dist

from dataN a

join ( select min( dist) as min_d

from dataN

where row ne col

) b

on dist = min_d

;

quit;

[/pre] the payback for the added complexity is that where multiple intersections share shortest distance, you have all to choose from, where as the previous approach provided the result nearest the bottom-right corner.

I don't know of a SAS optimisation routine to deal with this in a more polished manner. Is there one ?

PeterC

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Peter_c has a very good solution with some error checking. For a SAS novice, this might be a little easier to understand. The first data step is very similar

data work.in;

length filename filen $100 ;

infile "H:\Infile.txt" dlm=',' firstobs=2 truncover filename=filen dsd;

filename = filen ;

Input row :8. @;

col=0;

do until (dist = .);

input dist :8. @;

col+1;

if dist = . then return;

if row ne col then output;

end;

run;

proc means data=in noprint nway;

class filename col;

var dist;

output out=work.mindist (drop=_type_ _freq_) min=min;

run;

As Peter_c pointed out, this gets you the minimum distance, but doesn't tell you which pair (or pairs) meet this minimum. This next code accomplishes that.

proc sort data=work.in; by filename col row; run;

data work.out;

merge work.in work.mindist;

by filename col;

if min=dist then output;

run;

If you have a lot of these tables or some very big tables, you might want to try both and see which runs quicker.

data work.in;

length filename filen $100 ;

infile "H:\Infile.txt" dlm=',' firstobs=2 truncover filename=filen dsd;

filename = filen ;

Input row :8. @;

col=0;

do until (dist = .);

input dist :8. @;

col+1;

if dist = . then return;

if row ne col then output;

end;

run;

proc means data=in noprint nway;

class filename col;

var dist;

output out=work.mindist (drop=_type_ _freq_) min=min;

run;

As Peter_c pointed out, this gets you the minimum distance, but doesn't tell you which pair (or pairs) meet this minimum. This next code accomplishes that.

proc sort data=work.in; by filename col row; run;

data work.out;

merge work.in work.mindist;

by filename col;

if min=dist then output;

run;

If you have a lot of these tables or some very big tables, you might want to try both and see which runs quicker.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

For some reason, it is only reading the first 19 distances and then in the 20th, it puts 19.

Sorry to sound stupid, but why would it do that?

Thanks!!!! These are all great!

Thank you all so very much! I have been teaching myself SAS, so this is a great opportunity for me to learn some more advance skills, at least to me. Message was edited by: ccorwin

Sorry to sound stupid, but why would it do that?

Thanks!!!! These are all great!

Thank you all so very much! I have been teaching myself SAS, so this is a great opportunity for me to learn some more advance skills, at least to me. Message was edited by: ccorwin

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi:

I came up with a different solution. I only went down the rows finding the min distance, because the data looks very regular and given the data structure, it doesn't look to me like you needed to go down the columns (but I don't do matrices -- so I don't understand all the implications of not going down the columns). I only made a one-dimensional array.

cynthia

[pre]

** make some data;

** get rid of 1st row because variable numbers represent the column order;

data distdata;

infile datalines dsd;

input rownum dist1 dist2 dist3;

if rownum = . then delete;

return;

datalines;

., 1, 2, 3

1, 0, 4.3, 5.5

2, 4.3, 0, 0

3, 5.5, 0, 0

;

run;

ods listing;

proc print data=distdata;

run;

data mindistance;

set distdata;

array ds dist1-dist3;

array newds newds1-newds3;

array loc $ loc1-loc3;

** newds will have 1,1 and 2,2 and 3,3 etc. set to;

** missing so they do not get used by MIN function.;

** loc1, loc2, loc3 is the matrix location, 1,2 or 1,3 etc;

** use dim(ds) as the stopping point of the do loop so I do not have;

** to hard code an upper boundary;

do i = 1 to dim(ds);

newds(i) = ds(i);

loc(i) = cats(put(rownum,2.0),',',put(i,2.0));

if newds(i) = 0 and i = rownum

then newds(i) = .;

end;

** find the min distance on the row ignoring 1,1 2,2 and 3,3;

min_distance = min(of newds1-newds3);

** now find the location of the min distance;

** again, in newds1, newds2, newds3 there will NOT be a match;

** to 1,1 or 2,2 or 3,3 etc -- so loc_min(i) is the location of the min distance;

do i = 1 to dim(newds);

if min_distance = newds(i) then

loc_min = loc(i);

end;

run;

options missing = . nocenter;

ods listing;

proc print data=mindistance;

var dist1 dist2 dist3 min_distance loc_min;

title "Min Distance For the Rows in the TABLE";

run;

[/pre]

I came up with a different solution. I only went down the rows finding the min distance, because the data looks very regular and given the data structure, it doesn't look to me like you needed to go down the columns (but I don't do matrices -- so I don't understand all the implications of not going down the columns). I only made a one-dimensional array.

cynthia

[pre]

** make some data;

** get rid of 1st row because variable numbers represent the column order;

data distdata;

infile datalines dsd;

input rownum dist1 dist2 dist3;

if rownum = . then delete;

return;

datalines;

., 1, 2, 3

1, 0, 4.3, 5.5

2, 4.3, 0, 0

3, 5.5, 0, 0

;

run;

ods listing;

proc print data=distdata;

run;

data mindistance;

set distdata;

array ds dist1-dist3;

array newds newds1-newds3;

array loc $ loc1-loc3;

** newds will have 1,1 and 2,2 and 3,3 etc. set to;

** missing so they do not get used by MIN function.;

** loc1, loc2, loc3 is the matrix location, 1,2 or 1,3 etc;

** use dim(ds) as the stopping point of the do loop so I do not have;

** to hard code an upper boundary;

do i = 1 to dim(ds);

newds(i) = ds(i);

loc(i) = cats(put(rownum,2.0),',',put(i,2.0));

if newds(i) = 0 and i = rownum

then newds(i) = .;

end;

** find the min distance on the row ignoring 1,1 2,2 and 3,3;

min_distance = min(of newds1-newds3);

** now find the location of the min distance;

** again, in newds1, newds2, newds3 there will NOT be a match;

** to 1,1 or 2,2 or 3,3 etc -- so loc_min(i) is the location of the min distance;

do i = 1 to dim(newds);

if min_distance = newds(i) then

loc_min = loc(i);

end;

run;

options missing = . nocenter;

ods listing;

proc print data=mindistance;

var dist1 dist2 dist3 min_distance loc_min;

title "Min Distance For the Rows in the TABLE";

run;

[/pre]

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Since you have access to SAS/IML, you can use the matrix language that it contains. Here is an example of finding MIN and MIN2 values over your sample matrix. The beggining is just about getting the data into SAS, so just use the INFILE ... INPUT statements proposed earlier.

[pre]

DATA work.distance ;

INFILE CARDS DLM="," ;

INPUT col1-col3 ;

CARDS ;

0, 4.3, 5.5

4.3, 0, 0

5.5, 0, 0

;

RUN ;

PROC IML ;

USE work.distance ;

READ ALL INTO distMatrix ;

CLOSE work.distance ;

DO row = 1 TO NROW(distMatrix) ;

DO col = 1 TO row ;

distMatrix[row, col] = . ;

END ;

END ;

minDistance = MIN(distMatrix) ;

PRINT minDistance ;

DO row = 1 TO NROW(distMatrix) ;

DO col = 1 TO NROW(distMatrix) ;

IF distMatrix[row, col] = minDistance THEN distMatrix[row, col] = . ;

END ;

END ;

minDistance2 = MIN(distMatrix) ;

PRINT minDistance2 ;

QUIT ;

[/pre]

Then you may use so macro-looping around the whole program to extend it to your 50 text files.

Regards

Olivier

[pre]

DATA work.distance ;

INFILE CARDS DLM="," ;

INPUT col1-col3 ;

CARDS ;

0, 4.3, 5.5

4.3, 0, 0

5.5, 0, 0

;

RUN ;

PROC IML ;

USE work.distance ;

READ ALL INTO distMatrix ;

CLOSE work.distance ;

DO row = 1 TO NROW(distMatrix) ;

DO col = 1 TO row ;

distMatrix[row, col] = . ;

END ;

END ;

minDistance = MIN(distMatrix) ;

PRINT minDistance ;

DO row = 1 TO NROW(distMatrix) ;

DO col = 1 TO NROW(distMatrix) ;

IF distMatrix[row, col] = minDistance THEN distMatrix[row, col] = . ;

END ;

END ;

minDistance2 = MIN(distMatrix) ;

PRINT minDistance2 ;

QUIT ;

[/pre]

Then you may use so macro-looping around the whole program to extend it to your 50 text files.

Regards

Olivier

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.