Operations Research topics: SAS/OR,
SAS Optimization, and SAS Simulation Studio

Help with Data Extraction

Reply
N/A
Posts: 0

Help with Data Extraction

I am fairly new and novice to SAS, but I have been asked to complete this task and could use some help...

I have 50 txt files. Each one is a matrix of distances between points. So, for example, I have 5 points. I have 1 through 5 on the top and the side, then the distances between the points. When 2 and 2 line up, I get a 0.

What I need to do, is to find the shortest distance between the points and not include the zero that occurs at 2,2.... BUT there might be a zero that occurs at say 3,2 because these points could be right on top of each other.

Lastly, as mentioned before, I have 50 txt files, each one will have a different number of points in the file some 5, others 150. And I need to find the shortest distance between each point.

I thank you greatly in advance for your help!!!
SAS Employee
Posts: 94

Re: Help with Data Extraction

What SAS products do you have installed?
N/A
Posts: 0

Re: Help with Data Extraction

I am checking, but I have Base SAS 9.1.3 for sure. The other products I have, are likely mostly stats based, products, I am an intern at a federal government research facility.
Is there something specific I would need? Message was edited by: ccorwin
Valued Guide
Posts: 2,175

Re: Help with Data Extraction

assuming the matrix is not the data shape, but the data comes like:
row_number, column_number, distance
this might do for one file [pre]
data fileN ;
length row col dist 8 filename $100 ;
infile "fileN.txt" end=eof filename=filename dsd; /* assuming csv data */
input row col dist @ ;
retain min_dist m_row m_col ;
drop min_dist m_row m_col ;
if row NE col and dist LT min_dist then do;
min_dist= dist ;
m_row = row;
m_col = col;
end;
if eof then put filename= min_dist= 'at row=' m_row ' col=' m_col ;
run;
[/pre]

good luck

PeterC
N/A
Posts: 0

Re: Help with Data Extraction

Sadly, the data was exported in the matrix form,

., 1, 2, 3, 4, 5
1, 0, data, data, data, data
2, data, 0, data, data, data
3, data, data, 0, data, data
4, data, data, data, 0, data
5, data, data, data, data, 0
SAS Employee
Posts: 94

Re: Help with Data Extraction

What exactly do you mean by "find the shortest distance between the points"? do you mean, within each file, to find just one pair of points with the shortest distance between them? Or do you mean to find the closest point to each point within the file? Or are you trying to find the shortest route that includes all the points in each file?
N/A
Posts: 0

Re: Help with Data Extraction

Good question, sorry I was not clear on that, I am looking for the smallest value.
For example...

., 1, 2, 3,
1, 0, 4.3, 5.5
2, 4.3, 0, 0
3, 5.5, 0, 0

So in this case, I want to be able to extract, the number 4.3 (the shortest distance between point 1 and anyother point, 0 (the shortest distance between point 2 and any other point) and 0 again (the shortest distance between point 3 and any other point).

So basically, I want the minimum number in each column or each row, either one works in the format of the data. BUT I need to ignore the zero's that occur at the 1,1 and 2,2 etc.

I hope that helps.

Lastly, here is a list of the products I have available to me...
BASE, STAT, GRAPH, ETS, OR, IML, LAB, QC, Enterprise Guide, Enterprise
Reporter, Access to PC File Formats, Access to ODBC, Access to Oracle,
CONNECT, SHARE, ASSIST, INSIGHT, FSP, Online Tutor. Message was edited by: ccorwin
N/A
Posts: 0

Re: Help with Data Extraction

that is almost solved by the earlier code, first you would need to load the data to similar form, like... [pre]
data dataN( keep= row col dist filename ) ;
length filename filen $100 ;
infile "fileN" truncover filename =filen dsd ;
input col @ ; *load buffer and ignore first "column" ;
filename = filen ;
do until( col =.) ; * get final column number ;
co = col ;
input col @ ;
end;
input ; * release column definition row;
* expect square matrix, so there will be "co=" more rows ;
do ro = 1 to co ;
input row @ ; * row number ;
if row ne ro then do;
put 'DATA Problem: ' row= 'found, but expected ' ro ;
stop ;
end;
do col = 1 to co ;
input dist @ ;
* now could look for min distance ;
output ;
end;
input ; * release the row ;
end;
stop;
run;[/pre]
Having loaded data, the max distance can be delivered like:[pre]
proc sql ;
create table resN as
select a.filename, a.row, a.col, a.dist
from dataN a
join ( select min( dist) as min_d
from dataN
where row ne col
) b
on dist = min_d
;
quit;
[/pre] the payback for the added complexity is that where multiple intersections share shortest distance, you have all to choose from, where as the previous approach provided the result nearest the bottom-right corner.

I don't know of a SAS optimisation routine to deal with this in a more polished manner. Is there one ?


PeterC
Frequent Contributor
Posts: 95

Re: Help with Data Extraction

Peter_c has a very good solution with some error checking. For a SAS novice, this might be a little easier to understand. The first data step is very similar

data work.in;
length filename filen $100 ;
infile "H:\Infile.txt" dlm=',' firstobs=2 truncover filename=filen dsd;
filename = filen ;
Input row :8. @;
col=0;
do until (dist = .);
input dist :8. @;
col+1;
if dist = . then return;
if row ne col then output;
end;
run;

proc means data=in noprint nway;
class filename col;
var dist;
output out=work.mindist (drop=_type_ _freq_) min=min;
run;

As Peter_c pointed out, this gets you the minimum distance, but doesn't tell you which pair (or pairs) meet this minimum. This next code accomplishes that.

proc sort data=work.in; by filename col row; run;

data work.out;
merge work.in work.mindist;
by filename col;
if min=dist then output;
run;

If you have a lot of these tables or some very big tables, you might want to try both and see which runs quicker.
N/A
Posts: 0

Re: Help with Data Extraction

For some reason, it is only reading the first 19 distances and then in the 20th, it puts 19.

Sorry to sound stupid, but why would it do that?
Thanks!!!! These are all great!

Thank you all so very much! I have been teaching myself SAS, so this is a great opportunity for me to learn some more advance skills, at least to me. Message was edited by: ccorwin
SAS Super FREQ
Posts: 8,775

Re: Help with Data Extraction

Hi:
I came up with a different solution. I only went down the rows finding the min distance, because the data looks very regular and given the data structure, it doesn't look to me like you needed to go down the columns (but I don't do matrices -- so I don't understand all the implications of not going down the columns). I only made a one-dimensional array.

cynthia
[pre]
** make some data;
** get rid of 1st row because variable numbers represent the column order;
data distdata;
infile datalines dsd;
input rownum dist1 dist2 dist3;
if rownum = . then delete;
return;
datalines;
., 1, 2, 3
1, 0, 4.3, 5.5
2, 4.3, 0, 0
3, 5.5, 0, 0
;
run;

ods listing;
proc print data=distdata;
run;

data mindistance;
set distdata;
array ds dist1-dist3;
array newds newds1-newds3;
array loc $ loc1-loc3;

** newds will have 1,1 and 2,2 and 3,3 etc. set to;
** missing so they do not get used by MIN function.;
** loc1, loc2, loc3 is the matrix location, 1,2 or 1,3 etc;
** use dim(ds) as the stopping point of the do loop so I do not have;
** to hard code an upper boundary;
do i = 1 to dim(ds);
newds(i) = ds(i);
loc(i) = cats(put(rownum,2.0),',',put(i,2.0));
if newds(i) = 0 and i = rownum
then newds(i) = .;
end;

** find the min distance on the row ignoring 1,1 2,2 and 3,3;
min_distance = min(of newds1-newds3);

** now find the location of the min distance;
** again, in newds1, newds2, newds3 there will NOT be a match;
** to 1,1 or 2,2 or 3,3 etc -- so loc_min(i) is the location of the min distance;
do i = 1 to dim(newds);
if min_distance = newds(i) then
loc_min = loc(i);
end;
run;

options missing = . nocenter;
ods listing;
proc print data=mindistance;
var dist1 dist2 dist3 min_distance loc_min;
title "Min Distance For the Rows in the TABLE";
run;
[/pre]
Super Contributor
Posts: 260

Re: Help with Data Extraction

Since you have access to SAS/IML, you can use the matrix language that it contains. Here is an example of finding MIN and MIN2 values over your sample matrix. The beggining is just about getting the data into SAS, so just use the INFILE ... INPUT statements proposed earlier.
[pre]
DATA work.distance ;
INFILE CARDS DLM="," ;
INPUT col1-col3 ;
CARDS ;
0, 4.3, 5.5
4.3, 0, 0
5.5, 0, 0
;
RUN ;
PROC IML ;
USE work.distance ;
READ ALL INTO distMatrix ;
CLOSE work.distance ;
DO row = 1 TO NROW(distMatrix) ;
DO col = 1 TO row ;
distMatrix[row, col] = . ;
END ;
END ;
minDistance = MIN(distMatrix) ;
PRINT minDistance ;
DO row = 1 TO NROW(distMatrix) ;
DO col = 1 TO NROW(distMatrix) ;
IF distMatrix[row, col] = minDistance THEN distMatrix[row, col] = . ;
END ;
END ;
minDistance2 = MIN(distMatrix) ;
PRINT minDistance2 ;
QUIT ;
[/pre]
Then you may use so macro-looping around the whole program to extend it to your 50 text files.
Regards
Olivier
Ask a Question
Discussion stats
  • 11 replies
  • 456 views
  • 0 likes
  • 6 in conversation