Enable PROC DISTANCE to output distances in narrow (sparse) format through a shape=sparse option.
The procedure would then write the distances to a three-column data set with the columns id_from, id_to, and distance.
This would also make it easy to limit the output with a where= data set option on the output data set. Example:
proc distance data=BigPositionalData method=euclid shape=sparse out=distances(where=(distance<=100));
var interval (x y);
run;
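For comparison, a similar three-column result can be produced today with a PROC SQL self-join. This is only a rough sketch of a workaround, not the requested feature: it assumes that BigPositionalData has a unique identifier column named id (a hypothetical name) next to x and y, and it compares every pair of rows, so it may be slow on large inputs.

```sas
/* Workaround sketch: pairwise Euclidean distances in long (3-column) form. */
/* Assumption: BigPositionalData has a unique key column called id.         */
proc sql;
  create table distances as
  select a.id as id_from,
         b.id as id_to,
         sqrt((a.x - b.x)**2 + (a.y - b.y)**2) as distance
  from BigPositionalData as a,
       BigPositionalData as b
  where a.id < b.id                  /* keep each unordered pair once */
    and calculated distance <= 100;  /* same limit as the where= above */
quit;
```

The a.id < b.id condition drops the zero self-distances and the mirrored duplicates, which is roughly what a sparse shape would be expected to omit.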
With shape=sparse, the BY statement would be less restrictive (currently, all BY groups must contain the same ID values). Example:
proc distance data=class method=euclid shape=sparse;
var interval (weight height);
run;
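The per-group output described here can be approximated with a self-join restricted to rows in the same group. The sketch below makes several assumptions that are not in the request itself: it uses sashelp.class as the input, sex as the grouping variable, and name as the identifier.

```sas
/* Sketch: long-format distances computed separately within each group.  */
/* Assumptions: sashelp.class as input, sex as the BY-like grouping      */
/* variable, name as the ID. Groups may contain different ID values.     */
proc sql;
  create table distances_by_group as
  select a.sex,
         a.name as id_from,
         b.name as id_to,
         sqrt((a.weight - b.weight)**2 + (a.height - b.height)**2) as distance
  from sashelp.class as a,
       sashelp.class as b
  where a.sex = b.sex      /* only pair observations within one group */
    and a.name < b.name    /* keep each unordered pair once           */
  order by a.sex, id_from, id_to;
quit;
```

Because each row carries its own pair of identifiers, nothing requires the groups to share the same set of ID values.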
Support for multiple ID variables would also be useful.
The output could be used directly as input to other procedures (e.g., PROC OPTGRAPH).
PROC CLUSTER (and other clustering and kNN algorithms) could also work on this type of distance input, perhaps by assuming that a missing distance between two observations means the distance is very large.
I tried the shape=sparse option, but it produced an error because sparse is not in the list of available options.