I compute cosine similarity with PROC DISTANCE, and it always returns a result (with missing values where the measure does not apply). Mahalanobis distance is not an option in PROC DISTANCE, so I am using PROC IML, following your posts. However, I get the error "Matrix should be non-singular". I do not get this error when I use PROC DISTANCE for cosine similarity on the same dataset. I have to compute both cosine similarity and Mahalanobis distance for very large datasets, and I don't know how to avoid that error or how to force the computation to go through. Thanks in advance for your help.

Example of what I'm trying:

data sample;
infile datalines;
input gvkey:8. s1:8. s2:8. s3:8. s4:8. s5:8. s6:8. s7:8. s8:8. s9:8. s10:8.;
datalines;
1000 0 0 0 0 0 0 0 0 0 8.33
1004 0 0 0 0 0 0 0 0 22.22 0
1010 0 0 0 0 0 0 0 0 0 0
1012 0 0 0 0 0 0 0 0 4.54 0
1013 0.16 0 0 0 0 0 0 0 0 0.31
1016 0 0 0 0 0 0 0 0 0 0
;;;;
run;
* Cosine similarity (convert gvkey to string for proc distance);
data sample1(drop=gvkey rename=(firm=gvkey));
retain firm;
set sample;
firm = put(gvkey,best8.);
run;
* Proc distance for Cosine Similarity;
proc distance data=sample1 out=Cos method=COSINE shape=square replace;
var ratio(s1--s10);
id gvkey;
run;
proc print data=cos (obs=10); run;
proc iml;
use sample1;
read all var _NUM_ into x[colname=nNames];
print x;
maha = mahalanobis(x, x);
print maha;
quit;

Output for print data=cos (using proc distance):

Error from Proc IML:
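To narrow the problem down, here is a minimal sketch of a check that could be run on the same data. It assumes (I have not confirmed this) that the error comes from the sample covariance matrix of x, which MAHALANOBIS has to invert when no covariance matrix is passed in. The names S, v and nConst are only illustrative. In the sample above, s2 through s8 are zero for every firm, and there are only 6 rows for 10 variables, so I would expect the covariance matrix to be singular.

```sas
proc iml;
use sample1;
read all var _NUM_ into x[colname=nNames];   /* s1-s10 only; gvkey is character here */
close sample1;

S = cov(x);                  /* sample covariance matrix (10 x 10) */
v = vecdiag(S);              /* variance of each column            */
nConst = sum(v = 0);         /* number of constant (zero-variance) columns */

print (v`)[colname=nNames label="Column variances"];
print nConst[label="Constant columns"];
print (det(S))[label="det(cov)"];   /* zero or near zero means singular */
quit;
```

If the determinant is zero, then the question becomes whether dropping the constant columns is enough, or whether a different approach is needed when there are fewer rows than variables.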