@Ksharp
Thanks for the code.
There's one hiccup I can't work out.
If document 1 equals "2 22 4 42"
and document 2 equals "2 4",
the vector for document 2 shows 1 for all four codes, not just for 2 and 4.
I've modified the code slightly, so perhaps I broke something, but I don't think so:
data have;
  Field1='4 2 22 42';
  Field2='4 2';
  output;
run;

proc iml;
use have;
read all var {field1 field2};
close;

n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);

all=union(temp1,temp2);

vector1=t(element(all,temp1));
vector2=t(element(all,temp2));
v1 = sqrt(sum(vector1));
v2 = sqrt(sum(vector2));
dotproduct = vector1` * vector2;
similarity = dotproduct / (v1*v2);
print vector1[r=all], vector2[r=all], v1, v2, dotproduct, similarity;
quit;
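
For reference, here is what I'd expect from a hand calculation. This is only a sketch: it assumes the codes are listed in the order 2, 22, 4, 42 (my assumption; the similarity itself doesn't depend on the order), with v_1 and v_2 standing for vector1 and vector2:

\[
v_1 = (1, 1, 1, 1), \qquad v_2 = (1, 0, 1, 0)
\]
\[
\text{similarity} = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert} = \frac{2}{2 \cdot \sqrt{2}} \approx 0.7071
\]

If vector2 comes out as all 1s instead, the same formula gives 4 / (2 * 2) = 1, as if the two documents were identical.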
Any suggestions would be appreciated! Thanks so much!