Vectors / Vector Space Model

Hello,

I have an observation with two text fields that contain lists of error keys.

I need to convert these fields into two vectors that have a zero/one indicator for each error key. I then need to perform some basic matrix algebra on the vectors and store the result (a numeric value) in a third variable.

EXAMPLE

Field 1: “112 1454 122 342”

Field 2: “122 1343 32”

| Key for vector element | 112 | 1454 | 122 | 342 | 1343 | 32 |
|---|---|---|---|---|---|---|
| Field 1 vector | 1 | 1 | 1 | 1 | 0 | 0 |
| Field 2 vector | 0 | 0 | 1 | 0 | 1 | 1 |

This is essentially a numerical application of the Salton, Wong and Yang (1975) vector space model.
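Assuming the "basic matrix algebra" is the cosine similarity usually paired with the vector space model (the post does not name the measure, so this is an assumption), the example above works out as follows. Because every entry is 0 or 1, the dot product simply counts the shared keys (here only 122) and each squared length counts the keys in that field (4 and 3).

```latex
\mathbf{x} = (1,1,1,1,0,0), \qquad \mathbf{y} = (0,0,1,0,1,1)

\mathbf{x}\cdot\mathbf{y} = 1, \qquad
\lVert\mathbf{x}\rVert = \sqrt{4} = 2, \qquad
\lVert\mathbf{y}\rVert = \sqrt{3}

\cos(\mathbf{x},\mathbf{y})
  = \frac{\mathbf{x}\cdot\mathbf{y}}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert}
  = \frac{1}{2\sqrt{3}} \approx 0.2887
```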

Does anyone have any code handy to do this, or can anyone point me to resources where I can learn to do it myself? I've been struggling to find anything.

Thank you all!

I know nothing about this topic, but a quick Google search turned up a couple of R packages, and since SAS now integrates with R through IML, you can use these packages from SAS.

See:

`library(RNewsflow)`

https://cran.r-project.org/web/packages/RNewsflow/vignettes/RNewsflow.html

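You should post this in the IML forum, since it is about matrix operations.

```
data have;
   Field1='112 1454 122 342';
   Field2='122 1343 32';
   output;
run;

proc iml;
use have;
read all var {Field1 Field2};  /* load the two character fields into IML */
close;

n1=countw(field1);             /* number of keys in Field1 */
temp1=scan(field1,1:n1);       /* split Field1 into a vector of keys */
n2=countw(field2);
temp2=scan(field2,1:n2);

all=union(temp1,temp2);        /* sorted list of every distinct key */

/* 0/1 indicator: 1 where the key in ALL appears in the field */
new_field1=t(element(all,temp1));
new_field2=t(element(all,temp2));
print new_field1[r=all],new_field2[r=all];

quit;
```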
@Ksharp

Thanks for the code.

There's one hiccup I can't work out.

If document 1 is "2 22 4 42"

and document 2 is "2 4",

the vector for document 2 has a 1 for all four codes, not just for 2 and 4.
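For checking the printed output by hand, assuming each key should match only itself (so 22 and 42 do not count as present in document 2), the expected vectors over the keys 2, 22, 4, 42 and the resulting cosine similarity would be:

```latex
\mathbf{x} = (1,1,1,1), \qquad \mathbf{y} = (1,0,1,0)

\cos(\mathbf{x},\mathbf{y})
  = \frac{\mathbf{x}\cdot\mathbf{y}}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert}
  = \frac{2}{2\sqrt{2}} = \frac{1}{\sqrt{2}} \approx 0.7071
```

If the vector for document 2 comes out as all ones instead, the similarity is exactly 1 (dot product 4, both lengths 2), which makes the symptom easy to spot.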

I've modified the code slightly. Perhaps I broke something, but I don't think so:

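```
data have;
   Field1='4 2 22 42';
   Field2='4 2';
   output;
run;

proc iml;
use have;
read all var {Field1 Field2};      /* load the two character fields into IML */
close;

n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);

all=union(temp1,temp2);

vector1=t(element(all,temp1));     /* 0/1 column vector for Field1 */
vector2=t(element(all,temp2));     /* 0/1 column vector for Field2 */
v1 = sqrt(sum(vector1));           /* length: for 0/1 entries, sum of squares = sum */
v2 = sqrt(sum(vector2));
dotproduct = vector1` * vector2;   /* number of keys the two fields share */
similarity = dotproduct / (v1*v2); /* cosine similarity */
print vector1[r=all],vector2[r=all], v1, v2, dotproduct, similarity;

quit;
```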

Any suggestions would be appreciated! Thanks so much!

What if your data look like the following? What are you going to do then?

If document 1 equals "2 22 4 42 142"
and document 2 equals "2 4"
OK, assuming I understand what you mean:

```
data have;
   Field1='2 22 4 42';
   Field2='2 4';
   output;
run;

proc iml;
use have;
read all var {Field1 Field2};            /* load the two character fields into IML */
close;

n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);

all=union(temp1,temp2);

newfield1=j(1,ncol(all));
newfield2=j(1,ncol(all));

do j=1 to ncol(all);
   temp=all[j];
   t=substr(temp,1:length(temp),1);      /* split the key into single characters */
   newfield1[j]=all(element(t,temp1));   /* 1 if every character is itself a key in Field1 */
   newfield2[j]=all(element(t,temp2));   /* same test against Field2 */
end;

want=newfield1//newfield2;               /* stack the two indicator rows */
mattrib want r={newfield1 newfield2} c=all l='';
print want;
quit;
```
