Hello,
I have an observation with two text fields that contain listings of error keys.
I need to convert these fields into two vectors that have a zero/one indicator for each error key. I then need to be able to perform some basic matrix algebra on the vectors and store the result (a numeric value) in a third field.
EXAMPLE
Field 1: “112 1454 122 342”
Field 2: “122 1343 32”
Key for Vector Element: 112 1454 122 342 1343 32
Field 1 Vector:         1   1    1   1   0    0
Field 2 Vector:         0   0    1   0   1    1
This is essentially a numerical application of the Salton, Wong, and Yang (1975) vector space model.
Does anyone have any code handy to do this, or can anyone point me to resources where I can learn it myself? I've been struggling to find stuff.
Thank you all!
I know nothing about this topic, but a quick Google search led to a couple of R packages. Since SAS now integrates with R through IML, you can use these packages from SAS.
see
library(RNewsflow)
https://cran.r-project.org/web/packages/RNewsflow/vignettes/RNewsflow.html
You should post this in the IML forum, since it is about matrix operations.
data have;
Field1='112 1454 122 342';
Field2='122 1343 32';output;
run;
proc iml;
use have;
read all var {field1 field2};
close;
n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);
all=union(temp1,temp2);
new_field1=t(element(all,temp1));
new_field2=t(element(all,temp2));
print new_field1[r=all],new_field2[r=all];
quit;
Thanks for the code.
There's one hiccup I can't work out.
If document 1 equals "2 22 4 42"
and document 2 equals "2 4"
The vector for document 2 shows 1 for all four codes, not just 2 and 4.
I've modified the code slightly; perhaps I did something wrong, but I don't think so:
data have;
Field1='4 2 22 42';
Field2='4 2';output;
run;
proc iml;
use have;
read all var {field1 field2};
close;
n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);
all=union(temp1,temp2);
vector1=t(element(all,temp1));
vector2=t(element(all,temp2));
v1 = sqrt(sum(vector1));
v2 = sqrt(sum(vector2));
dotproduct = vector1` * vector2 ;
similarity = dotproduct / (v1*v2) ;
print vector1[r=all],vector2[r=all], v1, v2, dotproduct, similarity;
quit;
Any suggestions would be appreciated! Thanks so much!
What if your data look like the following? What would you do then? If document 1 equals "2 22 4 42 142" and document 2 equals "2 4"
OK. Assuming I understand what you mean.
data have;
Field1='2 22 4 42';
Field2='2 4';output;
run;
proc iml;
use have;
read all var {field1 field2};
close;
n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);
all=union(temp1,temp2);
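/* preallocate one slot per key */
newfield1=j(1,ncol(all));
newfield2=j(1,ncol(all));
do j=1 to ncol(all);
temp=all[j];
/* split the key into its individual characters */
t=substr(temp,1:length(temp),1);
/* 1 only if every character of the key is itself a token in the field */
newfield1[j]=all(element(t,temp1));
newfield2[j]=all(element(t,temp2));
end;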
want=newfield1//newfield2;
mattrib want r={newfield1 newfield2} c=all l='';
print want;
quit;
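For anyone who wants to cross-check results outside SAS, here is a minimal Python sketch of the exact-match behavior asked for in the follow-up (document 2 flagged only for 2 and 4), plus the same similarity calculation. It is not a translation of either IML program. The function names indicator_vectors and cosine_similarity are made up for illustration, and ordering the keys by first appearance is an assumption chosen to match the key order in the original example.

```python
import math


def indicator_vectors(field1, field2):
    """Build 0/1 vectors over the combined keys, matching whole tokens only."""
    tokens1 = field1.split()
    tokens2 = field2.split()
    # Assumption: keys are ordered by first appearance, duplicates dropped.
    keys = list(dict.fromkeys(tokens1 + tokens2))
    set1, set2 = set(tokens1), set(tokens2)
    v1 = [1 if key in set1 else 0 for key in keys]
    v2 = [1 if key in set2 else 0 for key in keys]
    return keys, v1, v2


def cosine_similarity(v1, v2):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty field has no direction; treat as no similarity
    return dot / (norm1 * norm2)


keys, v1, v2 = indicator_vectors("4 2 22 42", "4 2")
similarity = cosine_similarity(v1, v2)
print(keys)  # ['4', '2', '22', '42']
print(v1)    # [1, 1, 1, 1]
print(v2)    # [1, 1, 0, 0]
print(round(similarity, 4))  # 0.7071
```

Because the comparison is on whole tokens, "22" and "42" are not flagged for a field that only contains "2" and "4". Running the same function on the fields from the original question gives 1 1 1 1 0 0 and 0 0 1 0 1 1.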