Hello,
I have an observation with two text fields that contain listings of error keys.
I need to convert these fields into two vectors that have a zero/one indicator for each error key. I then need to be able to perform some basic matrix algebra on the vectors and store the result (a numeric value) in a third field.
EXAMPLE
Field 1: “112 1454 122 342”
Field 2: “122 1343 32”
Key for vector element:  112  1454  122  342  1343  32
Field 1 vector:            1     1    1    1     0   0
Field 2 vector:            0     0    1    0     1   1
This is essentially a numerical application of the Salton, Wong, and Yang (1975) vector space model.
Does anyone have any code handy to do this, or can anyone point me to resources where I can learn it myself? I've been struggling to find stuff.
Thank you all!
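For reference, the target encoding can be pinned down in a few lines. The following is a minimal sketch in Python rather than SAS, meant only to show the expected result for the example above. The helper name indicator_vector and the choice to order keys by first appearance are assumptions for illustration, not part of the original question.

```python
def indicator_vector(field, keys):
    """Return a 0/1 list marking which keys occur as whole tokens in field."""
    tokens = set(field.split())
    return [1 if k in tokens else 0 for k in keys]

field1 = "112 1454 122 342"
field2 = "122 1343 32"

# Key order: first appearance across both fields (an assumption that
# reproduces the ordering used in the example above).
keys = list(dict.fromkeys(field1.split() + field2.split()))

vector1 = indicator_vector(field1, keys)
vector2 = indicator_vector(field2, keys)

print(keys)     # ['112', '1454', '122', '342', '1343', '32']
print(vector1)  # [1, 1, 1, 1, 0, 0]
print(vector2)  # [0, 0, 1, 0, 1, 1]
```

Keys are compared as whole strings, so "122" and "1343" are never confused with shorter keys that share their digits.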
I know nothing about this topic, but a quick Google search turned up a couple of R packages, and since SAS can now call R through IML, you can use these packages from SAS.
See, for example:
library(RNewsflow)
https://cran.r-project.org/web/packages/RNewsflow/vignettes/RNewsflow.html
You should post this in the IML forum, since it is about matrix operations.
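data have;
Field1='112 1454 122 342';
Field2='122 1343 32';
output;
run;
proc iml;
use have;
read all var {field1 field2};
close;
/* split each field into a vector of individual keys */
n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);
/* sorted list of the distinct keys found in either field */
all=union(temp1,temp2);
/* 0/1 indicator: 1 where a key in ALL also appears in the field */
new_field1=t(element(all,temp1));
new_field2=t(element(all,temp2));
print new_field1[r=all],new_field2[r=all];
quit;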
Thanks for the code.
There's one hiccup I can't work out.
If document 1 equals "2 22 4 42" and document 2 equals "2 4", the vector for document 2 is set to 1 for all four codes, not just for 2 and 4.
I've modified the code slightly. Perhaps I broke something, but I don't think so:
data have;
Field1='4 2 22 42';
Field2='4 2';output;
run;
proc iml;
use have;
read all var {field1 field2};
close;
n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);
all=union(temp1,temp2);
vector1=t(element(all,temp1));
vector2=t(element(all,temp2));
v1 = sqrt(sum(vector1));
v2 = sqrt(sum(vector2));
dotproduct = vector1` * vector2 ;
similarity = dotproduct / (v1*v2) ;
print vector1[r=all],vector2[r=all], v1, v2, dotproduct, similarity;
quit;
Any suggestions would be appreciated! Thanks so much!
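As a cross-check on what the output should be, here is a hedged sketch in Python (not SAS/IML) that matches whole tokens only and then computes the same cosine similarity as the code above. The function name cosine_similarity and the sorted key order are assumptions for illustration. It also assumes neither field is empty, since an all-zero vector would make the denominator zero.

```python
import math

def cosine_similarity(field_a, field_b):
    """Build 0/1 vectors over the union of keys and return their cosine."""
    tokens_a = set(field_a.split())
    tokens_b = set(field_b.split())
    keys = sorted(tokens_a | tokens_b)  # string sort, whole-token keys
    vec_a = [1 if k in tokens_a else 0 for k in keys]
    vec_b = [1 if k in tokens_b else 0 for k in keys]
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return keys, vec_a, vec_b, dot / (norm_a * norm_b)

keys, vector1, vector2, similarity = cosine_similarity("4 2 22 42", "4 2")
print(keys)     # ['2', '22', '4', '42']
print(vector1)  # [1, 1, 1, 1]
print(vector2)  # [1, 0, 1, 0]
print(round(similarity, 4))  # 0.7071
```

With whole-token matching, document 2 is flagged only for 2 and 4, and the similarity comes out to 2 / (2 * sqrt(2)), roughly 0.707.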
What would you do if your data looked like the following? Suppose document 1 equals "2 22 4 42 142" and document 2 equals "2 4".
OK, assuming I understand what you mean:
data have;
Field1='2 22 4 42';
Field2='2 4';output;
run;
proc iml;
use have;
read all var {field1 field2};
close;
n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);
all=union(temp1,temp2);
newfield1=j(1,ncol(all));
newfield2=j(1,ncol(all));
do j=1 to ncol(all);
temp=all[j];
t=substr(temp,1:length(temp),1);
newfield1[j]=all(element(t,temp1));
newfield2[j]=all(element(t,temp2));
end;
want=newfield1//newfield2;
mattrib want r={newfield1 newfield2} c=all l='';
print want;
quit;