mbaugh
Calcite | Level 5

Hello,

 

I have an observation with two text fields that contain listings of error keys.

  

I need to convert these fields into two vectors that have a zero/one indicator for each error key. I then need to perform some basic matrix algebra on the vectors and store the result (a numeric value) in a third field.

 

 

EXAMPLE

Field 1: “112 1454 122 342”

Field 2: “122 1343 32”

 

Key for Vector Element:   112   1454   122   342   1343   32
Field 1 Vector:             1      1     1     1      0    0
Field 2 Vector:             0      0     1     0      1    1
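
The mapping in the example above can be sketched in a few lines. This is only a language-neutral illustration in Python, not SAS code; the function names `tokenize` and `build_vectors` are made up for the sketch, and it assumes the keys are separated by whitespace and should be matched as whole tokens.

```python
def tokenize(field):
    """Split a whitespace-delimited string of error keys into whole tokens."""
    return field.split()


def build_vectors(field1, field2):
    """Return the key list and a 0/1 indicator vector for each field.

    Keys are listed in order of first appearance (field 1, then field 2),
    which matches the layout of the example in the question.
    """
    tokens1 = tokenize(field1)
    tokens2 = tokenize(field2)

    keys = []
    for tok in tokens1 + tokens2:
        if tok not in keys:
            keys.append(tok)

    set1, set2 = set(tokens1), set(tokens2)
    v1 = [1 if k in set1 else 0 for k in keys]
    v2 = [1 if k in set2 else 0 for k in keys]
    return keys, v1, v2


keys, v1, v2 = build_vectors("112 1454 122 342", "122 1343 32")
print(keys)  # ['112', '1454', '122', '342', '1343', '32']
print(v1)    # [1, 1, 1, 1, 0, 0]
print(v2)    # [0, 0, 1, 0, 1, 1]
```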

 

 

This is essentially a numerical application of the Salton, Wong, and Yang (1975) vector space model.

 

Does anyone have any code handy to do this, or can anyone point me to resources where I can learn it myself? I've been struggling to find stuff.

 

Thank you all!

5 REPLIES
rogerjdeangelis
Barite | Level 11

I know nothing about this topic, but a quick Google search turned up a couple of R packages. Since SAS now integrates with R through IML, you can use these packages from SAS.

 

see

library(RNewsflow)

https://cran.r-project.org/web/packages/RNewsflow/vignettes/RNewsflow.html

 

https://cran.r-project.org/web/packages/jmotif/README.html

Ksharp
Super User
You should post this in the IML forum, since it is about matrix operations.

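/* One observation holding the two space-delimited key lists */
data have;
Field1='112 1454 122 342';
Field2='122 1343 32';output;
run;
proc iml;
use have;
read all var {field1 field2};
close;
/* Split each field into a row vector of keys */
n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);

/* Sorted set of distinct keys across both fields */
all=union(temp1,temp2);

/* 0/1 flag for each key in ALL: is it present in the field? */
new_field1=t(element(all,temp1));
new_field2=t(element(all,temp2));
print new_field1[r=all],new_field2[r=all];

quit;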


mbaugh
Calcite | Level 5

@Ksharp

 

Thanks for the code.

 

There's one hiccup I can't work out. 

 

If document 1 equals "2 22 4 42"

and document 2 equals "2 4"

 

The vector for document 2 shows a 1 for all four codes, not just 2 and 4.
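
To make the expected result concrete, here is a small Python sketch (not SAS, and not a diagnosis of what IML is doing) that contrasts whole-token matching with a comparison that only looks at the first few characters of each key. The truncated comparison is purely an assumption used for illustration; it happens to reproduce the "all four codes ticked" symptom for these two documents. The helper names `exact_flags` and `prefix_flags` are hypothetical.

```python
def exact_flags(keys, tokens):
    """Flag a key only when the whole key appears among the tokens."""
    token_set = set(tokens)
    return [1 if k in token_set else 0 for k in keys]


def prefix_flags(keys, tokens, width):
    """Deliberately lossy: compare only the first `width` characters."""
    truncated = {t[:width] for t in tokens}
    return [1 if k[:width] in truncated else 0 for k in keys]


doc1 = "2 22 4 42".split()
doc2 = "2 4".split()

# Distinct keys across both documents, sorted as strings
keys = sorted(set(doc1) | set(doc2))

print(keys)                         # ['2', '22', '4', '42']
print(exact_flags(keys, doc2))      # [1, 0, 1, 0]  (the desired result)
print(prefix_flags(keys, doc2, 1))  # [1, 1, 1, 1]  (the symptom described)
```

Whatever the cause turns out to be in SAS, the target for document 2 is the first vector: only the keys 2 and 4 should be flagged.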

 

I've modified the code slightly. Perhaps I broke something, but I don't think so:


data have;
Field1='4 2 22 42';
Field2='4 2';output;
run;
proc iml;
use have;
read all var {field1 field2};
close;
n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);

all=union(temp1,temp2);

vector1=t(element(all,temp1));
vector2=t(element(all,temp2));
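/* Norm of each 0/1 vector: square root of the number of 1s */
v1 = sqrt(sum(vector1));
v2 = sqrt(sum(vector2));
/* Cosine similarity: dot product divided by the product of the norms */
dotproduct = vector1` * vector2 ;
similarity = dotproduct / (v1*v2) ;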
print vector1[r=all],vector2[r=all], v1, v2, dotproduct, similarity;

quit;

 

Any suggestions would be appreciated! Thanks so much!
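
For reference, the similarity computed in the code above is the cosine of the angle between the two indicator vectors. A minimal Python sketch of the same arithmetic follows, assuming the vectors have already been built correctly (document 2 flags only the keys 2 and 4). The function name `cosine_similarity` is just a label for this sketch.

```python
import math


def cosine_similarity(v1, v2):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # convention chosen here for an empty document
    return dot / (norm1 * norm2)


# Keys in the order 2, 22, 4, 42
vector1 = [1, 1, 1, 1]  # document 1: "2 22 4 42"
vector2 = [1, 0, 1, 0]  # document 2: "2 4"

similarity = cosine_similarity(vector1, vector2)
print(round(similarity, 4))  # 0.7071
```

For 0/1 vectors the sum of squares equals the plain sum, so taking `sqrt(sum(vector))` as the norm gives the same value.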

Ksharp
Super User
What if your data look like the following? What would you do then?

 
If document 1 equals "2 22 4 42 142 "
and document 2 equals "2 4"


Ksharp
Super User

OK. Assuming I understand what you mean.

 

 


data have;
Field1='2 22 4 42';
Field2='2 4';output;
run;
proc iml;
use have;
read all var {field1 field2};
close;
n1=countw(field1);
temp1=scan(field1,1:n1);
n2=countw(field2);
temp2=scan(field2,1:n2);

all=union(temp1,temp2);

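/* Start with row vectors of 1s, one entry per key */
newfield1=j(1,ncol(all));
newfield2=j(1,ncol(all));

/* For each key, split it into single characters and flag the key
   only if every character appears as a token in the field */
 do j=1 to ncol(all);
   temp=all[j];
   t=substr(temp,1:length(temp),1); 
   newfield1[j]=all(element(t,temp1));
   newfield2[j]=all(element(t,temp2));
 end;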
 
want=newfield1//newfield2;
mattrib want r={newfield1 newfield2} c=all l='';
print want;
quit;
