Vector Space Model in SAS

Occasional Contributor

Hello,

 

I have an observation with two text fields that contain listings of error keys.

  

I need to convert these fields into two vectors that have a zero/one indicator for each error key. I then need to perform some basic matrix algebra on the vectors and store the result (a numeric value) in a third field.

 

 

EXAMPLE

Field 1: “112 1454 122 342”

Field 2: “122 1343 32”

 

Key for Vector Element:   112  1454   122   342  1343    32
Field 1 Vector:             1     1     1     1     0     0
Field 2 Vector:             0     0     1     0     1     1

 

 

This is essentially a numerical application of the vector space model of Salton, Wong, and Yang (1975).

 

Does anyone have any code handy to do this, or can anyone point me to resources where I can learn it myself? I've been struggling to find stuff.

 

Thank you all!

SAS Super FREQ

Re: Vector Space Model in SAS

You didn't show what your data look like, so I'll have to guess. Try the following:

  1. In the DATA step, use the COUNTW and SCAN functions to parse the keys, so that the data set has one key in each row.
  2. In SAS/IML, use the UNIQUE function to find the distinct fields and keys, and the UNIQUEBY function to find the row that starts each field. For details, see
    "An efficient alternative to the UNIQUE-LOC technique."
  3. Use the ELEMENT function to flag which keys appear in each field. For details, see
    "Finding elements in one vector that are not in another vector."

Sample code:

 

data have;
length s $100;
input s & $;   /* special character '&' reads until 2 or more blanks */
Field = _N_;
cnt = countw(s, ' ');
do i = 1 to cnt;
   key = scan(s, i, ' ');
   output;
end;
datalines;
112 1454 122 342
122 1343 32
;

proc iml;
use Have;
read all var {"Field" "Key"};
close;

Fields = unique(Field);
Keys = unique(Key);
Result = j(ncol(Fields), ncol(Keys), 0);

/* http://blogs.sas.com/content/iml/2011/11/07/an-efficient-alternative-to-the-unique-loc-technique.html */
b = uniqueby(Field, 1);   /* b[i] = beginning of i_th category; assumes data are sorted by Field */
b = b // (nrow(Field)+1); /* trick: append (n+1) to end of b */
do i = 1 to nrow(b)-1;    /* For each level... */
   idx = b[i]:(b[i+1]-1); /*   Find observations in level */
   /* http://blogs.sas.com/content/iml/2014/03/17/finding-elements-in-one-vector-that-are-not-in-another-vector.html */
   Result[i,] = element(Keys, Key[idx]);
end;

F = char(Fields);         /* character labels for the rows */
print Result[rowname=F colname=Keys];
quit;
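
The question also asks for some matrix algebra on the two indicator vectors, with the numeric result stored. It does not say which operation is needed, so the following is only a minimal sketch that assumes the inner product and the cosine similarity are the quantities of interest. The vectors are typed in from the example in the question, and the names v1, v2, dotProd, cosSim, and the output data set Sim are all hypothetical:

```sas
proc iml;
/* indicator vectors from the question;
   columns correspond to keys 112 1454 122 342 1343 32 */
v1 = {1 1 1 1 0 0};
v2 = {0 0 1 0 1 1};

/* inner product = number of keys the two fields share */
dotProd = v1 * t(v2);

/* cosine similarity = inner product divided by the product of the vector lengths */
cosSim = dotProd / (sqrt(ssq(v1)) * sqrt(ssq(v2)));
print dotProd cosSim;

/* store the scalar results in a data set (hypothetical name: Sim) */
create Sim var {"dotProd" "cosSim"};
append;
close Sim;
quit;
```

For the example data the only shared key is 122, so dotProd is 1 and cosSim is 1/(2*sqrt(3)), about 0.289. If the indicator rows have already been computed in the Result matrix, Result[1,] and Result[2,] could be used in place of v1 and v2.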