mbaugh
Calcite | Level 5

Hello,

 

I have an observation with two text fields that contain listings of error keys.

  

I need to convert these fields into two vectors that have a zero/one indicator for each error key. I then need to perform some basic matrix algebra on the vectors and store the result (a numeric value) in a third field.

 

 

EXAMPLE

Field 1: “112 1454 122 342”

Field 2: “122 1343 32”

 

Key for Vector Element:   112  1454  122  342  1343  32
Field 1 Vector:             1     1    1    1     0   0
Field 2 Vector:             0     0    1    0     1   1

 

 

This is essentially a numerical application of the Salton, Wong, and Yang (1975) vector space model.

 

Does anyone have any code handy to do this, or can anyone point me to resources where I can learn it myself? I've been struggling to find stuff.

 

Thank you all!

1 REPLY 1
Rick_SAS
SAS Super FREQ

You didn't provide what your data look like, so I'll just have to guess. Try the following:

  1. In the DATA step, use the COUNTW and SCAN functions to parse the keys, so that the data has one key in each row.
  2. In SAS/IML, use the UNIQUE and UNIQUEBY functions to find the rows that start each field. For details, see
    "An efficient alternative to the UNIQUE-LOC technique."
  3. Use the ELEMENT function to flag which of the unique keys appear in each field. For details, see
    "Finding elements in one vector that are not in another vector."

Sample code:

 

data have;
length s $100;
input s & $;   /* special character '&' reads until 2 or more blanks */
Field = _N_;
cnt = countw(s, ' ');
do i = 1 to cnt;
   key = scan(s, i, ' ');
   output;
end;
datalines;
112 1454 122 342
122 1343 32
;

proc iml;
use Have;
read all var {"Field" "Key"};
close;

Fields = unique(Field);
Keys = unique(Key);
Result = j(ncol(Fields), ncol(Keys), 0);

/* http://blogs.sas.com/content/iml/2011/11/07/an-efficient-alternative-to-the-unique-loc-technique.html */
b = uniqueby(Field, 1);   /* b[i] = beginning of i_th category */
b = b // (nrow(Field)+1); /* trick: append (n+1) to end of b */
do i = 1 to nrow(b)-1;    /* For each level... */
   idx = b[i]:(b[i+1]-1); /*   Find observations in level */
   /* http://blogs.sas.com/content/iml/2014/03/17/finding-elements-in-one-vector-that-are-not-in-another-vector.html */
   Result[i,] = element(Keys, Key[idx]);
end;

F = char(Fields);
print Result[rowname=F colname=Keys];
quit;
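
The question also asks for some matrix algebra on the two vectors and a numeric result to store. The original post does not say which quantity is needed, so the following is only a sketch under the assumption that an inner product and a cosine similarity (a common choice in the vector space model) are of interest. The Result matrix is hard-coded here so the block runs on its own; in practice you would use the Result matrix computed above. The names Shared, CosSim, and the Similarity data set are hypothetical.

```sas
proc iml;
/* Assumed input: the 0/1 indicator matrix for the example (one row per field).
   Columns follow the sorted character order that UNIQUE returns for the keys:
   112 122 1343 1454 32 342 */
Result = {1 1 0 1 0 1,
          0 1 1 0 1 0};

v1 = Result[1,];                            /* indicator vector for Field 1 */
v2 = Result[2,];                            /* indicator vector for Field 2 */
Shared = v1 * v2`;                          /* inner product = number of keys in both fields */
CosSim = Shared / sqrt(ssq(v1) * ssq(v2));  /* cosine similarity; undefined if a field has no keys */
print Shared CosSim;

/* Store the two numeric results in a data set (hypothetical name) */
create Similarity var {"Shared" "CosSim"};
append;
close Similarity;
quit;
```

For the example data, the two fields share one key (122), so Shared is 1 and CosSim is 1/sqrt(4*3), or about 0.289. Note that the column order is the sorted order of the character keys, not the order in which the keys appear in the fields.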
