I will need to export the results of what I am now doing in IML to a SAS dataset and thence to Excel and join the data (in another program) to another table by a unique key. Therefore, I need to define unique key values for each row (or column) in each of my matrices and somehow ensure that those numbers do not change as a result of any operations on the matrix and stay together with their corresponding values in other rows (or columns). What is the best way of accomplishing this? I know I can have one matrix serve as row (or column) names for another matrix. Does the answer involve writing my unique key numbers to one matrix (let's call it LOCATION) and set it to be (for example) the row numbers on another matrix, as follows:
rowname = LOCATION
Also, to multiply matrices, the number of rows in the first matrix must equal the number of columns in the second. How do I ensure that my unique key row (or column) is not counted?
Yes. Put the data in one matrix and the labels/identifiers in another.
Hard to say without knowing what operations you are trying to perform. The observation order won't change unless you perform sorting or subsetting (extracting rows). For those operations, you'll also need to sort or subset the ID variable.
If X is the observation matrix (with N rows and k cols) and beta is the matrix (with k rows) that you are multiplying agains, then the resulting matrix has N rows and the ID variables do not change because Y = X*beta also has N rows.
Suppose that there are two matrices. Let's call them COST, defined as follows,
COST = {12 15 11};
and FREIGHT, below:
FREIGHT = {06050 06005 18020 40071, 1.5 0.7 2.2 1.8, 1.0 3.1 1.9 1.4, 1.7 2.0 2.6 1.5};
The top row in FREIGHT is the unique ID number of (for example) some facility such as a warehouse. How do I make sure that the numbers in the top row are excluded from the calculation when FREIGHT is multiplied by COST? (I'll have to somehow exclude the first row in FREIGHT in any case, otherwise the number of columns in the first matrix won't equal the number of rows in the second). Or should I even include the unique ID number in FREIGHT? Should I, perhaps, make it a separate matrix (let's call it LOCATION) and specify that as the column names for FREIGHT and write the product of FREIGHT and COST to a SAS dataset with LOCATION for column names?
Your problem is transposed from the way most people represent it. Most people represent COST as a column vector, and use the transpose of FREIGHT as a data matrix which has three variables and a row identifier with values '06050', '06005', etc.
One reason that the transposed version is easier to work with is that the ID values in your data cannot be the name of a SAS variable. SAS variables must start with a letter or an underscore; they cannot be numbers. If you transpose, then the ID values can be numbers or character values (your choice).
The following SAS/IML program transposes your data, multiplies it, and then writes it to a SAS data set that has two columns:
proc iml;
COST = {12 15 11};
FREIGHT = {06050 06005 18020 40071,
1.5 0.7 2.2 1.8,
1.0 3.1 1.9 1.4,
1.7 2.0 2.6 1.5};
/* transpose data */
C = T(cost);
F = T(FREIGHT);
Location = F[, 1]; /* extract ID column */
X = F[ , 2:nrow(FREIGHT)]; /* call other columns X */
Total = X*C;
create output var {"Location" "Total"};
append;
close output;
quit;
proc print;
format Location Z5.;
run;
Thanks! So, was I right, then, about the need to extract the ID numbers to a separate matrix?
Yes. Put the data in one matrix and the labels/identifiers in another.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.