Dear all,
I have two matrices like this:
DATA mydata ;
INPUT ID
Country $
value ;
CARDS ;
1 CH 24
1 ES 55
1 EE 20
2 BR 10
2 CO 50
2 CH 10
;
RUN ;
data supplement;
input Country CH DE BR ES CU CO EE;
cards;
CH 1 0.4 0.5 0.6 0.7 0.8 0.9
DE 0.4 1 0.5 0.6 0.7 0.8 0.9
BR 0.4 0.5 1 0.6 0.7 0.8 0.9
ES 0.4 0.5 0.6 1 0.7 0.8 0.9
CU 0.4 0.5 0.6 0.7 1 0.8 0.9
CO 0.4 0.5 0.6 0.7 0.8 1 0.9
EE 0.4 0.5 0.6 0.7 0.8 0.9 1
;
run;
The second one can be understood as a correlation matrix (weights) between countries. For each ID in mydata, I want to have the weighted sum of the values, where the weights between two countries come from the data supplement. For example, for ID 1, I want to have 0.4*24*55+0.4* 24*20+0.9*55*20.
So in the output, I have two variables, the ID and the weighted sum.
I am guessing one can solve this problem with proc iml. But how can I do it?
Many many thanks for helping!!
I am assuming that the data are sorted by the ID variable and that the order of the observations in the MyData set match the order of the variables and observations in the Supplement matrix. The main techniques you need to read about and learn are
1. The LOC-ELEMENT technique to find the variables that are in each ID group.
2. Use the ALLCOMB function to generate all pairwise combinations of the variables
3. Convert subscripts [i,j] into an index
4. The UNIQUEBY-LOC technique to loop over the values of the ID variable
Hopefully the comments in the program will adequately describe each step.
DATA mydata ;
INPUT ID
Country $
value ;
CARDS ;
1 CH 24
1 ES 55
1 EE 20
2 BR 10
2 CO 50
2 CH 10
3 CH 24
3 BR 10
3 ES 55
3 CO 50
;
data supplement;
input Country $ CH DE BR ES CU CO EE;
cards;
CH 1 0.1 0.2 0.4 0.5 0.6 0.7
DE 0.1 1 0.5 0.6 0.7 0.8 0.9
BR 0.2 0.5 1 0.1 0.2 0.3 0.4
ES 0.4 0.6 0.1 1 0.4 0.5 0.6
CU 0.5 0.7 0.2 0.4 1 0.7 0.8
CO 0.6 0.8 0.3 0.5 0.7 1 0.9
EE 0.7 0.9 0.4 0.6 0.8 0.9 1
;
/* assume MyData is sorted by ID and that the countries for
each ID are the same order as in the Supplement matrix
*/
proc iml;
use mydata;
read all var {ID 'Country' 'Value'};
close;
use supplement;
read all var _NUM_ into R[colname=varNames];
close;
/***************************************************/
/* Demonstrate the computation for ID=1 */
idx = loc(ID=1);
c = Country[idx]; /* countries for ID=1 */
v = Value[idx]; /* values for ID=1 */
print c v;
/* which variables are being asked for?
LOC-ELEMENT technique:
https://blogs.sas.com/content/iml/2015/05/11/loc-element-trick.html
*/
varIdx = loc( element(varNames, c) );
print varIdx;
/* get all pairwise combinations:
https://blogs.sas.com/content/iml/2013/09/30/generate-combinations-in-sas.html
*/
pairs = allcomb(ncol(varIdx), 2);
varPairs = shape( varIdx[pairs], 0, 2);
print pairs varPairs;
/* convert subscripts to indices:
https://blogs.sas.com/content/iml/2011/02/16/converting-matrix-subscripts-to-indices.html
*/
ndx = sub2ndx(dimension(R), varPairs);
w = R[ndx];
x = v[pairs[,1]];
y = v[pairs[,2]];
print w x y;
wsum = sum( w # x # y );
print wsum;
/*****************************************************/
/***************************************************/
/* Now the real thing: compute for all IDs. Assume sorted by ID */
/***************************************************/
/* For UNIQUEBY-LOC trick, see
https://blogs.sas.com/content/iml/2011/11/07/an-efficient-alternative-to-the-unique-loc-technique.html
*/
b = uniqueby(ID); /* b[i] = beginning of i_th category */
labl = char(ID[b]);
wsum = j(nrow(b),1); /* 3. Allocate vector to hold results */
b = b // (nrow(ID)+1); /* trick: append (n+1) to end of b */
do i = 1 to nrow(b)-1; /* 4. For each level... */
idx = b[i]:(b[i+1]-1); /* 5. Find observations in level */
/* 6. Compute statistic on those values */
c = Country[idx]; /* countries for ID=1 */
v = Value[idx]; /* values for ID=1 */
varIdx = loc( element(varNames, c) );
pairs = allcomb(ncol(varIdx), 2);
varPairs = shape( varIdx[pairs], 0, 2);
ndx = sub2ndx(dimension(R), varPairs);
w = R[ndx];
x = v[pairs[,1]];
y = v[pairs[,2]];
wsum[i] = sum( w # x # y );
end;
print wsum[rowname=labl];
I don't understand your example.
1. If the second data set represents correlation, it should be symmetric.
2. How are you obtaining the values for the weighted sum? I would expect a weighted sum to look like
w_CH* 24 + w_ES* 55 + w_EE* 20.
It looks like you are using pairwise combinations of the three values? What value do you for the sum when ID=2?
3. Can there be four items for an ID? If so, can you make an example for ID=3 and the countries CH, ES, BR, and CO?
If your question is about how to read in data to an IML matrix, the following program shows how to read in the data and how to extract the observations for ID=1: You can use the UNIQUE-LOC trick to look over all ID values in the MyData data set.
DATA mydata ;
INPUT ID
Country $
value ;
CARDS ;
1 CH 24
1 ES 55
1 EE 20
2 BR 10
2 CO 50
2 CH 10
;
RUN ;
data supplement;
input Country $ CH DE BR ES CU CO EE;
cards;
CH 1 0.4 0.5 0.6 0.7 0.8 0.9
DE 0.4 1 0.5 0.6 0.7 0.8 0.9
BR 0.5 0.5 1 0.6 0.7 0.8 0.9
ES 0.6 0.6 0.6 1 0.7 0.8 0.9
CU 0.7 0.7 0.7 0.7 1 0.8 0.9
CO 0.8 0.8 0.8 0.8 0.8 1 0.9
EE 0.9 0.9 0.9 0.9 0.9 0.9 1
;
run;
proc iml;
use mydata;
read all var {ID 'Country' 'Value'};
close;
use supplement;
read all var _NUM_ into R[colname=varNames];
close;
print R[colname=varNames rowname=varNames];
idx = loc(ID=1);
c = Country[idx];
v = Value[idx];
print c v;
Hi Rick,
Thanks for your reply. You are right, that was a stupid mistake from me. Of course the matrice should be symmetic.
ID 1 has three countries, CH, ES and EE. Like you said, I need the sum of pairwise product with the weight from the second matrix, which means the sum for ID 1 will be W_{CH, ES}*Value_CH*Value_ES + W_{CH_EE}*Value_CH*Value_EE+W_{ES, EE}*Value_ES*Value_EE. And analogously, for ID 2 the sum will be W_{BR, CO}*Value_BR*Value_CO+W_{BR,CH}*Value_BR*Value_CH+W_{CO, CH}*Value_CO*Value_CH.
Each ID has the same number of countries, but they could be totally different and their order could be totally different as well. The second data contains all possible correlations between the countries in mydata and works like a look up table.
Many thanks again!
I see. So if there are 4 elements for ID=3, you will want all "4 choose 2" = 6 pairwise combinations? The coefficients are always for a pair of countries?
exactly.
I think I can solve this problem. If you don't mind my asking, what is the application of this formula? What does the weighted sum of the pairwise products represent?
Hi Rick,
That would be so great!
This Kind of fomula is not that seldom in risk Management (e.g. variance-covariance method) with their root in stochastic (calculating the variance of the sum of random variables). I am using it to calculate a Kind of modified Herfindahl-Hirschmann index (concentration measure).
Looking Forward to your solution!
Best
I am assuming that the data are sorted by the ID variable and that the order of the observations in the MyData set match the order of the variables and observations in the Supplement matrix. The main techniques you need to read about and learn are
1. The LOC-ELEMENT technique to find the variables that are in each ID group.
2. Use the ALLCOMB function to generate all pairwise combinations of the variables
3. Convert subscripts [i,j] into an index
4. The UNIQUEBY-LOC technique to loop over the values of the ID variable
Hopefully the comments in the program will adequately describe each step.
DATA mydata ;
INPUT ID
Country $
value ;
CARDS ;
1 CH 24
1 ES 55
1 EE 20
2 BR 10
2 CO 50
2 CH 10
3 CH 24
3 BR 10
3 ES 55
3 CO 50
;
data supplement;
input Country $ CH DE BR ES CU CO EE;
cards;
CH 1 0.1 0.2 0.4 0.5 0.6 0.7
DE 0.1 1 0.5 0.6 0.7 0.8 0.9
BR 0.2 0.5 1 0.1 0.2 0.3 0.4
ES 0.4 0.6 0.1 1 0.4 0.5 0.6
CU 0.5 0.7 0.2 0.4 1 0.7 0.8
CO 0.6 0.8 0.3 0.5 0.7 1 0.9
EE 0.7 0.9 0.4 0.6 0.8 0.9 1
;
/* assume MyData is sorted by ID and that the countries for
each ID are the same order as in the Supplement matrix
*/
proc iml;
use mydata;
read all var {ID 'Country' 'Value'};
close;
use supplement;
read all var _NUM_ into R[colname=varNames];
close;
/***************************************************/
/* Demonstrate the computation for ID=1 */
idx = loc(ID=1);
c = Country[idx]; /* countries for ID=1 */
v = Value[idx]; /* values for ID=1 */
print c v;
/* which variables are being asked for?
LOC-ELEMENT technique:
https://blogs.sas.com/content/iml/2015/05/11/loc-element-trick.html
*/
varIdx = loc( element(varNames, c) );
print varIdx;
/* get all pairwise combinations:
https://blogs.sas.com/content/iml/2013/09/30/generate-combinations-in-sas.html
*/
pairs = allcomb(ncol(varIdx), 2);
varPairs = shape( varIdx[pairs], 0, 2);
print pairs varPairs;
/* convert subscripts to indices:
https://blogs.sas.com/content/iml/2011/02/16/converting-matrix-subscripts-to-indices.html
*/
ndx = sub2ndx(dimension(R), varPairs);
w = R[ndx];
x = v[pairs[,1]];
y = v[pairs[,2]];
print w x y;
wsum = sum( w # x # y );
print wsum;
/*****************************************************/
/***************************************************/
/* Now the real thing: compute for all IDs. Assume sorted by ID */
/***************************************************/
/* For UNIQUEBY-LOC trick, see
https://blogs.sas.com/content/iml/2011/11/07/an-efficient-alternative-to-the-unique-loc-technique.html
*/
b = uniqueby(ID); /* b[i] = beginning of i_th category */
labl = char(ID[b]);
wsum = j(nrow(b),1); /* 3. Allocate vector to hold results */
b = b // (nrow(ID)+1); /* trick: append (n+1) to end of b */
do i = 1 to nrow(b)-1; /* 4. For each level... */
idx = b[i]:(b[i+1]-1); /* 5. Find observations in level */
/* 6. Compute statistic on those values */
c = Country[idx]; /* countries for ID=1 */
v = Value[idx]; /* values for ID=1 */
varIdx = loc( element(varNames, c) );
pairs = allcomb(ncol(varIdx), 2);
varPairs = shape( varIdx[pairs], 0, 2);
ndx = sub2ndx(dimension(R), varPairs);
w = R[ndx];
x = v[pairs[,1]];
y = v[pairs[,2]];
wsum[i] = sum( w # x # y );
end;
print wsum[rowname=labl];
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.