BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Dingdang
Fluorite | Level 6

Dear all,

 

I have two matrices like this:


DATA mydata ;
INPUT ID
Country $
value ;
CARDS ;
1 CH 24
1 ES 55
1 EE 20
2 BR 10
2 CO 50
2 CH 10
;
RUN ;

data supplement;
input Country CH DE BR ES CU CO EE;
cards;
CH 1 0.4 0.5 0.6 0.7 0.8 0.9
DE 0.4 1 0.5 0.6 0.7 0.8 0.9
BR 0.4 0.5 1 0.6 0.7 0.8 0.9
ES 0.4 0.5 0.6 1 0.7 0.8 0.9
CU 0.4 0.5 0.6 0.7 1 0.8 0.9
CO 0.4 0.5 0.6 0.7 0.8 1 0.9
EE 0.4 0.5 0.6 0.7 0.8 0.9 1
;
run;

 

The second one can be understood as a correlation matrix (weights) between countries. For each ID in mydata, I want to have the weighted sum of the values, where the weights between two countries come from the data supplement. For example, for ID 1, I want to have 0.4*24*55+0.4* 24*20+0.9*55*20. 

So in the output, I have two variables, the ID and the weighted sum. 

 

I am guessing one can solve this problem with proc iml. But how can I do it? 

 

Many many thanks for helping!!

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

I am assuming that the data are sorted by the ID variable and that the order of the observations in the MyData set match the order of the variables and observations in the Supplement matrix. The main techniques you need to read about and learn are

1. The LOC-ELEMENT technique to find the variables that are in each ID group.

2. Use the ALLCOMB function to generate all pairwise combinations of the variables

3. Convert subscripts [i,j] into an index

4. The UNIQUEBY-LOC technique to loop over the values of the ID variable

 

Hopefully the comments in the program will adequately describe each step.

 

DATA mydata ; 
INPUT ID
Country $ 
value ; 
CARDS ; 
1 CH 24 
1 ES 55
1 EE 20 
2 BR 10
2 CO 50
2 CH 10 
3 CH 24 
3 BR 10
3 ES 55
3 CO 50
;

data supplement;
input Country $ CH DE BR ES CU CO EE;
cards;
CH 1   0.1 0.2 0.4 0.5 0.6 0.7
DE 0.1 1   0.5 0.6 0.7 0.8 0.9
BR 0.2 0.5 1   0.1 0.2 0.3 0.4
ES 0.4 0.6 0.1 1   0.4 0.5 0.6
CU 0.5 0.7 0.2 0.4 1   0.7 0.8
CO 0.6 0.8 0.3 0.5 0.7 1   0.9
EE 0.7 0.9 0.4 0.6 0.8 0.9 1
;

/* assume MyData is sorted by ID and that the countries for 
   each ID are the same order as in the Supplement matrix
*/
proc iml;

use mydata;
read all var {ID 'Country' 'Value'};
close;
use supplement;
read all var _NUM_ into R[colname=varNames];
close;

/***************************************************/
/* Demonstrate the computation for ID=1 */
idx = loc(ID=1);
c = Country[idx];   /* countries for ID=1 */
v = Value[idx];     /* values for ID=1 */
print c v;

/* which variables are being asked for? 
   LOC-ELEMENT technique:
   https://blogs.sas.com/content/iml/2015/05/11/loc-element-trick.html
*/
varIdx = loc( element(varNames, c) );
print varIdx;

/* get all pairwise combinations:
   https://blogs.sas.com/content/iml/2013/09/30/generate-combinations-in-sas.html
*/
pairs = allcomb(ncol(varIdx), 2);
varPairs = shape( varIdx[pairs], 0, 2);
print pairs varPairs;

/* convert subscripts to indices:
   https://blogs.sas.com/content/iml/2011/02/16/converting-matrix-subscripts-to-indices.html
*/
ndx = sub2ndx(dimension(R), varPairs);
w = R[ndx];
x = v[pairs[,1]];
y = v[pairs[,2]];
print w x y;

wsum = sum( w # x # y );
print wsum; 
/*****************************************************/


/***************************************************/
/* Now the real thing: compute for all IDs. Assume sorted by ID */
/***************************************************/
/* For UNIQUEBY-LOC trick, see
   https://blogs.sas.com/content/iml/2011/11/07/an-efficient-alternative-to-the-unique-loc-technique.html
*/
b = uniqueby(ID);         /* b[i] = beginning of i_th category */
labl = char(ID[b]);
wsum = j(nrow(b),1);      /* 3. Allocate vector to hold results */
b = b // (nrow(ID)+1);     /* trick: append (n+1) to end of b */
do i = 1 to nrow(b)-1;    /* 4. For each level... */
   idx = b[i]:(b[i+1]-1); /* 5. Find observations in level */
   /* 6. Compute statistic on those values */
   c = Country[idx];   /* countries for ID=1 */
   v = Value[idx];     /* values for ID=1 */
   varIdx = loc( element(varNames, c) );
   pairs = allcomb(ncol(varIdx), 2);
   varPairs = shape( varIdx[pairs], 0, 2);
   ndx = sub2ndx(dimension(R), varPairs);
   w = R[ndx];
   x = v[pairs[,1]];
   y = v[pairs[,2]];
   wsum[i] = sum( w # x # y );
end;
print wsum[rowname=labl];

View solution in original post

7 REPLIES 7
Rick_SAS
SAS Super FREQ

I don't understand your example.

1. If the second data set represents correlation, it should be symmetric. 

2. How are you obtaining the values for the weighted sum?  I would expect a weighted sum to look like

w_CH* 24 + w_ES* 55 + w_EE* 20.

It looks like you are using pairwise combinations of the three values? What value do you for the sum when ID=2? 

3. Can there be four items for an ID? If so, can you make an example for ID=3 and the countries CH, ES, BR, and CO?

 

If your question is about how to read in data to an IML matrix, the following program shows how to read in the data and how to extract the observations for ID=1: You can use the UNIQUE-LOC trick to look over all ID values in the MyData data set.

 

DATA mydata ; 
INPUT ID
Country $ 
value ; 
CARDS ; 
1 CH 24 
1 ES 55
1 EE 20 
2 BR 10
2 CO 50
2 CH 10 
;
RUN ;

data supplement;
input Country $ CH DE BR ES CU CO EE;
cards;
CH 1   0.4 0.5 0.6 0.7 0.8 0.9
DE 0.4 1   0.5 0.6 0.7 0.8 0.9
BR 0.5 0.5 1   0.6 0.7 0.8 0.9
ES 0.6 0.6 0.6 1   0.7 0.8 0.9
CU 0.7 0.7 0.7 0.7 1   0.8 0.9
CO 0.8 0.8 0.8 0.8 0.8 1   0.9
EE 0.9 0.9 0.9 0.9 0.9 0.9 1
;
run;

proc iml;
use mydata;
read all var {ID 'Country' 'Value'};
close;
use supplement;
read all var _NUM_ into R[colname=varNames];
close;

print R[colname=varNames rowname=varNames];

idx = loc(ID=1);
c = Country[idx];
v = Value[idx];
print c v;
Dingdang
Fluorite | Level 6

Hi Rick,

 

Thanks for your reply. You are right, that was a stupid mistake from me. Of course the matrice should be symmetic.

 

ID 1 has three countries, CH, ES and EE. Like you said, I need the sum of pairwise product with the weight from the second matrix, which means the sum for ID 1 will be W_{CH, ES}*Value_CH*Value_ES + W_{CH_EE}*Value_CH*Value_EE+W_{ES, EE}*Value_ES*Value_EE. And analogously, for ID 2 the sum will be W_{BR, CO}*Value_BR*Value_CO+W_{BR,CH}*Value_BR*Value_CH+W_{CO, CH}*Value_CO*Value_CH.

 

Each ID has the same number of countries, but they could be totally different and their order could be totally different as well. The second data contains all possible correlations between the countries in mydata and works like a look up table.

 

Many thanks again!

Rick_SAS
SAS Super FREQ

I see. So if there are 4 elements for ID=3, you will want all "4 choose 2" = 6 pairwise combinations?  The coefficients are always for a pair of countries?

 

 

Dingdang
Fluorite | Level 6

exactly.

Rick_SAS
SAS Super FREQ

I think I can solve this problem. If you don't mind my asking, what is the application of this formula? What does the weighted sum of the pairwise products represent?

Dingdang
Fluorite | Level 6

Hi Rick,

 

That would be so great!

 

This Kind of fomula is not that seldom in risk Management (e.g. variance-covariance method) with their root in stochastic (calculating the variance of the sum of random variables). I am using it to calculate a Kind of modified Herfindahl-Hirschmann index (concentration measure).

 

Looking Forward to your solution!

 

Best

 

Rick_SAS
SAS Super FREQ

I am assuming that the data are sorted by the ID variable and that the order of the observations in the MyData set match the order of the variables and observations in the Supplement matrix. The main techniques you need to read about and learn are

1. The LOC-ELEMENT technique to find the variables that are in each ID group.

2. Use the ALLCOMB function to generate all pairwise combinations of the variables

3. Convert subscripts [i,j] into an index

4. The UNIQUEBY-LOC technique to loop over the values of the ID variable

 

Hopefully the comments in the program will adequately describe each step.

 

DATA mydata ; 
INPUT ID
Country $ 
value ; 
CARDS ; 
1 CH 24 
1 ES 55
1 EE 20 
2 BR 10
2 CO 50
2 CH 10 
3 CH 24 
3 BR 10
3 ES 55
3 CO 50
;

data supplement;
input Country $ CH DE BR ES CU CO EE;
cards;
CH 1   0.1 0.2 0.4 0.5 0.6 0.7
DE 0.1 1   0.5 0.6 0.7 0.8 0.9
BR 0.2 0.5 1   0.1 0.2 0.3 0.4
ES 0.4 0.6 0.1 1   0.4 0.5 0.6
CU 0.5 0.7 0.2 0.4 1   0.7 0.8
CO 0.6 0.8 0.3 0.5 0.7 1   0.9
EE 0.7 0.9 0.4 0.6 0.8 0.9 1
;

/* assume MyData is sorted by ID and that the countries for 
   each ID are the same order as in the Supplement matrix
*/
proc iml;

use mydata;
read all var {ID 'Country' 'Value'};
close;
use supplement;
read all var _NUM_ into R[colname=varNames];
close;

/***************************************************/
/* Demonstrate the computation for ID=1 */
idx = loc(ID=1);
c = Country[idx];   /* countries for ID=1 */
v = Value[idx];     /* values for ID=1 */
print c v;

/* which variables are being asked for? 
   LOC-ELEMENT technique:
   https://blogs.sas.com/content/iml/2015/05/11/loc-element-trick.html
*/
varIdx = loc( element(varNames, c) );
print varIdx;

/* get all pairwise combinations:
   https://blogs.sas.com/content/iml/2013/09/30/generate-combinations-in-sas.html
*/
pairs = allcomb(ncol(varIdx), 2);
varPairs = shape( varIdx[pairs], 0, 2);
print pairs varPairs;

/* convert subscripts to indices:
   https://blogs.sas.com/content/iml/2011/02/16/converting-matrix-subscripts-to-indices.html
*/
ndx = sub2ndx(dimension(R), varPairs);
w = R[ndx];
x = v[pairs[,1]];
y = v[pairs[,2]];
print w x y;

wsum = sum( w # x # y );
print wsum; 
/*****************************************************/


/***************************************************/
/* Now the real thing: compute for all IDs. Assume sorted by ID */
/***************************************************/
/* For UNIQUEBY-LOC trick, see
   https://blogs.sas.com/content/iml/2011/11/07/an-efficient-alternative-to-the-unique-loc-technique.html
*/
b = uniqueby(ID);         /* b[i] = beginning of i_th category */
labl = char(ID[b]);
wsum = j(nrow(b),1);      /* 3. Allocate vector to hold results */
b = b // (nrow(ID)+1);     /* trick: append (n+1) to end of b */
do i = 1 to nrow(b)-1;    /* 4. For each level... */
   idx = b[i]:(b[i+1]-1); /* 5. Find observations in level */
   /* 6. Compute statistic on those values */
   c = Country[idx];   /* countries for ID=1 */
   v = Value[idx];     /* values for ID=1 */
   varIdx = loc( element(varNames, c) );
   pairs = allcomb(ncol(varIdx), 2);
   varPairs = shape( varIdx[pairs], 0, 2);
   ndx = sub2ndx(dimension(R), varPairs);
   w = R[ndx];
   x = v[pairs[,1]];
   y = v[pairs[,2]];
   wsum[i] = sum( w # x # y );
end;
print wsum[rowname=labl];

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 7 replies
  • 1888 views
  • 1 like
  • 2 in conversation