Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 08-12-2019 09:42 AM
(1104 views)

Dear all,

I have two matrices like this:

DATA mydata ;

INPUT ID

Country $

value ;

CARDS ;

1 CH 24

1 ES 55

1 EE 20

2 BR 10

2 CO 50

2 CH 10

;

RUN ;

data supplement;

input Country CH DE BR ES CU CO EE;

cards;

CH 1 0.4 0.5 0.6 0.7 0.8 0.9

DE 0.4 1 0.5 0.6 0.7 0.8 0.9

BR 0.4 0.5 1 0.6 0.7 0.8 0.9

ES 0.4 0.5 0.6 1 0.7 0.8 0.9

CU 0.4 0.5 0.6 0.7 1 0.8 0.9

CO 0.4 0.5 0.6 0.7 0.8 1 0.9

EE 0.4 0.5 0.6 0.7 0.8 0.9 1

;

run;

The second one can be understood as a correlation matrix (weights) between countries. For each ID in mydata, I want to have the weighted sum of the values, where the weights between two countries come from the data supplement. For example, for ID 1, I want to have 0.4*24*55+0.4* 24*20+0.9*55*20.

So in the output, I have two variables, the ID and the weighted sum.

I am guessing one can solve this problem with proc iml. But how can I do it?

Many many thanks for helping!!

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I am assuming that the data are sorted by the ID variable and that the order of the observations in the MyData set match the order of the variables and observations in the Supplement matrix. The main techniques you need to read about and learn are

1. The LOC-ELEMENT technique to find the variables that are in each ID group.

2. Use the ALLCOMB function to generate all pairwise combinations of the variables

3. Convert subscripts [i,j] into an index

4. The UNIQUEBY-LOC technique to loop over the values of the ID variable

Hopefully the comments in the program will adequately describe each step.

```
DATA mydata ;
INPUT ID
Country $
value ;
CARDS ;
1 CH 24
1 ES 55
1 EE 20
2 BR 10
2 CO 50
2 CH 10
3 CH 24
3 BR 10
3 ES 55
3 CO 50
;
data supplement;
input Country $ CH DE BR ES CU CO EE;
cards;
CH 1 0.1 0.2 0.4 0.5 0.6 0.7
DE 0.1 1 0.5 0.6 0.7 0.8 0.9
BR 0.2 0.5 1 0.1 0.2 0.3 0.4
ES 0.4 0.6 0.1 1 0.4 0.5 0.6
CU 0.5 0.7 0.2 0.4 1 0.7 0.8
CO 0.6 0.8 0.3 0.5 0.7 1 0.9
EE 0.7 0.9 0.4 0.6 0.8 0.9 1
;
/* assume MyData is sorted by ID and that the countries for
each ID are the same order as in the Supplement matrix
*/
proc iml;
use mydata;
read all var {ID 'Country' 'Value'};
close;
use supplement;
read all var _NUM_ into R[colname=varNames];
close;
/***************************************************/
/* Demonstrate the computation for ID=1 */
idx = loc(ID=1);
c = Country[idx]; /* countries for ID=1 */
v = Value[idx]; /* values for ID=1 */
print c v;
/* which variables are being asked for?
LOC-ELEMENT technique:
https://blogs.sas.com/content/iml/2015/05/11/loc-element-trick.html
*/
varIdx = loc( element(varNames, c) );
print varIdx;
/* get all pairwise combinations:
https://blogs.sas.com/content/iml/2013/09/30/generate-combinations-in-sas.html
*/
pairs = allcomb(ncol(varIdx), 2);
varPairs = shape( varIdx[pairs], 0, 2);
print pairs varPairs;
/* convert subscripts to indices:
https://blogs.sas.com/content/iml/2011/02/16/converting-matrix-subscripts-to-indices.html
*/
ndx = sub2ndx(dimension(R), varPairs);
w = R[ndx];
x = v[pairs[,1]];
y = v[pairs[,2]];
print w x y;
wsum = sum( w # x # y );
print wsum;
/*****************************************************/
/***************************************************/
/* Now the real thing: compute for all IDs. Assume sorted by ID */
/***************************************************/
/* For UNIQUEBY-LOC trick, see
https://blogs.sas.com/content/iml/2011/11/07/an-efficient-alternative-to-the-unique-loc-technique.html
*/
b = uniqueby(ID); /* b[i] = beginning of i_th category */
labl = char(ID[b]);
wsum = j(nrow(b),1); /* 3. Allocate vector to hold results */
b = b // (nrow(ID)+1); /* trick: append (n+1) to end of b */
do i = 1 to nrow(b)-1; /* 4. For each level... */
idx = b[i]:(b[i+1]-1); /* 5. Find observations in level */
/* 6. Compute statistic on those values */
c = Country[idx]; /* countries for ID=1 */
v = Value[idx]; /* values for ID=1 */
varIdx = loc( element(varNames, c) );
pairs = allcomb(ncol(varIdx), 2);
varPairs = shape( varIdx[pairs], 0, 2);
ndx = sub2ndx(dimension(R), varPairs);
w = R[ndx];
x = v[pairs[,1]];
y = v[pairs[,2]];
wsum[i] = sum( w # x # y );
end;
print wsum[rowname=labl];
```

7 REPLIES 7

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I don't understand your example.

1. If the second data set represents correlation, it should be symmetric.

2. How are you obtaining the values for the weighted sum? I would expect a weighted sum to look like

w_CH* 24 + w_ES* 55 + w_EE* 20.

It looks like you are using pairwise combinations of the three values? What value do you for the sum when ID=2?

3. Can there be four items for an ID? If so, can you make an example for ID=3 and the countries CH, ES, BR, and CO?

If your question is about how to read in data to an IML matrix, the following program shows how to read in the data and how to extract the observations for ID=1: You can use the UNIQUE-LOC trick to look over all ID values in the MyData data set.

```
DATA mydata ;
INPUT ID
Country $
value ;
CARDS ;
1 CH 24
1 ES 55
1 EE 20
2 BR 10
2 CO 50
2 CH 10
;
RUN ;
data supplement;
input Country $ CH DE BR ES CU CO EE;
cards;
CH 1 0.4 0.5 0.6 0.7 0.8 0.9
DE 0.4 1 0.5 0.6 0.7 0.8 0.9
BR 0.5 0.5 1 0.6 0.7 0.8 0.9
ES 0.6 0.6 0.6 1 0.7 0.8 0.9
CU 0.7 0.7 0.7 0.7 1 0.8 0.9
CO 0.8 0.8 0.8 0.8 0.8 1 0.9
EE 0.9 0.9 0.9 0.9 0.9 0.9 1
;
run;
proc iml;
use mydata;
read all var {ID 'Country' 'Value'};
close;
use supplement;
read all var _NUM_ into R[colname=varNames];
close;
print R[colname=varNames rowname=varNames];
idx = loc(ID=1);
c = Country[idx];
v = Value[idx];
print c v;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi Rick,

Thanks for your reply. You are right, that was a stupid mistake from me. Of course the matrice should be symmetic.

ID 1 has three countries, CH, ES and EE. Like you said, I need the sum of pairwise product with the weight from the second matrix, which means the sum for ID 1 will be W_{CH, ES}*Value_CH*Value_ES + W_{CH_EE}*Value_CH*Value_EE+W_{ES, EE}*Value_ES*Value_EE. And analogously, for ID 2 the sum will be W_{BR, CO}*Value_BR*Value_CO+W_{BR,CH}*Value_BR*Value_CH+W_{CO, CH}*Value_CO*Value_CH.

Each ID has the same number of countries, but they could be totally different and their order could be totally different as well. The second data contains all possible correlations between the countries in mydata and works like a look up table.

Many thanks again!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

exactly.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi Rick,

That would be so great!

This Kind of fomula is not that seldom in risk Management (e.g. variance-covariance method) with their root in stochastic (calculating the variance of the sum of random variables). I am using it to calculate a Kind of modified Herfindahl-Hirschmann index (concentration measure).

Looking Forward to your solution!

Best

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I am assuming that the data are sorted by the ID variable and that the order of the observations in the MyData set match the order of the variables and observations in the Supplement matrix. The main techniques you need to read about and learn are

1. The LOC-ELEMENT technique to find the variables that are in each ID group.

2. Use the ALLCOMB function to generate all pairwise combinations of the variables

3. Convert subscripts [i,j] into an index

4. The UNIQUEBY-LOC technique to loop over the values of the ID variable

Hopefully the comments in the program will adequately describe each step.

```
DATA mydata ;
INPUT ID
Country $
value ;
CARDS ;
1 CH 24
1 ES 55
1 EE 20
2 BR 10
2 CO 50
2 CH 10
3 CH 24
3 BR 10
3 ES 55
3 CO 50
;
data supplement;
input Country $ CH DE BR ES CU CO EE;
cards;
CH 1 0.1 0.2 0.4 0.5 0.6 0.7
DE 0.1 1 0.5 0.6 0.7 0.8 0.9
BR 0.2 0.5 1 0.1 0.2 0.3 0.4
ES 0.4 0.6 0.1 1 0.4 0.5 0.6
CU 0.5 0.7 0.2 0.4 1 0.7 0.8
CO 0.6 0.8 0.3 0.5 0.7 1 0.9
EE 0.7 0.9 0.4 0.6 0.8 0.9 1
;
/* assume MyData is sorted by ID and that the countries for
each ID are the same order as in the Supplement matrix
*/
proc iml;
use mydata;
read all var {ID 'Country' 'Value'};
close;
use supplement;
read all var _NUM_ into R[colname=varNames];
close;
/***************************************************/
/* Demonstrate the computation for ID=1 */
idx = loc(ID=1);
c = Country[idx]; /* countries for ID=1 */
v = Value[idx]; /* values for ID=1 */
print c v;
/* which variables are being asked for?
LOC-ELEMENT technique:
https://blogs.sas.com/content/iml/2015/05/11/loc-element-trick.html
*/
varIdx = loc( element(varNames, c) );
print varIdx;
/* get all pairwise combinations:
https://blogs.sas.com/content/iml/2013/09/30/generate-combinations-in-sas.html
*/
pairs = allcomb(ncol(varIdx), 2);
varPairs = shape( varIdx[pairs], 0, 2);
print pairs varPairs;
/* convert subscripts to indices:
https://blogs.sas.com/content/iml/2011/02/16/converting-matrix-subscripts-to-indices.html
*/
ndx = sub2ndx(dimension(R), varPairs);
w = R[ndx];
x = v[pairs[,1]];
y = v[pairs[,2]];
print w x y;
wsum = sum( w # x # y );
print wsum;
/*****************************************************/
/***************************************************/
/* Now the real thing: compute for all IDs. Assume sorted by ID */
/***************************************************/
/* For UNIQUEBY-LOC trick, see
https://blogs.sas.com/content/iml/2011/11/07/an-efficient-alternative-to-the-unique-loc-technique.html
*/
b = uniqueby(ID); /* b[i] = beginning of i_th category */
labl = char(ID[b]);
wsum = j(nrow(b),1); /* 3. Allocate vector to hold results */
b = b // (nrow(ID)+1); /* trick: append (n+1) to end of b */
do i = 1 to nrow(b)-1; /* 4. For each level... */
idx = b[i]:(b[i+1]-1); /* 5. Find observations in level */
/* 6. Compute statistic on those values */
c = Country[idx]; /* countries for ID=1 */
v = Value[idx]; /* values for ID=1 */
varIdx = loc( element(varNames, c) );
pairs = allcomb(ncol(varIdx), 2);
varPairs = shape( varIdx[pairs], 0, 2);
ndx = sub2ndx(dimension(R), varPairs);
w = R[ndx];
x = v[pairs[,1]];
y = v[pairs[,2]];
wsum[i] = sum( w # x # y );
end;
print wsum[rowname=labl];
```

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.