- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all, I have the following two data sets:
Data Set 1 with probabilities
key 1 | key 2 | key 3 | key 4 | key 5 | pr_st1-2 | pr_st1-3 | pr_st1-1 | pr_st2-1 | pr_st2-3 | pr_st2-2 | pr_st3-1 | pr_st3-2 | pr_st3-3 |
France | 2017 | Approach A | Retail | Rating 1 | 0.2 | 0.6 | 0.2 | 0.3 | 0.6 | 0.1 | 0.1 | 0.3 | 0.6 |
France | 2018 | Approach B | Corporate | Rating 2 | 0.1 | 0.5 | 0.4 | 0.2 | 0.6 | 0.2 | 0.2 | 0.3 | 0.5 |
Data Set 2 with initial volumes:
key 1 | key 2 | key 3 | key 4 | key 5 | vol_st1 | vol_st2 | vol_st3 |
France | 2017 | Approach A | Retail | Rating 1 | 20 | 10 | 12 |
France | 2018 | Approach B | Corporate | Rating 2 | 23 | 5 | 9 |
Both data has same keys; here key_set_1 = France 2017 Approach A Retail Rating 1. Dataset 1 has probabilities and dataset 2 has volumes.
The task is, I have to calculate ending volumes of each state.
i.e. 1. create a matrix for each row (with unique keys) in data set 1. So for row 1, it would be:
state1 | state2 | state3 | |
state1 | 0.2 | 0.2 | 0.6 |
state2 | 0.3 | 0.1 | 0.6 |
state3 | 0.1 | 0.3 | 0.6 |
2. create a vector of initial volumes
3. And then multiply the vector in data set 2 with those matrices from data set 1, i.e. multiply the initial volumes with probabilities to get the end volume for each set of unique keys. For this example (for the first row), it would be multiplication of
20 | 10 | 12 |
with
0.2 | 0.2 | 0.6 |
0.3 | 0.1 | 0.6 |
0.1 | 0.3 | 0.6 |
=
8.2 | 8.6 | 25.2 |
Thanks to the helpful folks from this forum, I have been able to partially achieve the first step, i.e. create the matrix for each row of data set 1. But, I am not sure how can I store the keys in an array and and store the outputs also corresponding to the same keys.
in the end result we should be able to identify input 1 (3*3 matrix), input 2 (1*3 vector) and output (1*3 vector) with key_set_1. same for key_set_2, Key_set_3 and so on.
Thanks!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Year (country method Segment) ( : $12.)
pr_st1_2 pr_st1_3 pr_st1_1 pr_st2_1 pr_st2_3 pr_st2_2 pr_st3_1 pr_st3_2 pr_st3_3;
datalines;
2017 USA ABC Retail 0.2 0.6 0.2 0.3 0.6 0.1 0.1 0.3 0.6
2017 USA XYZ Corporate 0.1 0.5 0.4 0.2 0.6 0.2 0.2 0.3 0.5
;
run;
data key;
input country $ Year method $ Segment : $12. vol_st1 vol_st2 vol_st3 ;
cards;
USA 2017 ABC Retail 20 10 12
USA 2017 XYZ Corporate 23 5 9
;
run;
data have;
merge have key;
by Year country method Segment ;
run;
proc sql noprint;
select name into : vname separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE' and lowcase(name) eqt 'pr_st'
order by input(compress(scan(name,-2,'_'),,'kd'),best.),input(compress(scan(name,-1,'_'),,'kd'),best.);
quit;
proc iml;
use have nobs nobs;
read all var {&vname} into x;
read all var {vol_st1 vol_st2 vol_st3} into y;
close;
do i=1 to nobs;
temp=shape(x[i,],sqrt(ncol(x)));
want=(temp#t(y[i,]))[+,];
print want;
end;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Year (country method Segment) ( : $12.)
pr_st1_2 pr_st1_3 pr_st1_1 pr_st2_1 pr_st2_3 pr_st2_2 pr_st3_1 pr_st3_2 pr_st3_3;
datalines;
2017 USA ABC Retail 0.2 0.6 0.2 0.3 0.6 0.1 0.1 0.3 0.6
2017 USA XYZ Corporate 0.1 0.5 0.4 0.2 0.6 0.2 0.2 0.3 0.5
;
run;
data key;
input country $ Year method $ Segment : $12. vol_st1 vol_st2 vol_st3 ;
cards;
USA 2017 ABC Retail 20 10 12
USA 2017 XYZ Corporate 23 5 9
;
run;
data have;
merge have key;
by Year country method Segment ;
run;
proc sql noprint;
select name into : vname separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE' and lowcase(name) eqt 'pr_st'
order by input(compress(scan(name,-2,'_'),,'kd'),best.),input(compress(scan(name,-1,'_'),,'kd'),best.);
quit;
proc iml;
use have nobs nobs;
read all var {&vname} into x;
read all var {vol_st1 vol_st2 vol_st3} into y;
close;
do i=1 to nobs;
temp=shape(x[i,],sqrt(ncol(x)));
want=(temp#t(y[i,]))[+,];
print want;
end;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
want = y[i,]*temp;
would be more simple.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks a lot @Ksharp. Assuming the files would be big (500 different keys - 500 different rows), how do I get back the result for a particular set of key?
For example, if I run this piece of code it gives me two want matrices for two rows in the data, which is perfect. But if there were 500 rows and I wanted to extract the one with
2017 USA XYZ Corporate
, i.e. want[2], what could I have done?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think it would be best to create a single matrix large enough to hold all 500 results. Then the matrix is filled up one row at a time for each iteration of the loop. Afterwards results can be extracted by row sub setting. The syntax would look like this:
n = sqrt(ncol(x));
want = j(nobs, n);
do i = 1 to nobs;
temp = shape(x[i,], n);
want[i,] = y[i,]*temp;
end;
w2 = want[2,];
print w2;