Thanks for the advices in advance. Is there a proc that reorders observations according to a specified order? The situation I encountered is as follows:
I was given a dataset in the format of the below, with scenario results for four groups across the columns. In rows are simulated scenarios. So essentially results are by group by scenario.
Scenario Set | Group 1 | Group 2 | Group 3 | Group 4 |
A | 124 | 100 | 79 | 79 |
B | 100 | 85 | 109 | 82 |
C | 128 | 115 | 100 | 102 |
D | 94 | 73 | 100 | 100 |
E | 79 | 124 | 76 | 97 |
F | 100 | 123 | 104 | 121 |
G | 128 | 100 | 87 | 126 |
H | 111 | 100 | 100 | 83 |
I | 100 | 122 | 86 | 100 |
J | 111 | 100 | 117 | 91 |
However, each group is evaluated by a different system and each system has reordered the results (from A, B... C...) according to some algorithm, so the first result for Group 1 is actually not from the same simulation scenario as Group 2. Provided to me also with the results is a reordering mapping that looks like:
Scenario Set | Group 1 | Group 2 | Group 3 | Group 4 |
A | 1 | 9 | 2 | 8 |
B | 2 | 2 | 1 | 5 |
C | 3 | 10 | 3 | 6 |
D | 4 | 6 | 4 | 4 |
E | 5 | 5 | 10 | 3 |
F | 6 | 4 | 8 | 9 |
G | 7 | 7 | 7 | 10 |
H | 8 | 3 | 6 | 7 |
I | 9 | 1 | 9 | 2 |
J | 10 | 8 | 5 | 1 |
Which means that, for the first scenario set A, obs 1 from Group 1 (124), obs 9 from Group 2 (122), obs 2 from Group 3 (109) and obs 8 from Group 4 (83) should be reordered as such to form an actual coherent group (where Scenario Set A would be 124, 122, 109 and 83).
Assuming that the data will be provided to me this way, with the scenario results set and a scenarios reordering map, is there any SAS proc that allows reordering according to a pre-speficied order as shown in the map? Thanks.
OK, let's give this a shot. Using 30000 as the number of subjects:
data want;
set reordering_map (rename=(group1=_1_ group2=_2_ group3=_3_ group4=_4_);
do _A_=1 to 30000;
set results (keep=subject) point=_A_;
record_num = _A_ + 30000 * (_1_ - 1);
set results (keep=group1) point=record_num;
record_num = _A_ + 30000 * (_2_ - 1);
set results (keep=group2) point=record_num;
record_num = _A_ + 30000 * (_3_ - 1);
set results (keep=group3) point=record_num;
record_num = _A_ + 30000 * (_4_ - 1);
set results (keep=group4 point=record_num;
output;
end;
run;
It's untested, so I assume you'll give it a try on a relatively small data set.
Cute problem ... the right DATA step should do the trick.
data want;
set reordering_map;
record_num=group1;
set results (keep=group1) point=record_num;
record_num=group2;
set results (keep=group2) point=record_num;
record_num=group3;
set results (keep=group3) point=record_num;
record_num=group4;
set results (keep=group4) point=record_num;
run;
I'm pretty sure SAS will drop RECORD_NUM automatically, but you'll find out. (It's also untested code.)
Good luck.
Thanks. The data step suggestion above with the point= option works for the situation I gave.
The actual results data has one more level of slight difficulty. There are multiple subjects (a fixed number across all scenarios, and the same subjects) that participate in all Groups, and are represented in all scenarios. The map file would be the same as above.
So the indexing would require that we point to chunks of positions instead of a single index position per scenario. How can the solution be tweaked to accomodate the situation?
Scenario Set | Subject | Group 1 | Group 2 | Group 3 | Group 4 |
A | tuv | 37 | 30 | 24 | 24 |
A | wyx | 25 | 20 | 16 | 16 |
A | zab | 62 | 50 | 40 | 40 |
B | tuv | 30 | 26 | 33 | 25 |
B | wyx | 20 | 17 | 22 | 16 |
B | zab | 50 | 43 | 55 | 41 |
C | tuv | 38 | 35 | 30 | 31 |
C | wyx | 26 | 23 | 20 | 20 |
C | zab | 64 | 58 | 50 | 51 |
D | tuv | 28 | 22 | 30 | 30 |
D | wyx | 19 | 15 | 20 | 20 |
D | zab | 47 | 37 | 50 | 50 |
E | tuv | 24 | 37 | 23 | 29 |
E | wyx | 16 | 25 | 15 | 19 |
E | zab | 40 | 62 | 38 | 49 |
F | tuv | 30 | 37 | 31 | 36 |
F | wyx | 20 | 25 | 21 | 24 |
F | zab | 50 | 62 | 52 | 61 |
… | … | … | … | … | … |
You'll have to give a sample mapping data set (just a couple of lines would be enough), and show what the results should be.
The mapping dataset would be the same as before.
Take scenario set B for example, the corresponding mapping would be [2, 2, 1, 5]. The expected result should take the obs group 2 [B] results set from Group 1, obs group 2 [B] from Group 2, obs group 1 [A] from Group 3 and obs group 5 [E] from Group 4. I have highlighted these results groups in red above.
The expected post-mapped new results data set would look like the follows, for set B:
Scenario Set | Subject | Group 1 | Group 2 | Group 3 | Group 4 |
B | tuv | 30 | 26 | 24 | 29 |
B | wyx | 20 | 17 | 16 | 19 |
B | zab | 50 | 43 | 40 | 49 |
OK, just to confirm my expectations, then ...
Each subject appears for every scenario.
The number of subjects is known, and can be hard-coded within the program. (We can get around this if needed.)
Yes the same subjects are repeated across scenarios. The subject count is known and can be hardcoded if needed. The actual dataset is about 10,000 scenarios, with 30,000 (same) subjects in each.
Thanks.
Sorry, one more question to simplify ...
Are the subjects always in the same order? (The first subject in Scenario A will always be the same person as the first subject in Scenario B?)
Yes I can first sort them by scenario then by subject to ensure that being the case.
OK, let's give this a shot. Using 30000 as the number of subjects:
data want;
set reordering_map (rename=(group1=_1_ group2=_2_ group3=_3_ group4=_4_);
do _A_=1 to 30000;
set results (keep=subject) point=_A_;
record_num = _A_ + 30000 * (_1_ - 1);
set results (keep=group1) point=record_num;
record_num = _A_ + 30000 * (_2_ - 1);
set results (keep=group2) point=record_num;
record_num = _A_ + 30000 * (_3_ - 1);
set results (keep=group3) point=record_num;
record_num = _A_ + 30000 * (_4_ - 1);
set results (keep=group4 point=record_num;
output;
end;
run;
It's untested, so I assume you'll give it a try on a relatively small data set.
Tested on the sample dataset and this works! Thank you. One question is - how can I tweak the above to keep the Scenario_Set column in the results as well? I added Scenario_Set to (keep=subject) but it seems to repeat the first data point all through out.
It should be coming in from the REORDERING_MAP data set. Is SCENARIO_SET not part of that data set?
Do any of the SET RESULTS statements also keep SCENARIO_SET (by mistake)?
It's on the dataset. Nevermind. Thank you for all the help!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.