BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dmz2000
Calcite | Level 5

Thanks for the advices in advance. Is there a proc that reorders observations according to a specified order? The situation I encountered is as follows:

 

I was given a dataset in the format of the below, with scenario results for four groups across the columns. In rows are simulated scenarios. So essentially results are by group by scenario.

Scenario Set Group 1 Group 2 Group 3 Group 4
A 124 100 79 79
B 100 85 109 82
C 128 115 100 102
D 94 73 100 100
E 79 124 76 97
F 100 123 104 121
G 128 100 87 126
H 111 100 100 83
I 100 122 86 100
J 111 100 117 91

 

However, each group is evaluated by a different system and each system has reordered the results (from A, B... C...) according to some algorithm, so the first result for Group 1 is actually not from the same simulation scenario as Group 2. Provided to me also with the results is a reordering mapping that looks like:

Scenario Set Group 1 Group 2 Group 3 Group 4
A 1 9 2 8
B 2 2 1 5
C 3 10 3 6
D 4 6 4 4
E 5 5 10 3
F 6 4 8 9
G 7 7 7 10
H 8 3 6 7
I 9 1 9 2
J 10 8 5 1

 

Which means that, for the first scenario set A, obs 1 from Group 1 (124), obs 9 from Group 2 (122), obs 2 from Group 3 (109) and obs 8 from Group 4 (83) should be reordered as such to form an actual coherent group (where Scenario Set A would be 124, 122, 109 and 83).

 

Assuming that the data will be provided to me this way, with the scenario results set and a scenarios reordering map, is there any SAS proc that allows reordering according to a pre-speficied order as shown in the map? Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

OK, let's give this a shot.  Using 30000 as the number of subjects:

 

data want;

  set reordering_map (rename=(group1=_1_  group2=_2_  group3=_3_  group4=_4_);

  do _A_=1 to 30000;

      set results (keep=subject) point=_A_;

      record_num = _A_ + 30000 * (_1_ - 1);

      set results (keep=group1) point=record_num;

      record_num = _A_ + 30000 * (_2_ - 1);

      set results (keep=group2) point=record_num;

      record_num = _A_ + 30000 * (_3_ - 1);

      set results (keep=group3) point=record_num;

      record_num = _A_ + 30000 * (_4_ - 1);

      set results (keep=group4 point=record_num;

      output;

   end;

run;

 

It's untested, so I assume you'll give it a try on a relatively small data set.

 

 

View solution in original post

13 REPLIES 13
Astounding
PROC Star

Cute problem ... the right DATA step should do the trick.

 

data want;

   set reordering_map;

   record_num=group1;

   set results (keep=group1) point=record_num;

   record_num=group2;

   set results (keep=group2) point=record_num;

   record_num=group3;

   set results (keep=group3) point=record_num;

   record_num=group4;

   set results (keep=group4) point=record_num;

run;

 

I'm pretty sure SAS will drop RECORD_NUM automatically, but you'll find out.  (It's also untested code.) 

 

Good luck.

dmz2000
Calcite | Level 5

Thanks. The data step suggestion above with the point= option works for the situation I gave.

 

The actual results data has one more level of slight difficulty. There are multiple subjects (a fixed number across all scenarios, and the same subjects) that participate in all Groups, and are represented in all scenarios. The map file would be the same as above.

 

So the indexing would require that we point to chunks of positions instead of a single index position per scenario. How can the solution be tweaked to accomodate the situation?

 

Scenario Set Subject Group 1 Group 2 Group 3 Group 4
A tuv 37 30 24 24
A wyx 25 20 16 16
A zab 62 50 40 40
B tuv 30 26 33 25
B wyx 20 17 22 16
B zab 50 43 55 41
C tuv 38 35 30 31
C wyx 26 23 20 20
C zab 64 58 50 51
D tuv 28 22 30 30
D wyx 19 15 20 20
D zab 47 37 50 50
E tuv 24 37 23 29
E wyx 16 25 15 19
E zab 40 62 38 49
F tuv 30 37 31 36
F wyx 20 25 21 24
F zab 50 62 52 61
Astounding
PROC Star

You'll have to give a sample mapping data set (just a couple of lines would be enough), and show what the results should be.

dmz2000
Calcite | Level 5

The mapping dataset would be the same as before.

 

Take scenario set B for example, the corresponding mapping would be [2, 2, 1, 5]. The expected result should take the obs group 2 [B] results set from Group 1, obs group 2 [B] from Group 2, obs group 1 [A] from Group 3 and obs group 5 [E] from Group 4. I have highlighted these results groups in red above.

 

The expected post-mapped new results data set would look like the follows, for set B:

Scenario Set Subject Group 1 Group 2 Group 3 Group 4
B tuv 30 26 24 29
B wyx 20 17 16 19
B zab 50 43 40 49
Astounding
PROC Star

OK, just to confirm my expectations, then ...

 

Each subject appears for every scenario.

 

The number of subjects is known, and can be hard-coded within the program.  (We can get around this if needed.)

dmz2000
Calcite | Level 5

Yes the same subjects are repeated across scenarios. The subject count is known and can be hardcoded if needed. The actual dataset is about 10,000 scenarios, with 30,000 (same) subjects in each.

 

Thanks.

 

 

Astounding
PROC Star

Sorry, one more question to simplify ...

 

Are the subjects always in the same order?  (The first subject in Scenario A will always be the same person as the first subject in Scenario B?)

dmz2000
Calcite | Level 5

Yes I can first sort them by scenario then by subject to ensure that being the case.

Astounding
PROC Star

OK, let's give this a shot.  Using 30000 as the number of subjects:

 

data want;

  set reordering_map (rename=(group1=_1_  group2=_2_  group3=_3_  group4=_4_);

  do _A_=1 to 30000;

      set results (keep=subject) point=_A_;

      record_num = _A_ + 30000 * (_1_ - 1);

      set results (keep=group1) point=record_num;

      record_num = _A_ + 30000 * (_2_ - 1);

      set results (keep=group2) point=record_num;

      record_num = _A_ + 30000 * (_3_ - 1);

      set results (keep=group3) point=record_num;

      record_num = _A_ + 30000 * (_4_ - 1);

      set results (keep=group4 point=record_num;

      output;

   end;

run;

 

It's untested, so I assume you'll give it a try on a relatively small data set.

 

 

dmz2000
Calcite | Level 5

Tested on the sample dataset and this works! Thank you. One question is - how can I tweak the above to keep the Scenario_Set column in the results as well? I added Scenario_Set to (keep=subject) but it seems to repeat the first data point all through out.

Astounding
PROC Star

It should be coming in from the REORDERING_MAP data set.  Is SCENARIO_SET not part of that data set?

 

Do any of the SET RESULTS statements also keep SCENARIO_SET (by mistake)?

dmz2000
Calcite | Level 5

It's on the dataset. Nevermind. Thank you for all the help!

LinusH
Tourmaline | Level 20
Normalizing your data structure will make it more sustainable IMO.

Transpose both tables so that the composite unique key will be scenario set together with Group no.
Then just join the tables.
Data never sleeps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1633 views
  • 2 likes
  • 3 in conversation