Solved: Re: Reordering Observations Non-systematically (Specified Indexing)

dmz2000 · Posted 10-27-2015 03:55 PM

Thanks for the advices in advance. Is there a proc that reorders observations according to a specified order? The situation I encountered is as follows:

I was given a dataset in the format of the below, with scenario results for four groups across the columns. In rows are simulated scenarios. So essentially results are by group by scenario.

Scenario Set	Group 1	Group 2	Group 3	Group 4
A	124	100	79	79
B	100	85	109	82
C	128	115	100	102
D	94	73	100	100
E	79	124	76	97
F	100	123	104	121
G	128	100	87	126
H	111	100	100	83
I	100	122	86	100
J	111	100	117	91

However, each group is evaluated by a different system and each system has reordered the results (from A, B... C...) according to some algorithm, so the first result for Group 1 is actually not from the same simulation scenario as Group 2. Provided to me also with the results is a reordering mapping that looks like:

Scenario Set	Group 1	Group 2	Group 3	Group 4
A	1	9	2	8
B	2	2	1	5
C	3	10	3	6
D	4	6	4	4
E	5	5	10	3
F	6	4	8	9
G	7	7	7	10
H	8	3	6	7
I	9	1	9	2
J	10	8	5	1

Which means that, for the first scenario set A, obs 1 from Group 1 (124), obs 9 from Group 2 (122), obs 2 from Group 3 (109) and obs 8 from Group 4 (83) should be reordered as such to form an actual coherent group (where Scenario Set A would be 124, 122, 109 and 83).

Assuming that the data will be provided to me this way, with the scenario results set and a scenarios reordering map, is there any SAS proc that allows reordering according to a pre-speficied order as shown in the map? Thanks.

Astounding · Posted 10-28-2015 04:58 PM

OK, let's give this a shot. Using 30000 as the number of subjects:

data want;

set reordering_map (rename=(group1=_1_ group2=_2_ group3=_3_ group4=_4_);

do _A_=1 to 30000;

set results (keep=subject) point=_A_;

record_num = _A_ + 30000 * (_1_ - 1);

set results (keep=group1) point=record_num;

record_num = _A_ + 30000 * (_2_ - 1);

set results (keep=group2) point=record_num;

record_num = _A_ + 30000 * (_3_ - 1);

set results (keep=group3) point=record_num;

record_num = _A_ + 30000 * (_4_ - 1);

set results (keep=group4 point=record_num;

output;

end;

run;

It's untested, so I assume you'll give it a try on a relatively small data set.

View solution in original post

Astounding · Posted 10-27-2015 04:19 PM

Cute problem ... the right DATA step should do the trick.

data want;

set reordering_map;

record_num=group1;

set results (keep=group1) point=record_num;

record_num=group2;

set results (keep=group2) point=record_num;

record_num=group3;

set results (keep=group3) point=record_num;

record_num=group4;

set results (keep=group4) point=record_num;

run;

I'm pretty sure SAS will drop RECORD_NUM automatically, but you'll find out. (It's also untested code.)

Good luck.

dmz2000 · Posted 10-28-2015 01:39 PM

Thanks. The data step suggestion above with the point= option works for the situation I gave.

The actual results data has one more level of slight difficulty. There are multiple subjects (a fixed number across all scenarios, and the same subjects) that participate in all Groups, and are represented in all scenarios. The map file would be the same as above.

So the indexing would require that we point to chunks of positions instead of a single index position per scenario. How can the solution be tweaked to accomodate the situation?

Scenario Set	Subject	Group 1	Group 2	Group 3	Group 4
A	tuv	37	30	24	24
A	wyx	25	20	16	16
A	zab	62	50	40	40
B	tuv	30	26	33	25
B	wyx	20	17	22	16
B	zab	50	43	55	41
C	tuv	38	35	30	31
C	wyx	26	23	20	20
C	zab	64	58	50	51
D	tuv	28	22	30	30
D	wyx	19	15	20	20
D	zab	47	37	50	50
E	tuv	24	37	23	29
E	wyx	16	25	15	19
E	zab	40	62	38	49
F	tuv	30	37	31	36
F	wyx	20	25	21	24
F	zab	50	62	52	61
…	…	…	…	…	…

Astounding · Posted 10-28-2015 01:46 PM

You'll have to give a sample mapping data set (just a couple of lines would be enough), and show what the results should be.

dmz2000 · Posted 10-28-2015 03:09 PM

The mapping dataset would be the same as before.

Take scenario set B for example, the corresponding mapping would be [2, 2, 1, 5]. The expected result should take the obs group 2 [B] results set from Group 1, obs group 2 [B] from Group 2, obs group 1 [A] from Group 3 and obs group 5 [E] from Group 4. I have highlighted these results groups in red above.

The expected post-mapped new results data set would look like the follows, for set B:

Scenario Set	Subject	Group 1	Group 2	Group 3	Group 4
B	tuv	30	26	24	29
B	wyx	20	17	16	19
B	zab	50	43	40	49

Astounding · Posted 10-28-2015 03:36 PM

OK, just to confirm my expectations, then ...

Each subject appears for every scenario.

The number of subjects is known, and can be hard-coded within the program. (We can get around this if needed.)

dmz2000 · Posted 10-28-2015 03:37 PM

Yes the same subjects are repeated across scenarios. The subject count is known and can be hardcoded if needed. The actual dataset is about 10,000 scenarios, with 30,000 (same) subjects in each.

Thanks.

Astounding · Posted 10-28-2015 03:45 PM

Sorry, one more question to simplify ...

Are the subjects always in the same order? (The first subject in Scenario A will always be the same person as the first subject in Scenario B?)

dmz2000 · Posted 10-28-2015 03:46 PM

Yes I can first sort them by scenario then by subject to ensure that being the case.

Astounding · Posted 10-28-2015 04:58 PM

OK, let's give this a shot. Using 30000 as the number of subjects:

data want;

set reordering_map (rename=(group1=_1_ group2=_2_ group3=_3_ group4=_4_);

do _A_=1 to 30000;

set results (keep=subject) point=_A_;

record_num = _A_ + 30000 * (_1_ - 1);

set results (keep=group1) point=record_num;

record_num = _A_ + 30000 * (_2_ - 1);

set results (keep=group2) point=record_num;

record_num = _A_ + 30000 * (_3_ - 1);

set results (keep=group3) point=record_num;

record_num = _A_ + 30000 * (_4_ - 1);

set results (keep=group4 point=record_num;

output;

end;

run;

It's untested, so I assume you'll give it a try on a relatively small data set.

dmz2000 · Posted 10-29-2015 03:57 PM

Tested on the sample dataset and this works! Thank you. One question is - how can I tweak the above to keep the Scenario_Set column in the results as well? I added Scenario_Set to (keep=subject) but it seems to repeat the first data point all through out.

Astounding · Posted 10-29-2015 08:05 PM

It should be coming in from the REORDERING_MAP data set. Is SCENARIO_SET not part of that data set?

Do any of the SET RESULTS statements also keep SCENARIO_SET (by mistake)?

dmz2000 · Posted 10-30-2015 11:46 AM

It's on the dataset. Nevermind. Thank you for all the help!

LinusH · Posted 10-28-2015 12:44 PM

Normalizing your data structure will make it more sustainable IMO.

Transpose both tables so that the composite unique key will be scenario set together with Group no.
Then just join the tables.

Data never sleeps

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!