PGStats
Opal | Level 21

Rick, what you describe as statistically equivalent to "PG's method" in your second paragraph is actually SD's method. PG's method differs in that it does a random assignment to groups within quantiles of the distribution. Thus, similar values of Y are assigned uniformly among the groups. I did the tests above to explore the difference between local (PG) and global (SD) shuffling.

PG

SteveDenham
Jade | Level 19

Key to note that the 'global' method I used did NOT sort before assigning subgroups, so Rick's comment that the two are equivalent should stand, if you sort on U to begin with.
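
For readers following along, "sorting on U" followed by a global assignment can be sketched roughly as below. This is only an illustration, not the code used earlier in the thread: the data set name Have, the helper variable u, the seed, and the group count of 8 are all assumptions.

```sas
/* Sketch of global shuffling. Assumed names: data set Have, helper u. */
%let nGroups = 8;                 /* assumed number of subgroups */

data GlobalKeyed;
   call streaminit(1234);         /* assumed seed, for reproducibility */
   set Have;
   u = rand("Uniform");           /* random key, independent of Y */
run;

/* Sorting on u puts the observations in random order */
proc sort data=GlobalKeyed;
   by u;
run;

/* Deal the shuffled observations into subgroups 1..&nGroups in turn */
data GlobalAssignment;
   set GlobalKeyed;
   subgroup = mod(_N_ - 1, &nGroups) + 1;
   drop u;
run;
```

Because the random key ignores Y, the subgroup sizes are balanced, but the subgroup means and standard deviations are left to chance.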

In any case, I learned a lot in this thread.  I hope the OP did as well.

Steve Denham

SteveDenham
Jade | Level 19

And after a bit of rest and time, I finally realized what PGStats' method was: a block randomization. Assign observations to a block based on some value (ranking phase), and randomize (permutation phase) within the block. This will almost always lead to more homogeneity within each block, and hence across the entire scheme, when examined over all blocks.
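
A minimal sketch of the two phases, for illustration only. It is not PG's actual code from this thread; the data set name Have, the variable Y, the helper variables block and u, the seed, and the group count of 8 are assumptions.

```sas
/* Sketch of block randomization. Assumed names: Have, Y, block, u. */
%let nGroups = 8;                 /* assumed number of subgroups */

/* Ranking phase: order the observations by Y */
proc sort data=Have out=Ranked;
   by Y;
run;

/* Each run of &nGroups consecutive sorted observations forms one block */
data Blocked;
   call streaminit(1234);         /* assumed seed, for reproducibility */
   set Ranked;
   block = ceil(_N_ / &nGroups);
   u = rand("Uniform");           /* random key for the within-block shuffle */
run;

/* Permutation phase: random order within each block */
proc sort data=Blocked;
   by block u;
run;

/* Number the shuffled observations 1..&nGroups within each block */
data BlockAssignment;
   set Blocked;
   by block;
   if first.block then subgroup = 0;
   subgroup + 1;                  /* sum statement: subgroup is retained */
   drop u;
run;

/* Compare the resulting subgroups */
proc means data=BlockAssignment N mean std;
   class subgroup;
   var Y;
run;
```

Each block holds observations with similar Y values and contributes at most one of them to every subgroup, which is why the subgroup means end up close to one another.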

Steve Denham

PTD_SAS
Obsidian | Level 7

I'm very glad that I posted my initial question, I learned a lot!

I used to do just randomisation to assign test groups (for DOE), but that did not always result in groups with similar averages or variances for specific variables. PG's method works very well; I tested it on various datasets with real-time process data.

Thanks to all of you!

Fethon

PGStats
Opal | Level 21

I posted a that explains the test above and also includes yet another assignment method that gives near perfect balance, plus a couple of references. Following Steve's comment on a proper name for the method that I proposed, I could search further on the net and find that the topic is definitely not a recent one and gets a lot more complicated when one tries to balance many factors at the same time.

PG

PTD_SAS
Obsidian | Level 7

PG,

Great work.

Fethon

Rick_SAS
SAS Super FREQ

If you have SAS/QC software, the OPTEX procedure has many ways to solve this problem. One way actually solves an optimization problem that attempts to get the means of the groups equal. You can also get higher-order moments (variance, skewness,...) equal:

data Groups;
  do subgroup = 1 to 8;
     output;
  end;
run;

proc optex data=Groups seed=1234 coding=orthcan;
   class subgroup;
   model subgroup;
   blocks design=N; /* contains the data in Y */
   model Y; /* include Y*Y if you also want StdDevs equal */
   output out=Assignment;
run;

proc means data=Assignment N mean std;
   class subgroup;
   var Y;
run;

PGStats
Opal | Level 21

Great find! I wish I could give it a try on our 2000 obs dataset. But no QC here... who needs that stuff!? :-)

PG
Rick_SAS
SAS Super FREQ

Because it is solving an optimization problem, PROC OPTEX isn't as speedy as the block-permutation method. On my PC it takes about 20 seconds. I think that time depends on the number of subgroups, since placing the data into four subgroups takes only 12 seconds. Here's the output. All the 99.9999s that you see mean that OPTEX is "very happy" with the results.

                 The OPTEX Procedure

               Class Level Information

          Class       Levels    Values
          subgroup         8    1 2 3 4 5 6 7 8

          Design      Treatment        Treatment
          Number     D-Efficiency     A-Efficiency
          ----------------------------------------
               1        99.9999          99.9999
               2        99.9999          99.9999
               3        99.9999          99.9999
               4        99.9999          99.9999
               5        99.9998          99.9998
               6        99.9998          99.9998
               7        99.9998          99.9998
               8        99.9998          99.9998
               9        99.9998          99.9998
              10        99.9997          99.9997

                 The MEANS Procedure

               Analysis Variable : Y

   subgroup   N Obs     N          Mean       Std Dev
   --------------------------------------------------
          1     250   250     0.0334389     0.9929350
          2     250   250     0.0284869     0.9833907
          3     250   250     0.0268695     1.0124588
          4     250   250     0.0311731     1.0515241
          5     250   250     0.0282518     1.1113798
          6     250   250     0.0302544     0.9809277
          7     250   250     0.0270397     1.0239409
          8     250   250     0.0276480     1.0486831
   --------------------------------------------------

PTD_SAS
Obsidian | Level 7

That's great, thanks!
