<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Grouping data in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41359#M1747</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Here is a data step method, using Rick's data.&amp;nbsp; Note that the group sizes are not identical, ranging from 219 to 283&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="background-color: white; font-family: 'Courier New'; color: navy; font-size: 14pt;"&gt;data&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; N;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;call&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; streaminit(&lt;/SPAN&gt;&lt;STRONG style="background-color: white; font-family: 'Courier New'; color: teal; font-size: 14pt;"&gt;1&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;do&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; i = &lt;/SPAN&gt;&lt;STRONG style="background-color: white; font-family: 'Courier New'; color: teal; font-size: 14pt;"&gt;1&lt;/STRONG&gt; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;to&lt;/SPAN&gt; &lt;STRONG style="background-color: white; font-family: 'Courier New'; color: teal; font-size: 14pt;"&gt;2000&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;&amp;nbsp;&amp;nbsp; Y = rand(&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: purple; font-size: 
14pt;"&gt;"Normal"&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;&amp;nbsp;&amp;nbsp; Y2 = rand("Uniform");&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;output&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;end&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="background-color: white; font-family: 'Courier New'; color: navy; font-size: 14pt;"&gt;run&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;data N2;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;&amp;nbsp; set N;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;&amp;nbsp; subgroup=ceil(Y2*8);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;run;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; 
&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;proc glm data=N2;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;class subgroup;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;model Y=subgroup;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;means subgroup/hovtest=bf;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;quit;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: arial,helvetica,sans-serif; color: black; font-size: 10pt;"&gt;The call to GLM tests for differences in means and variances between the 8 groups.&amp;nbsp; Getting the groups to exactly the same size is going to be difficult, but I am pretty sure a macro implementing PROC SURVEYSELECT exists for just that purpose.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: arial,helvetica,sans-serif; color: black; font-size: 10pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: arial,helvetica,sans-serif; color: black; font-size: 10pt;"&gt;Steve Denham&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: arial,helvetica,sans-serif; color: black; font-size: 10pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 
arial,helvetica,sans-serif; color: black; font-size: 10pt;"&gt;Hope this helps.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 23 Mar 2012 17:12:55 GMT</pubDate>
    <dc:creator>SteveDenham</dc:creator>
    <dc:date>2012-03-23T17:12:55Z</dc:date>
    <item>
      <title>Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41353#M1741</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;How can I create subgroups in a large dataset so that all subgroups are approximately equal as measured by a specific variable? For example, I want to create X subgroups that have the same average for the variable Y. Is there a clustering procedure that does that?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 22 Mar 2012 23:09:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41353#M1741</guid>
      <dc:creator>PTD_SAS</dc:creator>
      <dc:date>2012-03-22T23:09:25Z</dc:date>
    </item>
    <item>
      <title>Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41354#M1742</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt; Some questions first:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How big is the large dataset (i.e. how many observations)?&lt;/P&gt;&lt;P&gt;How many subgroups?&amp;nbsp; &lt;/P&gt;&lt;P&gt;What does the distribution of the variable Y look like (i.e., normal, nearly normal, skewed, horribly skewed, etc.)?&lt;/P&gt;&lt;P&gt;How close do the means of the subgroups have to be?&amp;nbsp; Identical will be impossible, so some acceptable interval needs to be specified.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am thinking of PROC SURVEYSELECT as a starting tool, but that will require knowing how many subgroups you want.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Mar 2012 11:10:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41354#M1742</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2012-03-23T11:10:23Z</dc:date>
    </item>
    <item>
      <title>Re: Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41355#M1743</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;As Steve says, it isn't clear what you want. One interpretation of your request is that you want to split Y into quantiles: if you want 10 groups, use deciles of Y; if you want 4 groups, use quartiles; etc.&amp;nbsp; Each group has about the same number of observations and similar values of Y (where "similar" doesn't necessarily mean the same, especially for the upper and lower quantiles).&amp;nbsp; If you want to do this, you can use PROC RANK; its documentation contains &lt;A href="http://support.sas.com/documentation/cdl/en/proc/63079/HTML/default/viewer.htm#p1xzpoijq32wbsn1gr6g5cx3emsx.htm"&gt;an example of doing this&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you don't care whether there are an equal number of obs in each group, then clustering is probably a good idea.&lt;/P&gt;&lt;P&gt;Rick&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Mar 2012 12:54:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41355#M1743</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2012-03-23T12:54:02Z</dc:date>
    </item>
    <item>
      <title>Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41356#M1744</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have a data set with about 2000 observations to be split into 8 groups that would have similar (not identical) average Y. The variable Y is normally distributed. &lt;/P&gt;&lt;P&gt;I didn't explain my initial question clearly. I think Rick took it as having Y be similar within the groups, whereas I want the group averages of Y to be similar. The similarity doesn't need to be too tight, if that helps. &lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Fethon &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Mar 2012 16:38:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41356#M1744</guid>
      <dc:creator>PTD_SAS</dc:creator>
      <dc:date>2012-03-23T16:38:31Z</dc:date>
    </item>
    <item>
      <title>Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41357#M1745</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt; I'm not sure why you want 8 groups if Y is normally distributed, but here's how you can do it with k-means clustering:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="color: navy; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;data&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; N;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;call&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; streaminit(&lt;/SPAN&gt;&lt;STRONG style="color: teal; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;1&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;do&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; i = &lt;/SPAN&gt;&lt;STRONG style="color: teal; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;1&lt;/STRONG&gt; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;to&lt;/SPAN&gt; &lt;STRONG style="color: teal; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;2000&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;&amp;nbsp;&amp;nbsp; Y = rand(&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: purple; font-size: 
14pt;"&gt;"Normal"&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;output&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;end&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="color: navy; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;run&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="color: navy; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;proc&lt;/STRONG&gt; &lt;STRONG style="color: navy; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;fastclus&lt;/STRONG&gt; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;data&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;=N &lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;maxc&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;=&lt;/SPAN&gt;&lt;STRONG style="color: teal; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;8&lt;/STRONG&gt;&lt;SPAN style="background-color: white; 
font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; out=clus;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;var&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; Y;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="color: navy; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;run&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="color: navy; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;proc&lt;/STRONG&gt; &lt;STRONG style="color: navy; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;freq&lt;/STRONG&gt; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;data&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;=clus;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;tables&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; cluster;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="color: navy; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;run&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG 
style="color: navy; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;proc&lt;/STRONG&gt; &lt;STRONG style="color: navy; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;sgplot&lt;/STRONG&gt; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;data&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;=clus;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;scatter&lt;/SPAN&gt; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;x&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;=cluster &lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;y&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;=Y /&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;group&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;=Cluster ;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="color: navy; font-size: 14pt; background-color: white; font-family: 'Courier New';"&gt;run&lt;/STRONG&gt;;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Mar 2012 16:52:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41357#M1745</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2012-03-23T16:52:26Z</dc:date>
    </item>
    <item>
      <title>Re: Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41358#M1746</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;For something very simple, you could try using the similarity of adjacent observations in the sorted dataset, with a little permutation:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data sortedY;&lt;BR /&gt;do i = 1 to 2000; &lt;/P&gt;&lt;P&gt;y = rannor(-1); &lt;/P&gt;&lt;P&gt;output; &lt;/P&gt;&lt;P&gt;end;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sort data=sortedY; by y; run;&lt;/P&gt;&lt;P&gt;data grouped(drop = _seed _g1-_g8);&lt;BR /&gt;array _g{8} _g1-_g8;&lt;BR /&gt;retain _g (1 2 3 4 5 6 7 8);&lt;BR /&gt;set sortedY nobs=_nobs;&lt;BR /&gt;if _n_ &amp;lt;= 8*floor(_nobs/8);&lt;BR /&gt;_seed = 8243959;&lt;BR /&gt;if mod(_n_-1, 8) = 0 then call ranperm(_seed, of _g1-_g8);&lt;BR /&gt;group = _g(1 + mod(_n_-1, 8));&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;select group, mean(y) as meanY from grouped group by group;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Tweaked by PG.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Mar 2012 17:03:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41358#M1746</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2012-03-23T17:03:51Z</dc:date>
    </item>
    <item>
      <title>Re: Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41359#M1747</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Here is a data step method, using Rick's data.&amp;nbsp; Note that the group sizes are not identical, ranging from 219 to 283&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="background-color: white; font-family: 'Courier New'; color: navy; font-size: 14pt;"&gt;data&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; N;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;call&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; streaminit(&lt;/SPAN&gt;&lt;STRONG style="background-color: white; font-family: 'Courier New'; color: teal; font-size: 14pt;"&gt;1&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;do&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; i = &lt;/SPAN&gt;&lt;STRONG style="background-color: white; font-family: 'Courier New'; color: teal; font-size: 14pt;"&gt;1&lt;/STRONG&gt; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;to&lt;/SPAN&gt; &lt;STRONG style="background-color: white; font-family: 'Courier New'; color: teal; font-size: 14pt;"&gt;2000&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;&amp;nbsp;&amp;nbsp; Y = rand(&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: purple; font-size: 
14pt;"&gt;"Normal"&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;&amp;nbsp;&amp;nbsp; Y2 = rand("Uniform");&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;output&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: blue; font-size: 14pt;"&gt;end&lt;/SPAN&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="background-color: white; font-family: 'Courier New'; color: navy; font-size: 14pt;"&gt;run&lt;/STRONG&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;data N2;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;&amp;nbsp; set N;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;&amp;nbsp; subgroup=ceil(Y2*8);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;run;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; 
&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;proc glm data=N2;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;class subgroup;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;model Y=subgroup;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;means subgroup/hovtest=bf;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt;quit;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 'Courier New'; color: black; font-size: 14pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: arial,helvetica,sans-serif; color: black; font-size: 10pt;"&gt;The call to GLM tests for differences in means and variances between the 8 groups.&amp;nbsp; Getting the groups to exactly the same size is going to be difficult, but I am pretty sure a macro implementing PROC SURVEYSELECT exists for just that purpose.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: arial,helvetica,sans-serif; color: black; font-size: 10pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: arial,helvetica,sans-serif; color: black; font-size: 10pt;"&gt;Steve Denham&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: arial,helvetica,sans-serif; color: black; font-size: 10pt;"&gt; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: white; font-family: 
arial,helvetica,sans-serif; color: black; font-size: 10pt;"&gt;Hope this helps.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Mar 2012 17:12:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41359#M1747</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2012-03-23T17:12:55Z</dc:date>
    </item>
    <item>
      <title>Re: Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41360#M1748</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Steve's solution is just a random split into 8 groups. If that's what you wanted, why mention Y at all? I thought you were interested in grouping on Y.&lt;/P&gt;&lt;P&gt;If your data are randomly distributed in the data set, you can also just use&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; subgroup = mod(_N_, 8);&lt;/P&gt;&lt;P&gt;but Steve's method is "safer" in case the data are sorted or autocorrelated.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Mar 2012 17:50:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41360#M1748</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2012-03-23T17:50:07Z</dc:date>
    </item>
    <item>
      <title>Re: Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41361#M1749</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt; If you need an exact split, so that all subgroups have equal size, try this:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data N;&lt;BR /&gt;call streaminit(1);&lt;BR /&gt;do i = 1 to 2000;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; Y = rand("Normal");&lt;BR /&gt;&amp;nbsp;&amp;nbsp; output;&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;proc plan ;&lt;BR /&gt;&amp;nbsp; factors subgroup=8 ordered rep=250 random;&lt;BR /&gt;&amp;nbsp; output out=sg;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;data sg;&lt;BR /&gt;set sg;&lt;BR /&gt;&amp;nbsp; i=_n_;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;proc sort data=n;&lt;BR /&gt; by i;&lt;BR /&gt;run;&lt;BR /&gt;proc sort data=sg;&lt;BR /&gt; by i;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;data n2;&lt;BR /&gt;merge sg n;&lt;BR /&gt; by i;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;proc glm data=N2;&lt;BR /&gt; class subgroup;&lt;BR /&gt;&amp;nbsp; model Y=subgroup;&lt;BR /&gt;&amp;nbsp; means subgroup/hovtest=bf;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am sure there are slicker ways of getting all the info together, but this is 'file the serial numbers off' of legacy code.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Mar 2012 19:00:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41361#M1749</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2012-03-23T19:00:14Z</dc:date>
    </item>
    <item>
      <title>Re: Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41362#M1750</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt; Or from your previous post, sort by Y2 (the random uniform variable) and then use the mod(_N_, 8) trick.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 23 Mar 2012 19:45:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41362#M1750</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2012-03-23T19:45:19Z</dc:date>
    </item>
    <item>
      <title>Re: Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41363#M1751</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I simulated 100 replicates of 2000 normal variates and compared the mean variability, the variance range and the size of 8 subgroups formed by two methods:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PGperm is my permutation of Y-adjacent observations&lt;/P&gt;&lt;P&gt;SDrand is Steve's random allocation&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here is the complete test, followed by the results:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data randomY;&lt;BR /&gt;call streaminit(283219);&lt;BR /&gt;do rep = 1 to 100;&lt;BR /&gt;&amp;nbsp; do i = 1 to 2000; &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; y = rand("NORMAL"); &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; output; &lt;BR /&gt;&amp;nbsp; end;&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sort data=randomY; by rep y; run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data PGgroups(drop = _seed _g1-_g8);&lt;BR /&gt;array _g{8} _g1-_g8;&lt;BR /&gt;retain _g (1:8);&lt;BR /&gt;set randomY nobs=_nobs;&lt;BR /&gt;if _n_ &amp;lt;= 8*floor(_nobs/8);&lt;BR /&gt;_seed = 8243959;&lt;BR /&gt;if mod(_n_-1, 8) = 0 then call ranperm(_seed, of _g1-_g8);&lt;BR /&gt;subGroup = _g(1 + mod(_n_-1, 8));&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data SDgroups;&lt;BR /&gt;set randomY;&lt;BR /&gt;if _n_ = 1 then call streaminit(1);&lt;BR /&gt;subGroup = ceil(8*rand("UNIFORM"));&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;title "Two balanced grouping methods";&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;create table PGmeans as&lt;/P&gt;&lt;P&gt;select rep, subGroup, count(*) as n, mean(y) as meanY, var(y) as varY&lt;/P&gt;&lt;P&gt;from PGgroups&lt;/P&gt;&lt;P&gt;group by rep, subGroup;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;create table PGvars as&lt;/P&gt;&lt;P&gt;select rep, range(n) as rangeN, var(meanY) as
varMeanY, range(varY) as rangeVarY&lt;/P&gt;&lt;P&gt;from PGmeans&lt;/P&gt;&lt;P&gt;group by rep;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;create table SDmeans as&lt;/P&gt;&lt;P&gt;select rep, subGroup, count(*) as n, mean(y) as meanY, var(y) as varY&lt;/P&gt;&lt;P&gt;from SDgroups&lt;/P&gt;&lt;P&gt;group by rep, subGroup;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;create table SDvars as&lt;/P&gt;&lt;P&gt;select rep, range(n) as rangeN, var(meanY) as varMeanY, range(varY) as rangeVarY&lt;/P&gt;&lt;P&gt;from SDmeans&lt;/P&gt;&lt;P&gt;group by rep;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;select "PGperm" as method, mean(rangeN) as meanRangeN, mean(varMeanY) as meanVarMeanY, mean(rangeVarY) as meanRangeVarY&lt;/P&gt;&lt;P&gt;from PGvars&lt;/P&gt;&lt;P&gt;union&lt;/P&gt;&lt;P&gt;select "SDrand" as method, mean(rangeN) as meanRangeN, mean(varMeanY) as meanVarMeanY, mean(rangeVarY) as meanRangeVarY&lt;/P&gt;&lt;P&gt;from SDvars;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;-------&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Two balanced grouping 
methods&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; meanRange&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; method&amp;nbsp; meanRangeN&amp;nbsp; meanVarMeanY&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; VarY&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; --------------------------------------------------------------------------&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
PGperm&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.000033&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.020417&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SDrand&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 46.07&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.00396&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.263877&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I think it is fair to conclude that PGperm formed subgroups that were more closely matched.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 24 Mar 2012 03:50:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41363#M1751</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2012-03-24T03:50:17Z</dc:date>
    </item>
    <item>
      <title>Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41364#M1752</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt; PG, Rick, Steve,&lt;/P&gt;&lt;P&gt;Thank you all for your valuable input. When I get back to work tomorrow (I'm on Australian time!) I'll try the PGperm solution; I think it'll give me what I want.&lt;/P&gt;&lt;P&gt;I'm designing an experiment, so I want all 8 treatment groups to start with a similar average Y (a variable representing process stability). Random allocation to groups did not give very similar group averages for Y; maybe my assumption of normally distributed Y was wrong.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Fethon Naoum&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 25 Mar 2012 02:34:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41364#M1752</guid>
      <dc:creator>PTD_SAS</dc:creator>
      <dc:date>2012-03-25T02:34:54Z</dc:date>
    </item>
    <item>
      <title>Re: Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41365#M1753</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The performance of both methods will deteriorate slightly with skewed Y data. If you replace&amp;nbsp; y = rand("NORMAL") with&amp;nbsp; y = exp(rand("NORMAL")) in the test above, to simulate lognormal data instead of normal, the test results become:&lt;/P&gt;&lt;P&gt;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;STRONG style="font-size: 12pt; font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Two balanced grouping methods (lognormal data)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="font-size: 12pt; font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; meanRange&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; method&amp;nbsp; meanRangeN&amp;nbsp; 
meanVarMeanY&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; VarY&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ----------------------------------------------&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PGperm&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.001695&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5.017333&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG style="font-size: 12pt; font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SDrand&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 46.07&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.019796&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 7.407866&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="font-size: 12pt; font-family: courier new,courier;"&gt; &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG style="font-size: 12pt; font-family: courier new,courier;"&gt;PG&lt;/STRONG&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 25 Mar 2012 02:55:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41365#M1753</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2012-03-25T02:55:43Z</dc:date>
    </item>
    <item>
      <title>Re: Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41366#M1754</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;WOW!&amp;nbsp; I have been looking for something like this for years.&amp;nbsp; I am trashing old code immediately, and am incorporating this method.&amp;nbsp; Too often I have had to "re-randomize" due to differences in means or variances, and this minimizes the chances of those occurrences.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Message was edited by: Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Mar 2012 11:59:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41366#M1754</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2012-03-26T11:59:38Z</dc:date>
    </item>
    <item>
      <title>Re: Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41367#M1755</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If I understand correctly, PG's method is a random assignment into one of 8 subsets.&lt;/P&gt;&lt;P&gt;He's comparing it to a sequential assignment of the sorted data into subgroups.&lt;/P&gt;&lt;P&gt;Yes, the random assignment should perform better.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But I think PG's method is statistically equivalent to assigning U=uniform(1), sorting on U, and assigning subgroup=mod(_N_,8);&lt;/P&gt;&lt;P&gt;The corresponding DATA step is simpler since it avoids arrays and IF/THEN statements.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Rick&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Mar 2012 13:21:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41367#M1755</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2012-03-26T13:21:03Z</dc:date>
    </item>
    <item>
      <title>Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41368#M1756</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt; Rick, what you describe as statistically equivalent to "PG's method" in your second paragraph is actually SD's method. PG's method differs in that it does a random assignment to groups within quantiles of the distribution. Thus, similar values of Y are assigned uniformly among the groups. I did the tests above to explore the difference between local (PG) and global (SD) shuffling.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Mar 2012 14:48:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41368#M1756</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2012-03-26T14:48:40Z</dc:date>
    </item>
    <item>
      <title>Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41369#M1757</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt; Key to note that the 'global' method I used did NOT sort before assigning subgroups, so Rick's comment that the two are equivalent should stand, if you sort on U to begin with.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In any case, I learned a lot in this thread.&amp;nbsp; I hope the OP did as well.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Mar 2012 17:13:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41369#M1757</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2012-03-26T17:13:27Z</dc:date>
    </item>
    <item>
      <title>Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41370#M1758</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt; I'm very glad that I posted my initial question, I learned a lot! &lt;/P&gt;&lt;P&gt;I used to rely on plain randomisation to assign test groups (for DOE), but that did not always result in groups with similar averages or variances for specific variables. PG's method works very well; I tested it on various datasets with real-time process data.&lt;/P&gt;&lt;P&gt;Thanks to all of you!&lt;/P&gt;&lt;P&gt;Fethon&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 26 Mar 2012 21:37:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41370#M1758</guid>
      <dc:creator>PTD_SAS</dc:creator>
      <dc:date>2012-03-26T21:37:49Z</dc:date>
    </item>
    <item>
      <title>Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41371#M1759</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt; And after a bit of rest and time, I finally realized what PGStats's method was--a block randomization.&amp;nbsp; Assign observations to a block based on some value (ranking phase), and randomize (permutation phase) within the block.&amp;nbsp; This will almost always lead to more homogeneity within each block, and hence the entire schema, when examined over all blocks.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 Mar 2012 10:57:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41371#M1759</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2012-03-27T10:57:16Z</dc:date>
    </item>
    <item>
      <title>Grouping data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41372#M1760</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I posted a &lt;A __default_attr="1131" __jive_macro_name="document" _modifiedtitle="SAS program " class="jive_macro jive_macro_document" href="https://communities.sas.com/" title="SAS program "&gt;SAS program&lt;/A&gt; that explains the test above and also includes yet another assignment method that gives &lt;SPAN style="text-decoration: underline;"&gt;near perfect &lt;/SPAN&gt;balance, plus a couple of references. Following Steve's comment on a proper name for the method that I proposed, I was able to search further on the net and found that the topic is definitely not a recent one and gets a lot more complicated when one tries to balance many factors at the same time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 Mar 2012 18:31:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Grouping-data/m-p/41372#M1760</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2012-03-27T18:31:09Z</dc:date>
    </item>
  </channel>
</rss>

