<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to random sample with desired aggregate statistics in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/344043#M79033</link>
    <description>&lt;P&gt;PGStats presents an interesting algorithm, and I give him credit for originality. But let's be clear: the sampling scheme is a heuristic method that will often give good results, but is not guaranteed. At the very last step the algorithm&amp;nbsp;might select an extreme outlier that renders the average value out of range. &amp;nbsp;Still, it probably works for many real-life measurements, especially if the values are distributed symmetrically about the mean.&lt;/P&gt;</description>
    <pubDate>Fri, 24 Mar 2017 12:26:51 GMT</pubDate>
    <dc:creator>Rick_SAS</dc:creator>
    <dc:date>2017-03-24T12:26:51Z</dc:date>
    <item>
      <title>How to random sample with desired aggregate statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/343926#M78994</link>
      <description>&lt;P&gt;Hi, there.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a large data set containing the total test scores of 20,000 people.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;ex)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Total Score&lt;/P&gt;
&lt;P&gt;member1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 727&lt;/P&gt;
&lt;P&gt;member2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 566&lt;/P&gt;
&lt;P&gt;member3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 661&lt;/P&gt;
&lt;P&gt;member4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 525&lt;/P&gt;
&lt;P&gt;member5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 609&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .......&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I want to extract the records of exactly 2,818 people.&lt;/P&gt;
&lt;P&gt;I also want the average total score of those 2,818 people to be between 560 and 565.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any help and tips will be much appreciated!&lt;/P&gt;
&lt;P&gt;Thanks, Jamie.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Mar 2017 02:41:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/343926#M78994</guid>
      <dc:creator>jamie0111</dc:creator>
      <dc:date>2017-03-24T02:41:00Z</dc:date>
    </item>
    <item>
      <title>Re: How to random sample with desired aggregate statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/343937#M78997</link>
      <description>&lt;P&gt;That's not the definition of a random sample &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think one of the quickest ways is to randomly sample, calculate the average, ensure it meets the criteria and then stop or repeat.&lt;/P&gt;
&lt;P&gt;Or limit the range of scores allowed. Is it ok for one person to have a score of 825 and another 300 as long as the average is between 560-565? That's a tight range as well....&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;EDIT: PROC SURVEYSELECT is typically the method to implement a random sample in SAS.&amp;nbsp;&lt;/P&gt;
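&lt;P&gt;The sample-check-repeat idea above could be sketched roughly as follows. This is only a sketch under assumptions: the data set name HAVE, the variable SCORE, the macro name and the MAXTRIES cap are made up for illustration. No SEED= value is given on purpose, so that each draw differs. If the population mean is far from the target range, almost no simple random sample will qualify, which is why the loop is capped.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch: redraw a simple random sample until its mean is in range.
   HAVE, SCORE, the macro name and maxTries are hypothetical. */
%macro sampleUntilInRange(size=2818, lo=560, hi=565, maxTries=100);
    %local try ok;
    %let ok = 0;
    %let try = 0;
    %do %until (&amp;ok = 1 or &amp;try &gt;= &amp;maxTries);
        %let try = %eval(&amp;try + 1);
        /* Draw a simple random sample without replacement */
        proc surveyselect data=have out=want method=srs
            sampsize=&amp;size noprint;
        run;
        /* ok becomes 1 when the sample mean is inside [lo, hi] */
        proc sql noprint;
            select case when mean(score) between &amp;lo and &amp;hi
                        then 1 else 0 end
                into :ok trimmed
                from want;
        quit;
    %end;
    /* If inRange=0, WANT holds the last (out-of-range) draw */
    %put NOTE: tries=&amp;try inRange=&amp;ok;
%mend sampleUntilInRange;

%sampleUntilInRange()
&lt;/CODE&gt;&lt;/PRE&gt;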
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Mar 2017 02:44:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/343937#M78997</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-03-24T02:44:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to random sample with desired aggregate statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/343947#M79000</link>
      <description>&lt;P&gt;You could balance your random sampling like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Generate some fake data */
data examplePop;
do member = 1 to 20000;
    score = round(1000*rand("uniform"));
    output;
    end;
run;

%let mid=562.5;
%let size=2818;

/* Split the data into two subsets, lower and higher than the target value.
 Add random value for next step */
data lower higher;
set examplePop;
rnd = rand("uniform");
if score &amp;lt; &amp;amp;mid then output lower;
else output higher;
run;

/* Put both subsets into a random order */
proc sort data=lower out=lower(drop=rnd); by rnd; run;
proc sort data=higher out=higher(drop=rnd); by rnd; run;

/* Pick values from the subset that is on the opposite side from the target. If
 the current mean is lower than the target, pick a value at random from the
 higher subset, and vice-versa. The step also ends early if either subset
 runs out of observations. */
data want;
if sum &amp;lt; &amp;amp;mid * n then set higher;
else set lower;
output;
n + 1;
sum + score;
if n = &amp;amp;size then stop;
drop n sum;
run;

/* Check the mean and std dev from the initial population and the selected sample */
proc sql;
select "examplePop" , count(*) as n, mean(score) as mean, std(score) as std 
    from examplePop
union all
select "want", count(*) as n, mean(score) as mean, std(score) as std 
    from want;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;EDIT: Added comments.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Mar 2017 04:20:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/343947#M79000</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2017-03-24T04:20:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to random sample with desired aggregate statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/343955#M79002</link>
      <description>&lt;P&gt;It would be better to post this in the IML forum, since it is about data simulation.&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&amp;nbsp;is there.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Mar 2017 03:48:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/343955#M79002</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-03-24T03:48:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to random sample with desired aggregate statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/343977#M79011</link>
      <description>&lt;P&gt;I think sampling only scores between 560 and 565 would be much better.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How can I limit the range of scores when sampling?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Your help will be much appreciated!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks, Jamie.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Mar 2017 07:59:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/343977#M79011</guid>
      <dc:creator>jamie0111</dc:creator>
      <dc:date>2017-03-24T07:59:09Z</dc:date>
    </item>
    <item>
      <title>Re: How to random sample with desired aggregate statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/344009#M79025</link>
      <description>&lt;P&gt;Mine looks like PG's code.&lt;/P&gt;
&lt;P&gt;But neither my code nor PG's can guarantee that you get what you want.&lt;/P&gt;
&lt;P&gt;It depends heavily on your data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data examplePop;
do member = 1 to 20000;
    score = round(1000*rand("uniform"));
    output;
    end;
run;

proc sort data=examplepop;
 by score;
run;

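/* Slide a window of 2818 consecutive (sorted) scores through the data and
   stop at the first window whose mean is between 560 and 565 */
data want;
 set examplepop;
 /* Circular buffers holding the last 2818 members and scores */
 array m{0:2817} _temporary_;
 array s{0:2817} _temporary_;
 i=mod(_n_,2818);
 m{i}=member;
 s{i}=score;
 /* Test only once the buffers are full (_n_ &gt;= 2818) */
 if 560&amp;lt;=mean(of s{*})&amp;lt;=565 and _n_ gt 2817 then do;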
  do j=0 to 2817;
   member=m{j};
   score=s{j};
   output;
  end;
  stop;
 end;
 drop i j ;
run;



/* Check the mean and std dev from the initial population and the selected sample */
proc sql;
select "examplePop" , count(*) as n, mean(score) as mean, std(score) as std 
    from examplePop
union all
select "want", count(*) as n, mean(score) as mean, std(score) as std 
    from want;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Mar 2017 10:33:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/344009#M79025</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-03-24T10:33:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to random sample with desired aggregate statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/344043#M79033</link>
      <description>&lt;P&gt;PGStats presents an interesting algorithm, and I give him credit for originality. But let's be clear: the sampling scheme is a heuristic method that will often give good results, but is not guaranteed. At the very last step the algorithm&amp;nbsp;might select an extreme outlier that renders the average value out of range. &amp;nbsp;Still, it probably works for many real-life measurements, especially if the values are distributed symmetrically about the mean.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Mar 2017 12:26:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/344043#M79033</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2017-03-24T12:26:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to random sample with desired aggregate statistics</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/483153#M125269</link>
      <description>&lt;P&gt;I am facing a similar (though not identical) issue. Do you have anything in mind for the case where it is okay to have &lt;SPAN&gt;one person with a score of 825 and another with 300, as long as the average is between 560 and 565? That is what will happen in most of my cases.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Aug 2018 18:42:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-random-sample-with-desired-aggregate-statistics/m-p/483153#M125269</guid>
      <dc:creator>abc_xyz</dc:creator>
      <dc:date>2018-08-01T18:42:09Z</dc:date>
    </item>
  </channel>
</rss>

