<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Subsample a data set with given mean and standard deviation in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Subsample-a-data-set-with-given-mean-and-standard-deviation/m-p/110947#M5865</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If I had to do this, I would be tempted to use PROC SURVEYSELECT with age as the stratum variable. Set a different sample size for each stratum (age) so that the result has your desired mean and standard deviation at the total size you want.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I expect a certain amount of trial and error, because even for a given total number of records there are many ways to meet your requirement.&lt;/P&gt;&lt;P&gt;You might simplify the code by first restricting ages to something like 45 to 65, which reduces the number of strata.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hint: if you use this method, you should probably set a seed value so you can reproduce the sample later if needed.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would start with a trial data set of age and a weight variable and run it through PROC MEANS to get an idea of the stratum counts that give the data shape and range I want to use later.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 17 Apr 2013 17:35:27 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2013-04-17T17:35:27Z</dc:date>
    <item>
      <title>Subsample a data set with given mean and standard deviation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Subsample-a-data-set-with-given-mean-and-standard-deviation/m-p/110946#M5864</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a very large dataset of clients with their ages and other attributes. Would it be possible to extract a set of records from this larger dataset such that the subset had a mean of say 55 years of age and a standard deviation of 2?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Bruce&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Apr 2013 16:15:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Subsample-a-data-set-with-given-mean-and-standard-deviation/m-p/110946#M5864</guid>
      <dc:creator>BigD</dc:creator>
      <dc:date>2013-04-17T16:15:22Z</dc:date>
    </item>
    <item>
      <title>Re: Subsample a data set with given mean and standard deviation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Subsample-a-data-set-with-given-mean-and-standard-deviation/m-p/110947#M5865</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If I had to do this, I would be tempted to use PROC SURVEYSELECT with age as the stratum variable. Set a different sample size for each stratum (age) so that the result has your desired mean and standard deviation at the total size you want.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I expect a certain amount of trial and error, because even for a given total number of records there are many ways to meet your requirement.&lt;/P&gt;&lt;P&gt;You might simplify the code by first restricting ages to something like 45 to 65, which reduces the number of strata.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hint: if you use this method, you should probably set a seed value so you can reproduce the sample later if needed.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would start with a trial data set of age and a weight variable and run it through PROC MEANS to get an idea of the stratum counts that give the data shape and range I want to use later.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Apr 2013 17:35:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Subsample-a-data-set-with-given-mean-and-standard-deviation/m-p/110947#M5865</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2013-04-17T17:35:27Z</dc:date>
    </item>
    <item>
      <title>Re: Subsample a data set with given mean and standard deviation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Subsample-a-data-set-with-given-mean-and-standard-deviation/m-p/110948#M5866</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I agree with @Ballardw. You can use the PDF function to determine what percentage to extract for each age stratum. For details and an example, see &lt;A href="http://blogs.sas.com/content/iml/2013/03/11/construct-normal-data-from-summary/" title="http://blogs.sas.com/content/iml/2013/03/11/construct-normal-data-from-summary/"&gt;Construct normal data from summary statistics - The DO Loop&lt;/A&gt;.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Apr 2013 18:18:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Subsample-a-data-set-with-given-mean-and-standard-deviation/m-p/110948#M5866</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2013-04-17T18:18:26Z</dc:date>
    </item>
    <item>
      <title>Re: Subsample a data set with given mean and standard deviation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Subsample-a-data-set-with-given-mean-and-standard-deviation/m-p/110949#M5867</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;After lunch and improved blood sugar, I remembered that we could generate a normal distribution as specified. The code below generates a data set to use as the SAMPSIZE= data set in PROC SURVEYSELECT. Pick the upper bound of i in the DO loop to match your desired total size, AND verify that the raw data set has at least as many persons at each age as the corresponding _nsize_ value. The variable x would need to be renamed to match the name of your age (STRATA) variable. This is intended to work with simple random sampling (SRS) without replacement. Alternatively, you can generate sampling fractions.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data dist;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; call streaminit(12345); /* arbitrary seed so the counts can be reproduced */&lt;BR /&gt;&amp;nbsp;&amp;nbsp; do i = 1 to 5000; /* 5000 = desired total sample size */&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* draw from Normal(mean=55, sd=2) and round to whole years */&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; x = round(rand('NORMAL',55,2),1);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; drop i;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;/* count the draws at each age; _nsize_ is the per-stratum sample size */&lt;BR /&gt;proc freq data=dist noprint;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; table x / out=SampleSize (rename=(count=_nsize_) drop=percent);&lt;BR /&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Apr 2013 19:41:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Subsample-a-data-set-with-given-mean-and-standard-deviation/m-p/110949#M5867</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2013-04-17T19:41:36Z</dc:date>
    </item>
    <item>
      <title>Re: Subsample a data set with given mean and standard deviation</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Subsample-a-data-set-with-given-mean-and-standard-deviation/m-p/110950#M5868</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks so much for these ideas and help. I will be trying them out!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 17 Apr 2013 20:29:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Subsample-a-data-set-with-given-mean-and-standard-deviation/m-p/110950#M5868</guid>
      <dc:creator>BigD</dc:creator>
      <dc:date>2013-04-17T20:29:05Z</dc:date>
    </item>
  </channel>
</rss>

