<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Generating a large sample from a small sample, sum variable and repeat.....10,000 times in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474232#M121807</link>
    <description>Age of buildings is probably a big factor as well.</description>
    <pubDate>Thu, 28 Jun 2018 19:54:01 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2018-06-28T19:54:01Z</dc:date>
    <item>
      <title>Generating a large sample from a small sample, sum variable and repeat.....10,000 times</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474201#M121799</link>
      <description>&lt;P&gt;Hi All, this is an update on an earlier thread I started &lt;A href="https://communities.sas.com/t5/Base-SAS-Programming/Inferring-data-about-a-large-population-from-a-small-sample/m-p/474032#M121737" target="_self"&gt;here&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I'm working on the following problem:&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;A real estate management company has 9500&amp;nbsp;units (apartments) that they are responsible for. Because the company only has 1 inspector, and the apartments are spread out, they only have the resources to physically check 35 apartments a year (the inspector is pretty slow apparently). They are hoping to use this sample to estimate how much $$ they should budget for maintenance over all 9500 apartments with a 95% confidence interval. The sample of 35 units can be considered a simple random sample&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The dataset looks like this:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data repaircost;
Input Unit Repair_cost;
datalines;
1 10277.00
2 33615.00
3 23442.00
4 11220.00
5 41321.00
6 40801.00
7 20896.00
8 44753.00
9 28659.00
10 19753.00
11 28760.00
12 24537.00
13 20536.00
14 20959.00
15 5693.00
16 8290.00
17 28715.00
18 41550.00
19 18459.00
20 49197.00
21 28955.00
22 46149.00
23 25273.00
24 45867.00
25 24716.00
26 43519.00
27 27884.00
28 37714.00
29 8001.00
30 42151.00
31 43197.00
32 27245.00
33 31736.00
34 9503.00
35 14946.00
;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;A number of different solutions have been suggested. One is to calculate the mean cost and upper/lower CI for one apartment and multiply all those numbers by 9500, which will give me a ballpark but is a little too "back of the napkin", I think. (Feel free to correct me on that; I would love it if that was the solution.)&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The most recent suggestion I got was to bootstrap the sample of 35 with replacement to create a new sample of 9500 and sum the costs. I would do this, say, 10,000 times, then sort the sums in ascending order; the 250th and 9750th values would represent a 95% confidence interval.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Any help on how to expand my sample from 35 to 9500, get a total sum of costs and then do it another 9,999 times would be much appreciated!&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks so much. As always, I'm amazed at how supportive this community is.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;*other potential solutions are also more than welcome*&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Mike&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jun 2018 18:24:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474201#M121799</guid>
      <dc:creator>righcoastmike</dc:creator>
      <dc:date>2018-06-28T18:24:33Z</dc:date>
    </item>
    <item>
      <title>Re: Generating a large sample from a small sample, sum variable and repeat.....10,000 times</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474227#M121804</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/154458"&gt;@righcoastmike&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding bootstrap methods with SAS I found &lt;A href="http://support.sas.com/resources/papers/proceedings10/268-2010.pdf" target="_blank"&gt;this paper&lt;/A&gt; (by David Cassell) interesting.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I had followed the other thread a bit and was actually quite confident about the quality of the suggested CI (having the classic reference "Sampling Techniques" by W. G. Cochran at hand). I think the major risk would be some extreme outliers in the population, which may or may not occur in the sample. So, another approach would be to &lt;EM&gt;simulate&lt;/EM&gt; such populations.&lt;/P&gt;
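&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a rough sketch of that kind of interval (assuming the suggested CI is the usual simple-random-sample interval for a population total, and using the &lt;CODE&gt;repaircost&lt;/CODE&gt; data set from the question), PROC SURVEYMEANS could be used along these lines. The data set name &lt;CODE&gt;repaircost_w&lt;/CODE&gt; and the variable &lt;CODE&gt;SamplingWeight&lt;/CODE&gt; are made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Assumption: simple random sample of 35 units out of 9500 */
data repaircost_w;
   set repaircost;
   /* each inspected unit stands for 9500/35 (about 271) units */
   SamplingWeight = 9500 / 35;
run;

/* SUM and CLSUM request the estimated total and its 95% confidence limits */
proc surveymeans data=repaircost_w total=9500 sum clsum;
   var Repair_cost;
   weight SamplingWeight;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The TOTAL= option applies the finite population correction, which is small here because only 35 of the 9500 units are sampled.&lt;/P&gt;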
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If I had to examine the project, I would also scrutinize the assumption that the sample "can be considered a simple random sample" and check whether, for example,&amp;nbsp;easily accessible apartments&amp;nbsp;had a higher probability of being included&amp;nbsp;in the sample.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jun 2018 19:47:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474227#M121804</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2018-06-28T19:47:33Z</dc:date>
    </item>
    <item>
      <title>Re: Generating a large sample from a small sample, sum variable and repeat.....10,000 times</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474230#M121805</link>
      <description>&lt;P&gt;Do you have any data on which apartments were renovated when and the timings? There could be some survival type event to determine if an event will happen/when and then a second stage to determine how much would be impacted. Or a logistic regression to predict the probability of an event.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Two Stage Regression is what this is referred to.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The other option would be what you suggested, which is basically a simulation. If you want to simulate data, I would strongly suggest reading the ‘Don’t be Loopy’ paper by David Cassell.&amp;nbsp;&lt;/P&gt;
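&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A minimal sketch of such a simulation, using a single PROC SURVEYSELECT call and BY-group processing instead of a macro loop, might look like the following. It assumes the &lt;CODE&gt;repaircost&lt;/CODE&gt; data set from the question; the seed and the data set names &lt;CODE&gt;boot&lt;/CODE&gt;, &lt;CODE&gt;bootmeans&lt;/CODE&gt;, &lt;CODE&gt;boottotals&lt;/CODE&gt; and &lt;CODE&gt;bootci&lt;/CODE&gt; are arbitrary. Each replicate resamples the 35 inspected units with replacement, and the resampled mean is then scaled up to 9500 units.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* 10,000 resamples of size 35, drawn with replacement (METHOD=URS) */
proc surveyselect data=repaircost out=boot seed=20180628
     method=urs samprate=1 outhits reps=10000 noprint;
run;

/* Mean repair cost within each resample; OUT= is already ordered by Replicate */
proc means data=boot noprint;
   by Replicate;
   var Repair_cost;
   output out=bootmeans mean=mean_cost;
run;

/* Scale each resampled mean up to the 9500 units */
data boottotals;
   set bootmeans;
   total_cost = 9500 * mean_cost;
run;

/* Percentile interval: 2.5th and 97.5th percentiles of the 10,000 totals */
proc univariate data=boottotals noprint;
   var total_cost;
   output out=bootci pctlpts=2.5 97.5 pctlpre=total_p;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Resampling at the original sample size of 35 is what makes the interval reflect the uncertainty of a 35-unit sample. Drawing 9500 rows per replicate and summing them would instead capture only the variability of a sum of 9500 draws from the observed values, which gives a much narrower interval.&lt;/P&gt;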
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jun 2018 19:52:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474230#M121805</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-06-28T19:52:25Z</dc:date>
    </item>
    <item>
      <title>Re: Generating a large sample from a small sample, sum variable and repeat.....10,000 times</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474231#M121806</link>
      <description>&lt;P&gt;Thanks Freelance.&amp;nbsp; I'll take a look at that paper, i'm sure it will be helpful.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After some investigation and a helpful stats person I managed to get the code to do the bootstrapping in R (I know not SAS but beggars can't be choosers) and it boosted my confidence in the numbers.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here's how they compare:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The "calculate the mean for 1 unit and multiply by total number of units" method gave me:&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Mean total repair cost 260 392 580.66&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;lower CL 221 214 943&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Upper CL 299 570 217&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;While the bootstrapping method came up with this:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Mean total repair cost = 260 400 000&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;lower CL 258 033 216&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Upper CL 262 853 966&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;So much tighter CI for the bootstrapping method, but in this case conservative is OK.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks again for thinking through this with me everyone. It's been really interesting.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Mike&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jun 2018 19:53:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474231#M121806</guid>
      <dc:creator>righcoastmike</dc:creator>
      <dc:date>2018-06-28T19:53:55Z</dc:date>
    </item>
    <item>
      <title>Re: Generating a large sample from a small sample, sum variable and repeat.....10,000 times</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474232#M121807</link>
      <description>Age of buildings is probably a big factor as well.</description>
      <pubDate>Thu, 28 Jun 2018 19:54:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474232#M121807</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-06-28T19:54:01Z</dc:date>
    </item>
    <item>
      <title>Re: Generating a large sample from a small sample, sum variable and repeat.....10,000 times</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474234#M121808</link>
      <description>&lt;P&gt;Hi Reeza,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm not sure how the exact numbers were calculated, I've just got the totals. I agree though, there are definitely a bunch of different variables that need to be taken into account. I think the bootstrapping method is good enough for now. Thanks so much for all your help!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Mike&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Jun 2018 19:56:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474234#M121808</guid>
      <dc:creator>righcoastmike</dc:creator>
      <dc:date>2018-06-28T19:56:24Z</dc:date>
    </item>
    <item>
      <title>Re: Generating a large sample from a small sample, sum variable and repeat.....10,000 times</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474239#M121809</link>
      <description>I think the bootstrap only tells you that the data is normally distributed; I don’t think it gets rid of any of the initial concerns with the methodology. I would wait for PGStats or Rick to comment, though.</description>
      <pubDate>Thu, 28 Jun 2018 20:11:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Generating-a-large-sample-from-a-small-sample-sum-variable-and/m-p/474239#M121809</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-06-28T20:11:38Z</dc:date>
    </item>
  </channel>
</rss>

