Generating a large sample from a small sample, sum...

3 weeks ago - last edited 3 weeks ago

Hi all, this is an update on an earlier thread I started here.

I'm working on the following problem:

A real estate management company is responsible for 9500 units (apartments). Because the company has only 1 inspector, and the apartments are spread out, they only have the resources to physically check 35 apartments a year (the inspector is pretty slow, apparently). They are hoping to use this sample to estimate how much money they should budget for maintenance across all 9500 apartments, with a 95% confidence interval. The sample of 35 units can be considered a simple random sample.

The dataset looks like this:

```
Data repaircost;
Input Unit Repair_cost;
datalines;
1 10277.00
2 33615.00
3 23442.00
4 11220.00
5 41321.00
6 40801.00
7 20896.00
8 44753.00
9 28659.00
10 19753.00
11 28760.00
12 24537.00
13 20536.00
14 20959.00
15 5693.00
16 8290.00
17 28715.00
18 41550.00
19 18459.00
20 49197.00
21 28955.00
22 46149.00
23 25273.00
24 45867.00
25 24716.00
26 43519.00
27 27884.00
28 37714.00
29 8001.00
30 42151.00
31 43197.00
32 27245.00
33 31736.00
34 9503.00
35 14946.00
;
run;
```

A number of different solutions have been suggested. One is to calculate the mean cost and the upper/lower confidence limits for one apartment and multiply all of those numbers by 9500, which gives me a ballpark figure but is a little too "back of the napkin", I think. (Feel free to correct me on that; I would love it if that was the solution.)
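For reference, here is a minimal sketch of a design-based version of that calculation. It assumes the `repaircost` data set above and that the 35 units really are a simple random sample from 9500. The data set name `repaircost_w` and the variable `SamplingWeight` are made up for illustration. As far as I understand PROC SURVEYMEANS, `TOTAL=` only supplies the finite population correction, so a weight of N/n is needed for `SUM` and `CLSUM` to report the population total rather than the sample sum.

```sas
/* Sketch only: each sampled unit stands in for N/n = 9500/35 units. */
data repaircost_w;
   set repaircost;
   SamplingWeight = 9500 / 35;
run;

/* MEAN/CLM: per-unit mean and its 95% confidence limits.            */
/* SUM/CLSUM: estimated total over all 9500 units and its limits.    */
/* TOTAL=9500 applies the finite population correction.              */
proc surveymeans data=repaircost_w total=9500 mean clm sum clsum;
   var Repair_cost;
   weight SamplingWeight;
run;
```

With only 35 of 9500 units sampled, the finite population correction is tiny, so the result should be very close to simply multiplying the per-unit mean and limits by 9500.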

The most recent suggestion I got was to bootstrap the sample of 35 with replacement to create a new sample of 9500 and sum the costs. I would do this, say, 10,000 times, then sort the sums in ascending order; the 250th and 9750th values would give a 95% percentile confidence interval.

Any help on how to expand my sample population from 35 to 9500, get a total sum of costs and then do it another 9,999 times would be much appreciated!
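A rough sketch of how the resampling could be done in SAS with PROC SURVEYSELECT, rather than a definitive implementation. Note that it swaps in the conventional bootstrap: each replicate resamples at the original sample size (35) and the replicate mean is scaled by 9500, instead of drawing 9500 rows per replicate as described above. The seed, the number of replicates, and the data set names `boot`, `bootmeans`, `boottotals`, and `boot_ci` are arbitrary choices.

```sas
/* Sketch: percentile bootstrap for the total repair cost.          */
/* Assumes the REPAIRCOST data set above and N = 9500 units.        */
proc surveyselect data=repaircost out=boot seed=12345 noprint
                  method=urs     /* sampling with replacement         */
                  samprate=1     /* replicate size = original n (35)  */
                  outhits        /* one output row per selection      */
                  reps=10000;    /* number of bootstrap replicates    */
run;

/* Mean repair cost within each bootstrap replicate */
proc means data=boot noprint;
   by Replicate;
   var Repair_cost;
   output out=bootmeans mean=mean_cost;
run;

/* Scale each replicate mean up to a 9500-unit total */
data boottotals;
   set bootmeans;
   total_cost = 9500 * mean_cost;
run;

/* 2.5th and 97.5th percentiles of the bootstrap totals */
proc univariate data=boottotals noprint;
   var total_cost;
   output out=boot_ci mean=boot_mean pctlpts=2.5 97.5 pctlpre=P;
run;

proc print data=boot_ci;
run;
```

The limits should appear in `boot_ci` as `P2_5` and `P97_5`. To reproduce the literal "sample of 9500" version instead, `samprate=1` could be replaced with `sampsize=9500`, at the cost of 95 million output rows for 10,000 replicates.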

Thanks so much, as always I'm amazed at how supportive this community is.

*other potential solutions are also more than welcome*

Mike

Posted in reply to righcoastmike

3 weeks ago

Hi @righcoastmike,

Regarding bootstrap methods with SAS I found this paper (by David Cassell) interesting.

I had followed the other thread a bit and was actually quite confident about the quality of the suggested CI (having the classic reference "Sampling Techniques" by W. G. Cochran at hand). I think the major risk would be some extreme outliers in the population, which may or may not occur in the sample. So, another approach would be to *simulate* such populations.

If I had to examine the project, I would also scrutinize the assumption that the sample "can be considered a simple random sample" and ask whether, for example, easily accessible apartments had a higher probability of being included in the sample.

Posted in reply to FreelanceReinhard

3 weeks ago

Thanks, Freelance. I'll take a look at that paper; I'm sure it will be helpful.

After some investigation and with help from a stats person, I managed to get code to do the bootstrapping in R (I know, not SAS, but beggars can't be choosers), and it boosted my confidence in the numbers.

Here's how they compare:

The "calculate the mean for 1 unit and multiply by total number of units" method gave me:

Mean total repair cost = 260 392 580.66

Lower CL = 221 214 943

Upper CL = 299 570 217

While the bootstrapping method came up with this:

Mean total repair cost = 260 400 000

Lower CL = 258 033 216

Upper CL = 262 853 966

So the bootstrapping method gives a much tighter CI, but in this case conservative is OK.
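One caveat worth checking, offered as a sketch and not as a diagnosis of the R code (which isn't shown here): if each bootstrap replicate draws 9500 values with replacement and sums them, the spread of those sums reflects only the noise in adding up 9500 draws from the 35 observed values, not the uncertainty that comes from having inspected only 35 units. Writing N = 9500, n = 35, and s for the sample standard deviation (symbols introduced here for illustration), the two standard errors compare roughly as follows:

```latex
% N = 9500 units in the population, n = 35 sampled, s = sample standard deviation
\begin{align*}
\text{Scaled-up sample mean:}\quad
  \operatorname{SE}\!\left(N\bar{y}\right)
    &= N\,\frac{s}{\sqrt{n}}\,\sqrt{1-\frac{n}{N}} \\
\text{Sum of } N \text{ draws with replacement:}\quad
  \operatorname{SD}\!\left(\sum_{i=1}^{N} y_i^{*}\right)
    &\approx \sqrt{N}\,s \\
\text{Ratio (ignoring the correction factor, which is close to 1):}\quad
  \frac{N\,s/\sqrt{n}}{\sqrt{N}\,s}
    &= \sqrt{\frac{N}{n}} = \sqrt{\frac{9500}{35}} \approx 16.5
\end{align*}
```

That is roughly the ratio between the widths of the two intervals above (about 16), so the tighter interval may be an artifact of the replicate size and not a real gain in precision.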

Thanks again for thinking through this with me everyone. It's been really interesting.

Mike

Posted in reply to righcoastmike

3 weeks ago

Do you have any data on which apartments were renovated, and when? There could be a survival-type model to determine whether and when an event will happen, and then a second stage to determine how large the impact would be. Or a logistic regression to predict the probability of an event.

This is referred to as two-stage regression.

The other option would be as you suggested, which is basically a simulation. If you want to simulate data, I would strongly suggest reading the 'Don't Be Loopy' paper by David Cassell.

Posted in reply to Reeza

3 weeks ago

Age of buildings is probably a big factor as well.

Posted in reply to Reeza

3 weeks ago

Hi Reeza,

I'm not sure how the exact numbers were calculated; I've just got the totals. I agree, though, that there are definitely a bunch of different variables that need to be taken into account. I think the bootstrapping method is good enough for now. Thanks so much for all your help!

Mike

Posted in reply to righcoastmike

3 weeks ago

I think the bootstrap only tells you that the data is normally distributed; I don't think it gets rid of any of the initial concerns with the methodology. I would wait for PGStats or Rick to comment, though.