SAS Optimization, and SAS Simulation Studio

03-14-2017 05:13 PM

Hi, I have a set of data like the following, with columns Value / Count (Frequency) / Probability among all:

I made a histogram of Probability (see attachment). I would like the percentage to increase until it reaches the mean and to decrease after it. How could I change the data to satisfy that (like the shape of a Poisson or normal distribution)?

And is there a procedure that can do that automatically?

Thank you!

| Value | Count | Probability (%) |
|---|---|---|
| 1 | 16 | 16.66666667 |
| 2 | 21 | 21.875 |
| 3 | 19 | 19.79166667 |
| 4 | 14 | 14.58333333 |
| 5 | 8 | 8.333333333 |
| 6 | 9 | 9.375 |
| 7 | 1 | 1.041666667 |
| 8 | 4 | 4.166666667 |
| 9 | 1 | 1.041666667 |
| 10 | 3 | 3.125 |

03-14-2017 05:31 PM

I think you need to show an example of what you are looking for as output. I am not quite sure what you mean by percentage increasing before it reaches mean. How is the posted data going to be related to the output?

03-15-2017 10:52 AM

03-14-2017 10:04 PM

So you want a bell-shaped (normal) distribution?

It would be better to post this in the IML forum.

Check PROC MCMC for using a Box-Cox transformation to make the distribution closer to normal.

03-15-2017 10:53 AM

Thank you!

I will check on it and post on the IML board if I still have no idea.

03-15-2017 11:40 PM

Maybe you want this. http://blogs.sas.com/content/iml/2016/11/02/reverse-data-before-fit-distribution.html

3 weeks ago - last edited 2 weeks ago

Hi Ksharp,

Sorry for my late response.

The initial probability table looks like this:

For example, in the above figure, I want the probability of 'No-show' = 6 to be smaller than that of 'No-show' = 5, and the probability of 'No-show' = 7 to fall between those of 'No-show' = 6 and 9. The result should look like a normal or Poisson distribution and return a probability for each value of 'No-show'.

My current approach is to obtain the mean for this group and generate a Poisson distribution from it.

Like the following (not exactly the same data, but a similar case):

In this way, the probability keeps increasing before the mean value and keeps decreasing after it.
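A minimal sketch of that approach, assuming the Value/Count table from the first post is the input. The data set names (NoShow, Stats, PoissonFit) and variable names are made up for illustration:

```
/* Assumption: the Value/Count table from the first post */
data NoShow;
   input Value Count @@;
   datalines;
1 16  2 21  3 19  4 14  5 8  6 9  7 1  8 4  9 1  10 3
;
run;

/* weighted mean of Value, using Count as the frequency */
proc means data=NoShow noprint;
   freq Count;
   var Value;
   output out=Stats mean=Lambda;
run;

/* observed percentage and Poisson percentage for each value */
data PoissonFit;
   if _N_ = 1 then set Stats(keep=Lambda);   /* Lambda is retained for all rows */
   set NoShow;
   ObsPct  = 100 * Count / 96;               /* 96 = total of Count */
   PoisPct = 100 * pdf("Poisson", Value, Lambda);
run;

proc print data=PoissonFit noobs;
   var Value ObsPct PoisPct;
run;
```

The weighted mean here is 343/96, about 3.57. A Poisson distribution also puts probability on 0 and on values above 10, so PoisPct will not sum to 100 over the values 1 to 10.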

Is there any other approach? Thank you!

03-23-2017 10:20 AM

If you have the original data, you can use a bootstrap (or smoothed bootstrap) technique to simulate the data.

If you only have the quantiles, you can simulate the (approximate) distribution from a piecewise linear approximation to the empirical CDF. The technique uses the inverse CDF method to simulate from the approximate empirical CDF.
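A rough sketch of the inverse CDF idea in a DATA step, assuming only a handful of quantiles are known. The quantile values in the arrays are hypothetical and not taken from this thread, and the seed, sample size, and data set name are arbitrary:

```
/* Assumption: x[i] is the value at cumulative probability c[i] (made-up numbers) */
data Sim;
   array x[5] _temporary_ (0 2 3 5 10);
   array c[5] _temporary_ (0 0.25 0.50 0.75 1);
   call streaminit(12345);
   do i = 1 to 1000;
      u = rand("Uniform");
      /* find the interval [c[k], c[k+1]] that contains u */
      k = 1;
      do while (k < 4 and u > c[k+1]);
         k = k + 1;
      end;
      /* linear interpolation of the inverse CDF on that interval */
      y = x[k] + (u - c[k]) / (c[k+1] - c[k]) * (x[k+1] - x[k]);
      output;
   end;
   keep y;
run;
```

The simulated y is continuous. If integer values are required, it could be rounded afterwards, although whether that is appropriate depends on the data.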

3 weeks ago

Hi Rick,

I read through your blogs, and they are extremely helpful. Thank you!

For this particular question,

My initial probability distribution is like this:

This is the actual case. However, I want to adjust the probability for some values. For example, in the above figure, I want the probability of 'No-show' = 6 to be smaller than that of 'No-show' = 5, and the probability of 'No-show' = 7 to fall between those of 'No-show' = 6 and 9. The result should look like a normal or Poisson distribution and return a probability for each value of 'No-show'.

My current approach is to obtain the mean for this group and generate a Poisson distribution from it.

Like the following (not exactly the same data, but a similar case):

In this way, the probability increases before the mean and decreases after it.

Is there any other approach? Thank you!

2 weeks ago

It sounds like you are trying to transform the distribution towards normality. Usually, you would use a LOG transformation or a square-root transformation to transform a continuous distribution into another continuous distribution, but you seem to want to transform a discrete distribution. It is not clear to me if the new distribution is supposed to be discrete or continuous. If it can be continuous, try the LOG transformation on your data.

I think it would be helpful if you describe what you are trying to accomplish scientifically. What are the data? What scientific question are you trying to answer or model? Why are you trying to transform the data towards normality?

2 weeks ago

Hi Rick,

Thanks for your quick reply!

The data is historical 'No-show' information. (In the hospitality industry, a 'No-show' is a guest who booked but did not appear.) So the value must be an integer, not continuous.

The X-axis is the 'No-show' count observed in the past, and the Y-axis is the corresponding probability. ('No-show' = 1 accounts for about 17% and 'No-show' = 2 for about 22%.)

In the real case, the probability for 'No-show' = 6 might be higher than for 'No-show' = 5 and also higher than for 'No-show' = 7. However, I would like to transform the data so that the probability for 'No-show' = 6 is between those for 'No-show' = 5 and 'No-show' = 7. Normality would work, I think, but the data should be discrete instead of continuous (that's why I tried Poisson myself). That would make the distribution easier to explain to the business.

Thank you!

2 weeks ago

> In real case, probability for 'No-show' = 6 might be higher than 'No-show' = 5 and also higher than 'No-show' = 7.

Not sure what "in real case" means. You say this is historical data, so the data is real, right?

Are either of the following what you are trying to do?

1. Adjust/modify the observed proportions so that they better fit some theoretical model or pre-held beliefs.

2. Find some discrete parametric distribution that you can fit to the data. You want the fitted distribution to look somewhat normal.

2 weeks ago

Hi Rick,

Thanks! By "in real case" I mean the historical data, and it is real, as you said.

The second thing you mentioned is what I want: to fit a distribution that looks somewhat normal.

2 weeks ago - last edited 2 weeks ago

You can't change the distribution. The data have the shape that they have. What you can do is fit a discrete distribution to the data. I still don't understand how the data are generated, so I can't recommend whether you should use a Poisson, binomial, or something else. However, there are SAS Knowledge Base articles that show how to fit discrete distributions:

http://support.sas.com/kb/48/914.html

http://support.sas.com/kb/24/166.html

Here's a fit to the binomial distribution:

```
/* observed counts for the values 1-10 */
data A;
   input freq @@;
   N = _N_;          /* value (1, 2, ..., 10) */
   NTrials = 96;     /* you need the sample size to model the Binomial distrib */
   datalines;
16 21 19 14 8 9 1 4 1 3
;
run;

proc means data=A sum; run;   /* the sum of freq is the sample size (96) */

/* intercept-only binomial model; the FREQ statement weights each value by its count */
proc genmod data=A;
   freq freq;
   model N/NTrials = / dist=binomial;
   output out=predbin p=p;
run;

proc print data=predbin(obs=1);
   var p;            /* print the parameter estimate */
run;

data fit;
   p = 0.037218;     /* estimate copied from the PROC PRINT output above */
   set A;
   pct = 100 * pdf("Binomial", N, p, NTrials);   /* expected values */
run;

proc sgplot data=fit;
   vbar N / response=freq;            /* raw data */
   vline N / response=pct markers;    /* expected values under binomial model */
   yaxis grid;
run;
```

2 weeks ago

Hi Rick,

That's great. I will take a look at it and try.