Posted 03-14-2017 05:13 PM

Hi, I have a group of data with the following columns: Value / Count (frequency) / Probability (percentage of all observations):

I made a histogram of the probabilities (see attachment). I would like the percentages to increase up to the mean and decrease after the mean, like the shape of a Poisson or normal distribution. How could I change the data to achieve that?

Is there a procedure that does this automatically?

Thank you!

| Value | Count | Probability (%) |
|---:|---:|---:|
| 1 | 16 | 16.66666667 |
| 2 | 21 | 21.875 |
| 3 | 19 | 19.79166667 |
| 4 | 14 | 14.58333333 |
| 5 | 8 | 8.333333333 |
| 6 | 9 | 9.375 |
| 7 | 1 | 1.041666667 |
| 8 | 4 | 4.166666667 |
| 9 | 1 | 1.041666667 |
| 10 | 3 | 3.125 |
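As a quick check of the table: the Probability column is each count divided by the total number of observations (96), times 100. The sketch below is in Python purely for illustration (the thread's own code is SAS). It recomputes that column and the sample mean, which the Poisson and binomial fits discussed later in the thread depend on.

```python
# Counts for each 'No-show' value, copied from the table above.
values = list(range(1, 11))
counts = [16, 21, 19, 14, 8, 9, 1, 4, 1, 3]

total = sum(counts)  # number of observations
# Percentage of observations at each value (the "Probability" column).
pct = [100 * c / total for c in counts]
# Sample mean: the frequency-weighted average of the values.
mean = sum(v * c for v, c in zip(values, counts)) / total

for v, p in zip(values, pct):
    print(f"{v:2d}  {p:8.4f}")
print("n =", total, " mean =", round(mean, 4))
```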

14 replies


Hi,

Sorry for not explaining this well. I would like the output to look like a normal or Poisson distribution (left-skewed is also okay), like the one in the attachment. So I may need to change the data values. Thanks!


So you want a bell-shaped (normal) distribution?

Better to post it in the IML forum.

Check PROC MCMC for how to use a Box-Cox transformation to make it closer to a normal distribution.
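The Box-Cox transformation is y(λ) = (y^λ - 1)/λ, or log(y) when λ = 0. The PROC MCMC approach estimates λ in a Bayesian way. The Python sketch below swaps in a simpler frequentist technique: it maximizes the profile log-likelihood of λ over a grid, assuming normal errors and treating each count as a frequency weight. The grid range [-2, 2] and step 0.01 are arbitrary choices. The transformed values are continuous, not integers.

```python
import math

# Observed data from the table in the question (all values are > 0).
values = list(range(1, 11))
counts = [16, 21, 19, 14, 8, 9, 1, 4, 1, 3]
n = sum(counts)
sum_log_y = sum(c * math.log(v) for v, c in zip(values, counts))

def boxcox(y, lam):
    """Box-Cox transform of a positive value y."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam

def boxcox_llf(lam):
    """Profile log-likelihood of lam under normal errors (constants dropped).
    Each distinct value is weighted by its count."""
    z = [boxcox(v, lam) for v in values]
    mean = sum(c * zi for zi, c in zip(z, counts)) / n
    var = sum(c * (zi - mean) ** 2 for zi, c in zip(z, counts)) / n
    # -(n/2) log(sigma^2) plus the Jacobian term (lam - 1) * sum(log y)
    return -0.5 * n * math.log(var) + (lam - 1) * sum_log_y

# Grid search over lambda in [-2, 2] with step 0.01 (arbitrary grid).
grid = [k / 100 for k in range(-200, 201)]
best = max(grid, key=boxcox_llf)
print("estimated lambda:", best)
```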


Thank you!

I will look into it, and post on the IML board if I still can't figure it out.


Hi Ksharp,

Sorry for my late response.

The initial probability table looks like this:

For example, in the figure above, I want the probability of 'No-show' = 6 to be smaller than that of 'No-show' = 5, and the probability of 'No-show' = 7 to lie between those of 'No-show' = 6 and 'No-show' = 9, so that the distribution looks like a normal or Poisson distribution. Finally, it should return a probability for each value of 'No-show'.

My current approach is to compute the mean for this group and generate a Poisson distribution with that mean.

Like the following (not exactly the same data, but a similar case):

In this way, the probability keeps increasing before the mean value, and then keeps decreasing after that.

Is there any other approach? Thank you!
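The current approach fits a Poisson distribution whose mean equals the sample mean, which is also the Poisson maximum-likelihood estimate of λ. It can be sketched in Python as follows, using the values from the table in the original question; the thread's own code is SAS, so treat this only as an illustration. One caveat: the Poisson support starts at 0, while the observed values start at 1, so the fitted model puts some probability on 0 no-shows.

```python
import math

# Observed data from the table in the question.
values = list(range(1, 11))
counts = [16, 21, 19, 14, 8, 9, 1, 4, 1, 3]
n = sum(counts)

# For a Poisson model, the maximum-likelihood estimate of lambda is the sample mean.
lam = sum(v * c for v, c in zip(values, counts)) / n

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Include k = 0: the Poisson support starts at 0 even though no 0 was observed.
observed_pct = {v: 100 * c / n for v, c in zip(values, counts)}
fitted_pct = {k: 100 * poisson_pmf(k, lam) for k in range(0, 11)}

for k in range(0, 11):
    print(f"{k:2d}  observed {observed_pct.get(k, 0.0):6.2f}%  "
          f"Poisson {fitted_pct[k]:6.2f}%")
```

The fitted probabilities rise up to k = 3 and fall afterwards because P(k)/P(k-1) = λ/k, which exceeds 1 only while k < λ (about 3.57 here). That ratio is why any Poisson fit has the increasing-then-decreasing shape the question asks for.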


If you have the original data, you can use a bootstrap (or smoothed bootstrap) technique to simulate the data.
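Here is a minimal sketch of the ordinary bootstrap in Python, for illustration only. Because the data are integer counts, the 96 original observations can be rebuilt exactly from the frequency table. Each bootstrap sample then draws 96 of them with replacement. The seed and the number of resamples (1000) are arbitrary choices. The smoothed bootstrap, which adds random noise to each draw, is not shown.

```python
import random
from collections import Counter

# Rebuild the 96 individual observations from the frequency table.
values = list(range(1, 11))
counts = [16, 21, 19, 14, 8, 9, 1, 4, 1, 3]
data = [v for v, c in zip(values, counts) for _ in range(c)]

rng = random.Random(2017)  # fixed seed so the results are reproducible
B = 1000                   # number of bootstrap resamples (arbitrary)

boot_means = []
for _ in range(B):
    resample = rng.choices(data, k=len(data))  # draw with replacement
    boot_means.append(sum(resample) / len(resample))

# One simulated frequency table, for comparison with the observed one.
one_table = sorted(Counter(rng.choices(data, k=len(data))).items())
print(one_table)

# 95% percentile interval for the mean number of no-shows.
boot_means.sort()
lower, upper = boot_means[B // 40], boot_means[B - B // 40 - 1]
print(f"95% interval for the mean: [{lower:.3f}, {upper:.3f}]")
```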

If you only have the quantiles, you can simulate the (approximate) distribution from a piecewise linear approximation to the empirical CDF. The technique uses the inverse CDF method to simulate from the approximate empirical CDF.
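The inverse CDF method works like this. If U is Uniform(0, 1) and F is a CDF, then F^(-1)(U) has distribution F. The Python sketch below applies it to a piecewise linear CDF. As an assumption for illustration, the knots come from the cumulative proportions in the question's table rather than from separately reported quantiles, and the extra starting knot at (0, 0) is also an assumption. The simulated values are continuous.

```python
import bisect
import random

# Knots of a piecewise linear approximation to the empirical CDF.
# x-knots are the 'No-show' values 0..10; F-knots are cumulative proportions
# from the frequency table. The first knot (0, 0.0) is an assumption.
counts = [16, 21, 19, 14, 8, 9, 1, 4, 1, 3]
n = sum(counts)
xk = list(range(0, 11))
Fk = [0.0]
for c in counts:
    Fk.append(Fk[-1] + c / n)
Fk[-1] = 1.0  # remove any floating-point round-off in the running sum

def inverse_cdf(u):
    """Invert the piecewise linear CDF at a probability u in [0, 1]."""
    i = bisect.bisect_left(Fk, u)  # first knot with Fk[i] >= u
    if i == 0:
        return float(xk[0])
    F0, F1 = Fk[i - 1], Fk[i]
    x0, x1 = xk[i - 1], xk[i]
    # Linear interpolation inside the segment [x0, x1].
    return x0 + (u - F0) / (F1 - F0) * (x1 - x0)

# Inverse CDF method: feed uniform random numbers through inverse_cdf.
rng = random.Random(1)
sample = [inverse_cdf(rng.random()) for _ in range(10000)]
print(f"min={min(sample):.3f}  max={max(sample):.3f}  "
      f"mean={sum(sample) / len(sample):.3f}")
```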


Hi Rick,

I read through your blogs; they are extremely helpful. Thank you!

For this particular question, my initial probability distribution looks like this:

This is the actual case. However, I want to adjust the probabilities for some values. For example, in the figure above, I want the probability of 'No-show' = 6 to be smaller than that of 'No-show' = 5, and the probability of 'No-show' = 7 to lie between those of 'No-show' = 6 and 'No-show' = 9, so that the distribution looks like a normal or Poisson distribution. Finally, it should return a probability for each value of 'No-show'.

My current approach is to compute the mean for this group and generate a Poisson distribution with that mean.

Like the following (not exactly the same data, but a similar case):

In this way, the probability increases up to the mean and then decreases after it.

Is there any other approach? Thank you!


It sounds like you are trying to transform the distribution towards normality. Usually, you would use a LOG transformation or a square-root transformation to transform a continuous distribution into another continuous distribution, but you seem to want to transform a discrete distribution. It is not clear to me if the new distribution is supposed to be discrete or continuous. If it can be continuous, try the LOG transformation on your data.
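One way to see what the LOG (or square-root) transformation does is to compare sample skewness before and after transforming. The Python sketch below does this for illustration. It uses the moment-based skewness m3 / m2^1.5 with the counts as frequency weights. The log is defined here only because every observed value is at least 1. As the reply notes, the transformed values are no longer integers.

```python
import math

# Observed data from the table in the question.
values = list(range(1, 11))
counts = [16, 21, 19, 14, 8, 9, 1, 4, 1, 3]

def weighted_skewness(xs, ws):
    """Moment-based skewness m3 / m2**1.5, with ws as frequency weights."""
    n = sum(ws)
    mean = sum(x * w for x, w in zip(xs, ws)) / n
    m2 = sum(w * (x - mean) ** 2 for x, w in zip(xs, ws)) / n
    m3 = sum(w * (x - mean) ** 3 for x, w in zip(xs, ws)) / n
    return m3 / m2 ** 1.5

raw = weighted_skewness(values, counts)
rooted = weighted_skewness([math.sqrt(v) for v in values], counts)
logged = weighted_skewness([math.log(v) for v in values], counts)
print(f"skewness raw={raw:.3f}  sqrt={rooted:.3f}  log={logged:.3f}")
```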

I think it would be helpful if you describe what you are trying to accomplish scientifically. What are the data? What scientific question are you trying to answer or model? Why are you trying to transform the data towards normality?


Hi Rick,

Thanks for your quick reply!

The data are historical 'No-show' records. (In the hospitality industry, a 'no-show' is a guest who has a booking but does not appear.) So the values must be integers, not continuous.

The X axis is the number of no-shows observed in the past, and the Y axis is the corresponding probability ('No-show' = 1 accounts for about 17% and 'No-show' = 2 for about 22%).

In real case, probability for 'No-show' = 6 might be higher than 'No-show' = 5 and also higher than 'No-show' = 7. However, I would like to transform the data so that the probability for 'No-show' = 6 lies between those for 'No-show' = 5 and 'No-show' = 7. I think normality would work, but the data should be discrete rather than continuous (that's why I tried a Poisson distribution myself). That would make the distribution easier to explain to the business.

Thank you!
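The pattern the question asks for (probabilities rising up to the peak and falling after it) is called unimodality. The Python sketch below uses a hypothetical helper, `unimodality_violations`, to flag which observed values break that pattern. It compares each value's count with the previous value's count. Raw counts are used because the percentages are proportional to them.

```python
# Observed counts for each 'No-show' value (from the table in the question).
counts = {1: 16, 2: 21, 3: 19, 4: 14, 5: 8, 6: 9, 7: 1, 8: 4, 9: 1, 10: 3}

def unimodality_violations(freq):
    """Hypothetical helper: list the values k that break the
    rise-then-fall pattern, comparing each k with k - 1."""
    keys = sorted(freq)
    mode = max(keys, key=lambda k: freq[k])  # most frequent value
    bad = []
    for prev, k in zip(keys, keys[1:]):
        if k <= mode and freq[k] < freq[prev]:
            bad.append(k)  # should still be rising before the mode
        elif k > mode and freq[k] > freq[prev]:
            bad.append(k)  # should be falling after the mode
    return bad

print(unimodality_violations(counts))  # [6, 8, 10]
```

Each reported value has a higher count than the value just before it. This is the kind of bump the question describes at 'No-show' = 6.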


> In real case, probability for 'No-show' = 6 might be higher than 'No-show' = 5 and also higher than 'No-show' = 7.

Not sure what "in real case" means. You say this is historical data, so the data is real, right?

Are either of the following what you are trying to do?

1. Adjust/modify the observed proportions so that they better fit some theoretical model or pre-held beliefs.

2. Find some discrete parametric distribution that you can fit to the data. You want the fitted distribution to look somewhat normal.


Hi Rick,

Thanks! By "in real case" I meant the historical data, and it is real, as you said.

The second option you mentioned is what I want: to fit a distribution that looks somewhat normal.


You can't change the distribution. The data have the shape that they have. What you can do is fit a discrete distribution to the data. I still don't understand how the data are generated, so I can't recommend whether you should use a Poisson, binomial, or something else. However, there are SAS Knowledge Base articles that show how to fit discrete distributions:

http://support.sas.com/kb/48/914.html

http://support.sas.com/kb/24/166.html

Here's a fit to the binomial distribution:

```
data A;
input freq @@;
N = _N_;
NTrials = 96; /* you need the sample size to model the Binomial distrib */
datalines;
16 21 19 14 8 9 1 4 1 3
;
run;
proc means data=A sum; /* total count = 96 */
var freq;
run;
proc genmod data=A;
freq freq;
model N/NTrials = / dist=binomial;
output out=predbin p=p;
run;
proc print data=predbin(obs=1);
var p; /* print the parameter estimate */
run;
data fit;
if _N_ = 1 then set predbin(obs=1 keep=p); /* GENMOD estimate of p (0.037218 for these data) */
set A;
obsPct = 100 * freq / 96;                     /* observed percentages (96 = total count) */
pct = 100 * pdf("Binomial", N, p, NTrials);   /* expected percentages under binomial model */
run;
proc sgplot data=fit;
vbar N / response=obsPct; /* raw data, as percentages */
vline N / response=pct markers; /* expected values under binomial model */
yaxis grid;
run;
```
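For an intercept-only binomial model, the maximum-likelihood estimate has a closed form: p = sum(freq*N) / sum(freq*NTrials), which equals the sample mean divided by NTrials. The Python sketch below reproduces that estimate (0.037218, the value used in the SAS program above) and the expected percentages, for illustration only. NTrials = 96 is taken from that program. Whether 96 is the right number of trials for no-show data is a modeling assumption.

```python
import math

# Observed data from the table in the question.
values = list(range(1, 11))
counts = [16, 21, 19, 14, 8, 9, 1, 4, 1, 3]
n_obs = sum(counts)   # 96 observations
n_trials = 96         # NTrials, as in the SAS program above

# Binomial MLE for an intercept-only model: sample mean / number of trials.
mean = sum(v * c for v, c in zip(values, counts)) / n_obs
p_hat = mean / n_trials

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

for v, c in zip(values, counts):
    obs_pct = 100 * c / n_obs
    exp_pct = 100 * binom_pmf(v, n_trials, p_hat)
    print(f"{v:2d}  observed {obs_pct:6.2f}%  binomial {exp_pct:6.2f}%")
print("p_hat =", round(p_hat, 6))
```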


Hi Rick,

That's great. I will take a look at it and try.
