Overlay chi-squared distribution on histogram (pro...

11-18-2015 09:12 AM

Suppose I have data that I believe are chi-squared distributed. I want to make a histogram in PROC UNIVARIATE or another procedure, and then overlay the chi-square density.

Seems easy enough, though I am wondering why the chi-square is not an option in PROC UNIVARIATE.

11-18-2015 09:32 AM

@annabrown_sas I believe this should be on the SAS Procedures community?

11-18-2015 12:07 PM

If you mean the (common) chi-squared distribution and not the non-central chi-squared distribution, I think you can make use of the facts that

- The chi-squared distribution with n degrees of freedom is equal to the gamma distribution with shape parameter n/2 and scale parameter 2. In terms of SAS syntax: pdf('CHISQ', x, df) = pdf('GAMMA', x, df/2, 2).
- Unlike the chi-squared distribution, the gamma distribution is among the fitted continuous distributions which PROC UNIVARIATE offers.
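The identity in the first bullet follows from the densities: the chi-squared density with n degrees of freedom, x^(n/2 - 1) exp(-x/2) / (2^(n/2) Gamma(n/2)), is the gamma density with shape n/2 and scale 2. A quick numerical check can make this concrete. The DATA step below is only a sketch; the choice of df = 2 and the grid of x values are arbitrary assumptions for illustration.

```sas
/* Sketch: compare the chi-squared PDF with the equivalent gamma PDF. */
/* df and the grid of x values are illustrative choices only.         */
data _null_;
  df = 2;
  do x = 0.5 to 10 by 0.5;
    chisq = pdf('CHISQ', x, df);        /* chi-squared density, df degrees of freedom */
    gamma = pdf('GAMMA', x, df/2, 2);   /* gamma density, shape df/2 and scale 2      */
    diff  = abs(chisq - gamma);         /* should be zero up to rounding error        */
    put x= chisq= gamma= diff=;
  end;
run;
```

The log should show the two densities agreeing at every grid point, apart from floating-point rounding.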

For other density curves, including user-defined ones, see the article "How to overlay a custom density curve on a histogram in SAS."

11-18-2015 02:25 PM - edited 11-18-2015 02:28 PM

Yes, I saw this in a couple of places, including Wicklin's simulation book, but needed more confirmation to feel totally comfortable. So I used the code below, since I had 2 degrees of freedom on the chi^2 distribution. Please let me know if this seems incorrect.

PROC UNIVARIATE DATA=plotdata;
  VAR x;
  HISTOGRAM x / gamma(alpha=1 sigma=2);
RUN;

11-18-2015 03:55 PM

For a basic plot this should be fine.

You can create a large sample of rand('CHISQ',2) values to see the good fit. With this simulated data you can also change the parameter values for alpha and sigma to EST in order to let SAS estimate the parameters. You will see how close these estimates will be to 1 and 2, respectively.
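The check described above might look like the following sketch. The data set name sim, the seed, and the sample size of 10000 are assumptions chosen for illustration; the first HISTOGRAM statement overlays the fixed Gamma(1, 2) curve, and the second lets PROC UNIVARIATE estimate alpha and sigma.

```sas
/* Sketch: simulate chi-squared(2) data, then overlay gamma curves.   */
/* The data set name, seed, and sample size are illustrative choices. */
data sim;
  call streaminit(12345);
  do i = 1 to 10000;
    x = rand('CHISQ', 2);   /* chi-squared variate with 2 degrees of freedom */
    output;
  end;
run;

proc univariate data=sim;
  var x;
  /* fixed parameters: Gamma(shape=1, scale=2) is chi-squared with 2 df */
  histogram x / gamma(alpha=1 sigma=2);
  /* estimated parameters: the estimates should land near 1 and 2 */
  histogram x / gamma(alpha=est sigma=est);
run;
```

The parameter table printed for the second histogram shows the estimates of alpha and sigma, which can be compared with 1 and 2.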

11-19-2015 09:07 AM

1. A chi-square distribution with d degrees of freedom is equivalent to a Gamma(d/2, 2) distribution, so, yes, you can use the gamma distribution to overlay a chi-square curve.

2. In general, the way to overlay a known probability density on a sampling distribution (presumably created through Monte Carlo simulation or bootstrapping) is to use the GTL (Graph Template Language). Since you refer to my book, see pp. 40-41. Also see the article "How to overlay a custom density curve on a histogram in SAS."

3. You asked "why the chi-square is not an option in Univariate." The answer is that UNIVARIATE models data distributions, and real-world data is rarely generated by a process that gives rise to a t, F, or chi-square distribution. Those distributions are used to describe the sampling distribution of statistics. That is, they arise from a theoretical investigation of how a statistic varies across many random samples of data. Consequently, we don't usually fit the parameters in the t, F, and chi-square families. Instead, the parameters (usually called degrees of freedom) are determined by the sample size of the data and are used for inference, such as testing hypotheses, forming confidence intervals, and computing p-values.
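The GTL approach in point 2 can be sketched roughly as follows. This is a simplified illustration, not the code from the book or the article: the template name HistOverlay, the data set names pdfgrid and plotall, the grid range, and the variables t and density are all hypothetical, and the sketch assumes plotdata contains the variable x and no variables named t or density.

```sas
/* Sketch: overlay a chi-squared(2) density on a histogram with GTL. */
/* All names other than plotdata and x are hypothetical.             */

/* 1. Evaluate the density on a grid (the range is an assumption). */
data pdfgrid;
  do t = 0.05 to 15 by 0.05;
    density = pdf('CHISQ', t, 2);
    output;
  end;
run;

/* 2. Put the data and the curve side by side (MERGE without BY). */
data plotall;
  merge plotdata pdfgrid;
run;

/* 3. Define a template that draws the histogram on the density */
/*    scale and adds the curve as a series plot.                */
proc template;
  define statgraph HistOverlay;
    begingraph;
      layout overlay / yaxisopts=(label="Density");
        histogram x / scale=density;
        seriesplot x=t y=density / lineattrs=(thickness=2);
      endlayout;
    endgraph;
  end;
run;

/* 4. Render the template with the combined data. */
proc sgrender data=plotall template=HistOverlay;
run;
```

Because the histogram uses SCALE=DENSITY, the bars and the curve share the same vertical scale, so the curve can be compared directly with the bar heights.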

11-19-2015 01:49 PM - edited 11-19-2015 01:51 PM

Thank you Freelance and Rick!

Yes, my first approach was to simulate these data when I did not see the chi^2 option. I then educated myself further and realized the chi^2 is a special case of the gamma distribution. This seemed familiar, but I was a little wary of going forward without confirmation.

Funnily enough, Rick, I had the simulation book open to page 39 when writing this reply. I had seen the code you mentioned, but was wrongly discouraged when I saw how long it was. I know, I want the flexibility of writing code, but also point-and-click options at times. To put this in perspective, I am actually using this as a comparison for generated squared Mahalanobis distances, in order to screen for outliers. I had seen your Do Loop piece on this, and it helped my understanding; I was hoping to use it as a complement to the chi^2 quantile plots.

Question: are the robust Mahalanobis distances more appropriate for data that may be questionably multivariate normal?

11-19-2015 04:11 PM

More appropriate for what? Outlier detection? Since the MV means and covariance are influenced by outliers, I would say that if your data are MV normal plus contamination, then yes the robust MD would be a better choice for outlier detection.
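For context, the classical (non-robust) version of this check might be sketched as below, assuming SAS/IML is available. The sketch uses Sashelp.Iris purely as a stand-in data set and a 0.975 chi-squared quantile as an assumed cutoff; neither comes from this thread. A robust variant would replace the sample mean and covariance with robust estimates before computing the distances.

```sas
/* Sketch: flag observations whose squared Mahalanobis distance exceeds */
/* a chi-squared quantile. Classical estimates only; the data set and   */
/* the 0.975 cutoff are illustrative assumptions.                       */
proc iml;
  use sashelp.iris;
  read all var {SepalLength SepalWidth PetalLength PetalWidth} into X;
  close sashelp.iris;

  center = mean(X);                       /* row vector of column means   */
  S      = cov(X);                        /* sample covariance matrix     */
  d2     = mahalanobis(X, center, S)##2;  /* squared distances            */

  p       = ncol(X);                      /* number of variables          */
  cutoff  = quantile('CHISQ', 0.975, p);  /* reference chi-squared cutoff */
  flagged = loc(d2 > cutoff);             /* indices of flagged rows      */

  print cutoff, (ncol(flagged))[label="Number flagged"];
quit;
```

Under multivariate normality the squared distances are approximately chi-squared with p degrees of freedom, which is why the cutoff is taken from that distribution.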

Some possibly relevant references:

1. "Detecting outliers in SAS"

2. pp. 7-9 of Wicklin (2010) "Rediscovering SAS/IML Software: Modern Data Analysis for the Practicing Statistician"