Overlay chi-squared distribution on histogram (proc univariate)

Frequent Contributor
Posts: 109

Suppose I have data that I believe are chi-squared distributed. I want to make a histogram in PROC UNIVARIATE (or another procedure) and then overlay the chi-square density.

This seems easy enough, but I am wondering why chi-square is not an option in PROC UNIVARIATE?

SAS Super FREQ
Posts: 289

Re: Overlay chi-squared distribution on histogram (proc univariate)

@annabrown_sas I believe this should be on the SAS Procedures community?

Trusted Advisor
Posts: 1,115

Re: Overlay chi-squared distribution on histogram (proc univariate)

If you mean the (common) chi-squared distribution and not the non-central chi-squared distribution, I think you can make use of the following facts:

  1. The chi-squared distribution with n degrees of freedom is equal to the gamma distribution with shape parameter n/2 and scale parameter 2. In terms of SAS syntax: pdf('CHISQ', x, df) = pdf('GAMMA', x, df/2, 2).
  2. Unlike the chi-squared distribution, the gamma distribution is among the fitted continuous distributions which PROC UNIVARIATE offers.
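
The identity in item 1 can be checked numerically. The following is a minimal sketch; the choice df=2 and the grid of x values are arbitrary assumptions made for illustration. It writes both densities and their absolute difference to the log, and the difference should be zero up to rounding error.

```sas
/* Compare the chi-square density with the equivalent gamma density. */
/* df and the x grid are illustrative choices, not from the thread.  */
data _null_;
   df = 2;
   do x = 0.5 to 5 by 1.5;
      chisq = pdf('CHISQ', x, df);        /* chi-square with df degrees of freedom */
      gam   = pdf('GAMMA', x, df/2, 2);   /* gamma with shape df/2 and scale 2     */
      diff  = abs(chisq - gam);
      put x= chisq= gam= diff=;
   end;
run;
```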

For other density curves, including user-defined ones, see the article "How to overlay a custom density curve on a histogram in SAS."

Frequent Contributor
Posts: 109

Re: Overlay chi-squared distribution on histogram (proc univariate)

Yes, I saw this in a couple of places, including Wicklin's simulation book, but I needed more confirmation to feel totally comfortable. Since my chi^2 distribution has 2 degrees of freedom, I used the code below. Please let me know if this seems incorrect.

PROC UNIVARIATE DATA=plotdata;
   VAR x;
   /* chi-square(2) = gamma with shape ALPHA=2/2=1 and scale SIGMA=2 */
   HISTOGRAM x / gamma(alpha=1 sigma=2);
RUN;
Trusted Advisor
Posts: 1,115

Re: Overlay chi-squared distribution on histogram (proc univariate)

For a basic plot this should be fine.

 

You can create a large sample of rand('CHISQ',2) values to see the good fit. With these simulated data you can also set alpha and sigma to EST so that SAS estimates the parameters. You will see how close the estimates are to 1 and 2, respectively.
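
The suggestion above could be sketched as follows. The data set name sim, the seed, and the sample size of 10,000 are assumptions chosen for illustration. The first HISTOGRAM statement overlays the fixed chi-square(2) curve; the second lets PROC UNIVARIATE estimate the shape and scale by maximum likelihood with the threshold held at 0.

```sas
/* Simulate chi-square(2) data; seed and sample size are arbitrary. */
data sim;
   call streaminit(12345);
   do i = 1 to 10000;
      x = rand('CHISQ', 2);
      output;
   end;
run;

proc univariate data=sim;
   var x;
   /* Fixed parameters: the gamma curve equivalent to chi-square(2) */
   histogram x / gamma(alpha=1 sigma=2);
   /* Estimated parameters: compare the estimates with 1 and 2 */
   histogram x / gamma(alpha=est sigma=est theta=0);
run;
```

The parameter table for the second fit reports the estimated shape and scale, which can be compared with the theoretical values.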

SAS Super FREQ
Posts: 3,475

Re: Overlay chi-squared distribution on histogram (proc univariate)

1. A chi-square distribution with d degrees of freedom is equivalent to a Gamma(d/2, 2) distribution, so, yes, you can use the gamma distribution to overlay a chi-square curve.

2. In general, the way to overlay a known probability density on a sampling distribution (presumably created through Monte Carlo simulation or bootstrapping) is to use the GTL. Since you refer to my book, see pp. 40-41. Also see the article "How to overlay a custom density curve on a histogram in SAS."

3. You asked "why the chi-square is not an option in Univariate." The answer is that UNIVARIATE models data distributions, and real-world data are rarely generated by a process that gives rise to a t, F, or chi-square distribution. Those distributions are used to describe the sampling distribution of statistics. That is, they arise from a theoretical investigation of how a statistic varies across many random samples of data. Consequently, we don't usually fit the parameters in the t, F, and chi-square families. Instead, the parameters (usually called degrees of freedom) are determined by the sample size of the data and are used for inference, such as testing hypotheses, forming confidence intervals, and computing p-values.
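
As a rough illustration of the GTL approach in item 2, here is a minimal sketch. It is not the code from the book or the article; the data set names (sim, pdf, plot), the template name ChiSqOverlay, the seed, and the grid for the curve are all assumptions. The idea is to evaluate the density on a grid, merge the grid with the data, and overlay a series plot on a density-scaled histogram.

```sas
/* Simulated stand-in for the sampling distribution (illustrative only) */
data sim;
   call streaminit(12345);
   do i = 1 to 10000;
      x = rand('CHISQ', 2);
      output;
   end;
run;

/* Evaluate the chi-square(2) density on a grid of points */
data pdf;
   do t = 0.05 to 15 by 0.05;
      density = pdf('CHISQ', t, 2);
      output;
   end;
run;

/* One-to-one merge (no BY statement): data and curve side by side */
data plot;
   merge sim(keep=x) pdf;
run;

proc template;
   define statgraph ChiSqOverlay;
      begingraph;
         entrytitle "Histogram with chi-square(2) density";
         layout overlay / xaxisopts=(label="x") yaxisopts=(label="Density");
            histogram x / scale=density;          /* histogram on the density scale */
            seriesplot x=t y=density / lineattrs=(thickness=2);
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=plot template=ChiSqOverlay;
run;
```

Because the curve is computed in a DATA step, any density that can be evaluated there could be overlaid the same way.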

Frequent Contributor
Posts: 109

Re: Overlay chi-squared distribution on histogram (proc univariate)

Thank you Freelance and Rick!

Yes, my first approach was to simulate these data when I did not see the chi^2 option. I then educated myself further and realized that the chi^2 is a special case of the gamma distribution. This seemed familiar, but I was a little wary of going forward without confirmation.

Funnily enough, Rick, I had the simulation book open to page 39 while writing this reply. I had seen the code you mentioned, but was discouraged when I saw how long it was. I know, I want the flexibility of writing code, but also point-and-click options at times. To put this in perspective, I am actually using this as a comparison for generated squared Mahalanobis distances, with the aim of examining them for outliers. I had seen your Do Loop piece on this and it helped my understanding, and I was hoping to use it as a complement to the chi^2 quantile plots.
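
For the chi^2 quantile plots mentioned above, one possible sketch uses the same gamma equivalence in the QQPLOT statement of PROC UNIVARIATE. Everything named here is an assumption for illustration: the data set md, the variable d2, and p=3 variables. The d2 values are simulated stand-ins; real squared Mahalanobis distances would take their place. Under multivariate normality, squared distances for p variables are approximately chi-square with p degrees of freedom, which corresponds to a gamma distribution with shape p/2 and scale 2.

```sas
/* Stand-in for squared Mahalanobis distances from p=3 variables. */
/* Replace this step with the actual distances.                   */
data md;
   call streaminit(2024);
   p = 3;
   do i = 1 to 500;
      d2 = rand('CHISQ', p);
      output;
   end;
run;

proc univariate data=md;
   var d2;
   /* chi-square(3) = gamma with shape ALPHA=3/2 and scale SIGMA=2; */
   /* SIGMA= and THETA= add the distribution reference line.        */
   qqplot d2 / gamma(alpha=1.5 sigma=2 theta=0);
run;
```

Points that fall well above the reference line in the upper tail would be candidates for closer inspection as outliers.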

 

Question: are robust Mahalanobis distances more appropriate for data that may be questionably multivariate normal?

SAS Super FREQ
Posts: 3,475

Re: Overlay chi-squared distribution on histogram (proc univariate)

More appropriate for what? Outlier detection? Since the MV means and covariance are influenced by outliers, I would say that if your data are MV normal plus contamination, then yes, the robust MD would be a better choice for outlier detection.

Some possibly relevant references:

1. "Detecting outliers in SAS"

2. pp. 7-9 of Wicklin (2010) "Rediscovering SAS/IML Software: Modern Data Analysis for the Practicing Statistician"

 
