H
Pyrite | Level 9

Suppose I have data that I believe are chi-squared distributed. I want to make a histogram in PROC UNIVARIATE or another procedure, and then overlay the chi-square density.

 

This seems easy enough, though I am wondering why the chi-square is not an option in Univariate?

7 REPLIES
Sam_SAS
SAS Employee

@AnnaBrown I believe this should be on the SAS Procedures community?

FreelanceReinh
Jade | Level 19

If you mean the (common) chi-squared distribution and not the non-central chi-squared distribution, I think you can make use of the facts that

 

  1. The chi-squared distribution with n degrees of freedom is equal to the gamma distribution with shape parameter n/2 and scale parameter 2. In terms of SAS syntax: pdf('CHISQ', x, df) = pdf('GAMMA', x, df/2, 2).
  2. Unlike the chi-squared distribution, the gamma distribution is among the fitted continuous distributions which PROC UNIVARIATE offers.
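The identity in point 1 can be checked numerically. The following is only a minimal sketch: the grid of x values and the choice df=2 are arbitrary assumptions made for illustration, and the differences written to the log should be zero up to floating-point rounding.

```sas
/* Compare the chi-square pdf with the Gamma(df/2, 2) pdf at a few points. */
/* The x values and df=2 are arbitrary choices for illustration.          */
data _null_;
   df = 2;
   do x = 0.5, 1, 2, 5, 10;
      chisq = pdf('CHISQ', x, df);       /* chi-square density, df degrees of freedom */
      gamma = pdf('GAMMA', x, df/2, 2);  /* gamma density, shape=df/2, scale=2        */
      diff  = abs(chisq - gamma);
      put x= chisq= gamma= diff=;
   end;
run;
```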

For other, even user-defined density curves there is an article, "How to overlay a custom density curve on a histogram in SAS."

H
Pyrite | Level 9

Yes, I saw this in a couple of places, including Wicklin's simulation book, but needed more confirmation to feel totally comfortable. So I used the code below, since I had 2 degrees of freedom for the chi^2 distribution. Please let me know if this seems incorrect.

/* Chi-square with 2 df = gamma with shape alpha=2/2=1 and scale sigma=2 */
PROC UNIVARIATE DATA=plotdata;
   VAR x;
   HISTOGRAM x / gamma(alpha=1 sigma=2);
RUN;

 

 

FreelanceReinh
Jade | Level 19

For a basic plot this should be fine.

 

You can create a large sample of rand('CHISQ',2) values to see the good fit. With this simulated data you can also change the parameter values for alpha and sigma to EST in order to let SAS estimate the parameters. You will see how close these estimates will be to 1 and 2, respectively.
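A rough sketch of that check is below. The data set name sim, the seed, and the sample size of 100,000 are assumptions picked for illustration. The first HISTOGRAM statement overlays the fixed Gamma(1, 2) curve and the second lets SAS estimate both parameters.

```sas
/* Simulate a large chi-square(2) sample. The name, seed, and size are arbitrary. */
data sim;
   call streaminit(12345);
   do i = 1 to 100000;
      x = rand('CHISQ', 2);
      output;
   end;
   drop i;
run;

proc univariate data=sim;
   var x;
   histogram x / gamma(alpha=1 sigma=2);      /* fixed parameters: chi-square(2)   */
   histogram x / gamma(alpha=est sigma=est);  /* maximum likelihood estimates      */
run;
```

The parameter table for the second histogram should report estimates close to 1 and 2.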

Rick_SAS
SAS Super FREQ

1. A chi-square distribution with d degrees of freedom is equivalent to a Gamma(d/2, 2) distribution, so, yes, you can use the gamma distribution to overlay a chi-square curve.

2. In general, the way to overlay a known probability density on a sampling distribution (presumably created through Monte Carlo simulation or bootstrapping) is to use the GTL. Since you refer to my book, see pp. 40-41. Also see the article "How to overlay a custom density curve on a histogram in SAS."

3. You asked "why the chi-square is not an option in Univariate." The answer is that UNIVARIATE models data distributions, and real-world data is rarely generated by a process that gives rise to a t, F, or chi-square distribution. Those distributions are used to describe the sampling distribution of statistics. That is, they arise from a theoretical investigation of how a statistic varies across many random samples of data. Consequently, we don't usually fit the parameters in the t, F, and chi-square families. Instead, the parameters (usually called degrees of freedom) are determined by the sample size of the data and are used for inference, such as testing hypotheses, forming confidence intervals, and computing p-values.
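For readers who want to see the shape of the GTL approach from point 2, here is a shortened sketch. It is not the code from the book or the article: the data set names, the template name ChiSqOverlay, the simulated sample, and the grid are all assumptions made for illustration.

```sas
/* Hypothetical sample standing in for the real data */
data sim;
   call streaminit(1);
   do i = 1 to 1000;
      x = rand('CHISQ', 2);
      output;
   end;
   keep x;
run;

/* Evaluate the chi-square(2) density on a grid of t values */
data pdf;
   do t = 0 to 15 by 0.05;
      density = pdf('CHISQ', t, 2);
      output;
   end;
run;

/* One-to-one merge so one data set holds the sample and the curve */
data overlay;
   merge sim pdf;
run;

/* Template: histogram on the density scale with the curve on top */
proc template;
   define statgraph ChiSqOverlay;
      begingraph;
         layout overlay / xaxisopts=(label="x") yaxisopts=(label="Density");
            histogram x / scale=density binaxis=false;
            seriesplot x=t y=density / lineattrs=(thickness=2);
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=overlay template=ChiSqOverlay;
run;
```

The histogram must use the density scale, otherwise the bars and the curve are not comparable.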

H
Pyrite | Level 9

Thank you Freelance and Rick!

 

Yes, my first approach was to simulate these data when I did not see the chi^2 option. I then educated myself further and realized the chi^2 is a special case of the gamma distribution. This seemed familiar, but I was a little wary of going forward without confirmation.

 

Funnily enough, Rick, I had the simulation book open to page 39 while writing this reply. I had seen the code you mentioned, but was discouraged when I saw how long it was. I know, I want the flexibility of writing code, but also point-and-click options at times. To put this in perspective, I am using this as a comparison for generated squared Mahalanobis distances, in the pursuit of examining for outliers. I had seen your Do Loop piece on this and it helped my understanding, and I was hoping to use it as a complement to the chi^2 quantile plots.
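To make the setup concrete, here is a rough sketch of what I mean. It assumes SAS/IML is licensed, and the bivariate normal test data, the data set name mdist, and the variable name d2 are made up for illustration. With 2 variables, the squared distances are compared with chi^2 with 2 degrees of freedom, that is, Gamma(1, 2).

```sas
/* Simulated bivariate normal data as a stand-in for the real data */
proc iml;
   call randseed(1);
   mu    = {0 0};
   Sigma = {1 0.5, 0.5 2};
   Z     = randnormal(1000, mu, Sigma);
   md    = mahalanobis(Z, mean(Z), cov(Z));  /* classical (non-robust) distances */
   d2    = md##2;                            /* squared distances                */
   create mdist var {"d2"};
   append;
   close mdist;
quit;

/* Compare squared distances with chi-square(2) = Gamma(alpha=1, sigma=2) */
proc univariate data=mdist;
   var d2;
   histogram d2 / gamma(alpha=1 sigma=2);
   qqplot    d2 / gamma(alpha=1 sigma=2 theta=0);
run;
```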

 

Question, are the robust Mahalanobis distances more appropriate for data that may be questionably multivariate normal?

 

 

Rick_SAS
SAS Super FREQ

More appropriate for what? Outlier detection?  Since the MV means and covariance are influenced by outliers, I would say that if your data are MV normal plus contamination, then yes the robust MD would be a better choice for outlier detection.

 

Some possibly relevant references:

1. "Detecting outliers in SAS"

2. pp. 7-9 of Wicklin (2010) "Rediscovering SAS/IML Software: Modern Data Analysis for the Practicing Statistician"

 

