Rick_SAS
SAS Super FREQ

I didn't "find" them. I chose those values arbitrarily so that I could show an example of how to integrate the bivariate normal PDF over a rectangular region.

 

In your application, the limits are the "cutoff points" that correspond to the cumulative frequencies for the categories of the ordinal variables, as discussed in the paper. Please re-read the paper for the statistical details. For example, if one of your ordinal variables has three levels with observed counts {10, 25, 15}, then the proportions are {0.2, 0.5, 0.3} and the cumulative proportions are {0.2, 0.7, 1}. You apply the normal quantile function to get the cutoff values {-0.842, 0.524, Infinity} which tell you that the limits of integration in that variable will be 

(-Infinity, -0.842), 

(-0.842, 0.524),

(0.524, Infinity)
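As a rough illustration of that calculation, here is a sketch in Python rather than SAS. The helper names `cut_points` and `intervals` are made up for this example. The last cumulative proportion is 1, whose quantile is +Infinity, so it is added as an edge of the last interval rather than passed to the quantile function.

```python
import math
from statistics import NormalDist

def cut_points(counts):
    """Finite cut points: normal quantiles of the cumulative proportions.

    The final cumulative proportion (1.0) is skipped because its
    quantile is +Infinity.
    """
    total = sum(counts)
    cum = 0
    cuts = []
    for c in counts[:-1]:
        cum += c
        cuts.append(NormalDist().inv_cdf(cum / total))
    return cuts

def intervals(cuts):
    """Limits of integration: one interval per level of the ordinal variable."""
    edges = [-math.inf] + list(cuts) + [math.inf]
    return list(zip(edges[:-1], edges[1:]))

cuts = cut_points([10, 25, 15])        # approximately [-0.842, 0.524]
limits = intervals(cuts)               # three intervals, the outer two unbounded
for lo, hi in limits:
    print(f"({lo:.3f}, {hi:.3f})")
```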

 

This brings up an issue that I hadn't thought about: Some of the domains of integration are over infinite regions. You'll have to figure out how to use the PROBBNRM function to integrate over those infinite regions.

 

hrmannan
Calcite | Level 5

Dear Dr. Rick,

Thanks. For the example cutoffs you provided, I have simplified the intervals to (-Infinity, 0.524) and (-Infinity, -0.524). Will this be of any help? I can't write this in terms of the PROBBNRM function because these are the limits for y only. Let's say the cutoffs for x are the same, for simplicity. We would have 4 integrals in total.

How can they be expressed in terms of PROBBNRM? We are computing a polychoric correlation.

Haider

hrmannan
Calcite | Level 5

Correction: the other integral for y is over (-0.842, 0.524), and the same for x. This gives us 6 integrals: 3 for y and 3 for x. How can they be expressed in terms of PROBBNRM?

Haider

hrmannan
Calcite | Level 5

So the 3 integrals for y are over

(-Infinity, -0.842), 

(-0.842, 0.524),

(0.524, Infinity)

Let's say x has the same distribution as y; then the 3 integrals for x should be the same. How can they be expressed by PROBBNRM, where

probbnrm(a,b,rho)=integration of bivariate normal over (-Infinity, b) - integration of bivariate normal over (-Infinity, a)

rho should be fixed, say at 0.5.

Please suggest.

Haider

hrmannan
Calcite | Level 5

Dear Dr Rick

In continuation, it seems the multiple integration problem has no analytical solution or numerical approximation because of infinite domains for integration. I have limited background of multivariate calculus now. What I think is that Monte Carlo integration can be used, while quasi-Monte Carlo integration would require fewer assumptions and can also be applied. Kindly help with this. SAS has features for these.

Haider 

Rick_SAS
SAS Super FREQ

@hrmannan wrote:

it seems the multiple integration problem has no analytical solution or numerical approximation because of infinite domains for integration. 


No, that is not a valid conclusion. The standard bivariate normal distribution can be integrated over all regions. It gives the probability (which is always a positive number less than or equal to 1) of observing a random variate in the region. For example, the integral of the normal distribution in the region

(-Infinity, Infinity) x (-Infinity, Infinity)

is 1. When rho=0, the integral on the domain 

(-Infinity, 0) x (-Infinity, 0)

is 1/4. 
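These two facts are easy to check numerically. The sketch below is in Python, not SAS, and `bvn_cdf` is a hypothetical stand-in for the quantity that PROBBNRM(a, b, rho) returns, namely P(X <= a, Y <= b) for a standard bivariate normal pair. It assumes |rho| < 1. It uses the identity that the derivative of the bivariate CDF with respect to rho is the bivariate density, so the only numerical integral is over the finite interval [0, rho]. Infinite limits are handled by falling back to the univariate normal CDF.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf   # univariate standard normal CDF
INF = math.inf

def bvn_cdf(a, b, rho, n=200):
    """P(X <= a, Y <= b) for a standard bivariate normal, |rho| < 1.

    Hypothetical stand-in for SAS PROBBNRM; not the SAS implementation.
    """
    # Infinite limits reduce to 0, 1, or a univariate probability.
    if a == -INF or b == -INF:
        return 0.0
    if a == INF:
        return 1.0 if b == INF else PHI(b)
    if b == INF:
        return PHI(a)

    def density(r):
        q = 1.0 - r * r
        return math.exp(-(a * a - 2 * r * a * b + b * b) / (2 * q)) / math.sqrt(q)

    # Simpson's rule on [0, rho]; n must be even.
    h = rho / n
    s = density(0.0) + density(rho)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * density(i * h)
    return PHI(a) * PHI(b) + s * h / 3 / (2 * math.pi)

whole_plane = bvn_cdf(INF, INF, 0.0)   # integral over the whole plane
sw_quadrant = bvn_cdf(0.0, 0.0, 0.0)   # (-Infinity, 0) x (-Infinity, 0), rho = 0
print(whole_plane, sw_quadrant)
```

For a nonzero correlation the same quadrant has the closed form 1/4 + arcsin(rho)/(2*pi), which is 1/3 when rho = 0.5, and `bvn_cdf(0, 0, 0.5)` can be compared against it.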

 


@hrmannan wrote:

i have limited background of multivariate calculus.


You might want to find a colleague who can help you with this problem. Implementing the details of a polychoric correlation requires being able to work easily with concepts from multivariable calculus. The MLE program you are asking for requires optimizing a parameter for the sum of several double integrals over infinite domains. During the optimization process, the domains change at each step. 

 

To solve this problem in the UNWEIGHTED case of polychoric correlation, you must be able to do the following:

  1. Use the PROBBNRM function to evaluate the probability for the 9 shapes of regions in a tic-tac-toe board: SW, S, SE, W, Center, E, NW, N, and NE.
  2. Estimate the regions by using the marginal cumulative frequencies. These are the cut points a1, a2, ..., aK, and b1, b2, ..., bM.
  3. Find the value of the correlation, rho0, that maximizes the log-likelihood (LL) for these initial regions.
  4. Allow the boundaries of the regions to vary. Optimize the LL as a function of the correlation, rho, and the cut points. There are now 1 + K + M parameters. The MLE gives you rho*, which is the estimate of the polychoric correlation.
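Steps 1 to 3 can be sketched as follows. This is Python, not SAS, and it is only a sketch under several assumptions: `bvn_cdf` is a hypothetical stand-in for PROBBNRM, the search for rho is a simple golden-section search on (-0.95, 0.95) that assumes the LL has a single maximum there, and the table of counts is synthetic (it is generated from the model at rho = 0.5 so that the estimate can be checked). Step 4, the joint optimization over rho and the cut points, is not shown.

```python
import math
from statistics import NormalDist

PHI, INF = NormalDist().cdf, math.inf

def bvn_cdf(a, b, rho, n=100):
    """P(X <= a, Y <= b), standard bivariate normal (stand-in for PROBBNRM)."""
    if a == -INF or b == -INF:
        return 0.0
    if a == INF:
        return 1.0 if b == INF else PHI(b)
    if b == INF:
        return PHI(a)
    def f(r):
        q = 1.0 - r * r
        return math.exp(-(a * a - 2 * r * a * b + b * b) / (2 * q)) / math.sqrt(q)
    h = rho / n                                   # Simpson's rule on [0, rho]
    s = f(0.0) + f(rho) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n))
    return PHI(a) * PHI(b) + s * h / 3 / (2 * math.pi)

def cell_probs(a_cuts, b_cuts, rho):
    """Step 1: probability of every rectangle defined by the cut points."""
    xe = [-INF] + list(a_cuts) + [INF]
    ye = [-INF] + list(b_cuts) + [INF]
    F = [[bvn_cdf(x, y, rho) for y in ye] for x in xe]
    return [[F[i + 1][j + 1] - F[i][j + 1] - F[i + 1][j] + F[i][j]
             for j in range(len(ye) - 1)] for i in range(len(xe) - 1)]

def cuts_from_margin(totals):
    """Step 2: cut points from the marginal cumulative frequencies."""
    n, cum, cuts = sum(totals), 0.0, []
    for t in totals[:-1]:
        cum += t
        cuts.append(NormalDist().inv_cdf(cum / n))
    return cuts

def log_lik(table, a_cuts, b_cuts, rho):
    probs = cell_probs(a_cuts, b_cuts, rho)
    return sum(n_ij * math.log(max(p_ij, 1e-300))
               for row_n, row_p in zip(table, probs)
               for n_ij, p_ij in zip(row_n, row_p))

def two_step_rho(table, lo=-0.95, hi=0.95, tol=1e-6):
    """Step 3: maximize the LL over rho with the cut points held fixed."""
    a_cuts = cuts_from_margin([sum(row) for row in table])
    b_cuts = cuts_from_margin([sum(col) for col in zip(*table)])
    ll = lambda r: log_lik(table, a_cuts, b_cuts, r)
    g = (math.sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = ll(x1), ll(x2)
    while hi - lo > tol:
        if f1 < f2:                               # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = ll(x2)
        else:                                     # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = ll(x1)
    return (lo + hi) / 2

# Synthetic 3x3 table: expected counts for N = 50 at rho = 0.5 (not real data).
table = [[50 * p for p in row]
         for row in cell_probs([-0.842, 0.524], [-0.842, 0.524], 0.5)]
rho0 = two_step_rho(table)
print(round(rho0, 3))
```

Because the synthetic table consists of exact expected counts, the search should recover a value very close to the rho that generated it. With observed counts the same functions apply unchanged.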

I suggest you pursue this problem in stages. After you can solve it for unweighted polychoric correlation, the last step is to modify the algorithm for weighted correlation.

 

Good luck.

hrmannan
Calcite | Level 5

So Monte Carlo integration isn't needed. Are the nine regions you mention related to numerical approximation using the trapezoidal rule or Simpson's one-third rule? It would help if you could suggest an example to look at to solve the problem.

Haider

hrmannan
Calcite | Level 5

Hi Dr Rick

My suggestion to use the trapezoidal or Simpson's one-third rule was invalid because the domains include infinity. Please give an example of how the PROBBNRM function can be used over the 9 regions you mentioned. I don't know how 9 regions come up. In steps 1 and 2 for calculating the polychoric correlation there are three domains for integrating over y and another three domains for integrating over x. This gives us 6 regions of integration in total. Please let me know.

Haider

hrmannan
Calcite | Level 5

I am stuck on step 1. You suggested nine regions, which I can't understand. An example showing integration over several regions would help me understand. Both x and y have 3 domains of integration each, according to the cutoffs you indicated.

Haider

Rick_SAS
SAS Super FREQ

Sorry for the confusion. What I meant to say is that there are 9 shapes in problems like this. If X and Y both have 2 cut points, then they each define 3 intervals. The 9 shapes for this case are:
- the SW, SE, NW, and NE quadrants

- the S, W, E, and N half strips

- the center rectangle

 

In general, you might have other horizontal or vertical half strips and additional rectangles. But if you can compute the probability over the 9 basic shapes, you can handle the regions that come up in any polychoric problem.
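To make the 9 shapes concrete, here is a sketch in Python (not SAS) that computes the probability of each shape from the joint CDF alone. `bvn_cdf` is a hypothetical stand-in for PROBBNRM, and the cut points and rho = 0.5 are just the illustrative values used earlier in this thread. Every shape, bounded or not, is a difference of four CDF values: P(x_lo < X <= x_hi, y_lo < Y <= y_hi) = F(x_hi, y_hi) - F(x_lo, y_hi) - F(x_hi, y_lo) + F(x_lo, y_lo), where an infinite limit turns the corresponding term into 0, 1, or a univariate normal probability.

```python
import math
from statistics import NormalDist

PHI, INF = NormalDist().cdf, math.inf

def bvn_cdf(a, b, rho, n=200):
    """P(X <= a, Y <= b), standard bivariate normal (stand-in for PROBBNRM)."""
    if a == -INF or b == -INF:
        return 0.0
    if a == INF:
        return 1.0 if b == INF else PHI(b)
    if b == INF:
        return PHI(a)
    def f(r):
        q = 1.0 - r * r
        return math.exp(-(a * a - 2 * r * a * b + b * b) / (2 * q)) / math.sqrt(q)
    h = rho / n                                   # Simpson's rule on [0, rho]
    s = f(0.0) + f(rho) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n))
    return PHI(a) * PHI(b) + s * h / 3 / (2 * math.pi)

def rect_prob(x_lo, x_hi, y_lo, y_hi, rho):
    """Probability of a rectangle, half strip, or quadrant via inclusion-exclusion."""
    return (bvn_cdf(x_hi, y_hi, rho) - bvn_cdf(x_lo, y_hi, rho)
            - bvn_cdf(x_hi, y_lo, rho) + bvn_cdf(x_lo, y_lo, rho))

def nine_shapes(a1, a2, b1, b2, rho):
    xs = [(-INF, a1), (a1, a2), (a2, INF)]        # west, middle, east
    ys = [(-INF, b1), (b1, b2), (b2, INF)]        # south, middle, north
    names = [["SW", "S", "SE"], ["W", "Center", "E"], ["NW", "N", "NE"]]
    return {names[j][i]: rect_prob(*xs[i], *ys[j], rho)
            for j in range(3) for i in range(3)}

shapes = nine_shapes(-0.842, 0.524, -0.842, 0.524, 0.5)
for name, p in shapes.items():
    print(f"{name:6s} {p:.4f}")
print("total", sum(shapes.values()))
```

The nine probabilities partition the plane, so they sum to 1. With more cut points the same `rect_prob` call covers the extra strips and rectangles.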

 

I found an online copy of Olsson's 1979 paper on MLE estimation of polychoric correlation. I strongly encourage you to read and understand that paper, which provides all the computations you need for the unweighted polychoric correlation.

 

Good luck.

hrmannan
Calcite | Level 5

Dear Dr Rick

Thanks for sending the Olsson paper. I have read the paper and can easily program both ML methods for estimating the polychoric correlation in SAS. I prefer the two-step method because of its computational ease. However, the paper doesn't discuss MLE for polychoric correlation in complex surveys (weighted polychoric correlation). That is the context in which I have been trying to develop SAS code, and in which you suggested the four steps in your last post. While I can picture the nine regions for finding the area under a curve, I am unable to proceed without an example. Please send me an article with examples and ways to code them in SAS. Looking forward to your guidance.

Haider

hrmannan
Calcite | Level 5

To explain my understanding: you (Dr Rick) suggested splitting an interval into several subintervals, in particular nine subintervals, which give rise to nine regions of integration. The methods I know that can do this are the trapezoidal rule and Simpson's rule. But Simpson's rule requires an even number of subintervals and we have nine, so we are left with the trapezoidal rule. Because that rule requires finite intervals, we can't apply it directly without a change of variables that leads to finite intervals. I have tried x = r sin(theta) and y = r cos(theta), which gives a Jacobian of r, but I can't find the domains for theta and r. If we assume 0 < theta < 2*pi and 0 < r < infinity, how can we split each interval into three subintervals as required? A suggestion from anyone would allow me to move forward. Or is some other transformation needed, and if so, which one?

Haider

hrmannan
Calcite | Level 5
Dr Rick
To proceed, I believe both x and y have to be transformed so that the new intervals are all finite. I can then use the trapezoidal rule or Simpson's rule to approximate the integrals. Could you suggest the transformations?
Haider
