BookmarkSubscribeRSS Feed
_TomDrury
Fluorite | Level 6

Thanks for the quick response!

 

1. Apologies, I probably wasn't explicit enough, yes for 2D problems with continous and discrete the conditional can be used, but when you have a 3D situation (say with responses Continous X1, Discrete X2 and Discrete X3), I think you would need the CDF for the X2 and X3 evaluation. 

 

2. I am not very strong in IML, so thanks for explaining a better way to do it - I will investigate. However, if I am aiming to have a function to help evaluate the likelihood in MCMC, I think I would need something to work out the likelihood for each line of data (or possibly all the data in a matrix using the JOINTMODEL option). So I think that would mean a call to R for each set of parameter values generated in MCMC, which as you say will not be efficient. 

 

Thanks for your useful input!

9 REPLIES 9
_TomDrury
Fluorite | Level 6

Hi,

 

I would like to be able to calculate the CDF of the Bivariate (and multivariate) T distribution. Has anyone got a reliable and accurate way to do this in SAS? I suspect IML would be needed hence posting here. R has a package mvtnorm that can do it, and I think it is based methods in the paper by Genz & Bretz (attached). Rather than try to digest what looks to be very complicated separation algebra in the paper, I hoped someone may have a solution already?

 

A very simple way to do it would be a MC integration algorithm; generating large numbers of MVT values from IML and averaging them. I have tried this for the bivariate T (see embedded SAS program) but I don’t think this would be accurate and quick enough (I compared the accuracy of the process for the MVN in the code with the SAS function PROBBNRM and it isn’t great).

Any ideas or solutions would be welcome!

 

Thanks,

Tom.

Using SAS 9.3 and 9.4.

 

 

**********************************************;
***    TEST BASIC MONTE CARLO INEGRATION   ***;
**********************************************;

ods html close;
ods listing;

*** MACRO TO GENERATE MONTE CARLO INTEGRATED MVT CDF ESTIMATE IN IML ***;
%macro mc_mvt;
  proc iml;
    call randseed(&seed.);
    mvector = {0 0};
    smatrix = {1 &rho., &rho. 1};
    yvector = randmvt(10000, &df., mvector, smatrix);
    ind = (yvector[,1] <= &x1.) & (yvector[,2] <= &x2.);
    probt = ind[:,];
    call symput ("probt",char(probt));
  quit;
  run;
%mend;

*** MACRO TO GENERATE MONTE CARLO INTEGRATED MVN CDF ESTIMATE IN IML ***;
%macro mc_mvn;
  proc iml;
    call randseed(&seed.);
    mvector = {0 0};
    smatrix = {1 &rho., &rho. 1};
    yvector = randnormal(10000, mvector, smatrix);
    ind = (yvector[,1] <= &x1.) & (yvector[,2] <= &x2.);
    probn = ind[:,];
    call symput ("probn",char(probn));
  quit;
  run;
%mend;

*** FCMP FUNCTION TO RUN MACRO FROM WITHIN DATASTEP ***;
proc fcmp outlib = work.stats.functions;

  function cdf_mvt(x1,x2,rho,df,seed);
    rc = run_macro("mc_mvt",x1,x2,rho,df,seed,probt);
    if rc = 0 then return(probt);
    else return(.);
  endsub;

quit;
run;

*** FCMP FUNCTION TO RUN MACRO FROM WITHIN DATASTEP ***;
proc fcmp outlib = work.stats.functions;

  function cdf_mvn(x1,x2,rho,seed);
    rc = run_macro("mc_mvn",x1,x2,rho,seed,probn);
    if rc = 0 then return(probn);
    else return(.);
  endsub;

quit;
run;

*** CALL FUNCTIONS FROM DATASTEP ***;
options cmplib = (work.stats);
data results;
  do i = -1 to 1 by 0.25;
  do j = -1 to 1 by 0.25;
    r=0.8;
    v=2;
    seeds=12348765;
    mc_probt  = cdf_mvt(i,j,r,v,seeds);
    mc_probn  = cdf_mvn(i,j,r,seeds);
    sas_probn = probbnrm(i,j,r);
    output;
  end;
  end;
run;
Ksharp
Super User
You found the right place to post. Here are some paper for help.

http://blogs.sas.com/content/iml/2013/09/25/compute-contours-of-the-bivariate-normal-cdf.html

http://blogs.sas.com/content/iml/2013/09/20/gradient-of-the-bivariate-normal-cumulative-distribution.html

http://blogs.sas.com/content/iml/2012/07/11/visualize-the-bivariate-normal-cumulative-distribution.html


or you could use PROC COPULA to simulate multivariate T distribution and further get the CDF ?


Rick_SAS
SAS Super FREQ

There's nothing intrinsically wrong with your simulation code, except that the sample size is too small for this computation. The Monte Carlo error is proportional to 1/sqrt(n), so I'd use at least 100,000 simulated observations. Depending on the application, you might want to simulate 1 million obs, especially if you are interested in behavior in the tails.

 

Change your simulation code to use 100,000 like this:

yvector = randnormal(100000, mvector, smatrix);

and then run this PROC SGPLOT to compare the MC results with the PROBBNRM computations. You'll see that the curves line up.

 

proc sgplot data=results;
series x=j y=mc_probn / group=i curvelabel;
series x=j y=sas_probn / group=i curvelabel;
run;

If you are going to use many points of the CDF,  you might want to pre-compute the CDF values on a fine grid and save the results to a permanent data set.  Then when you want a particular value of x you can look up the nearest grid points and interpolate from the pre-computed values. That approach is only feasible in low dimensions, but you said "bivariate", so it will certainly work for dim=2.  

 

Notice that if you intend to compute many values, you don't need to generate a new random sample for every call. Make a single (huge) random sample and use it to estimate the CDF on a fine grid.

Ksharp
Super User
@Rick,

Can you use Markov chain Monte Carlo (MCMC) simulation  to get it ,just as PROC MCMC did ?
That would be very interesting.



_TomDrury
Fluorite | Level 6

@Ksharp & @Rick_SAS

 

Thanks for your help and ideas! I will look at increasing the number of simulations and see if there is a way to use this for what I want. The reason for asking for a quick function to evaluate the bivariate T CDF is because I would like to fit joint bayesian t-copula models for mixed continous and discrete data in SAS. To do this finite differencing of the joint CDF is used for the discrete portions of the likelihood. I have successfully done this for the gaussian copula, as I can use PROBBNRM.

 

For completeness I also tried cheating by calling the function from R using IML (code below) however this is not very quick either.

 

*** OPTION TO SET UP WHERE R SOFTWARE IS STORED ON COMPUTER ***;
options set=R_HOME='C:\Program Files\R\R-2.15.3';

*** MACRO TO GENERATE R CALL ***;
%macro r_mvt;
%let workpath = %sysfunc(getoption(work));
filename Rcall "&workpath.\Rcall.sas";
data _null_;
   file Rcall;
   put @1 "*** IML CODE TO SUBMIT TO R ***;" /;
   put @1
"proc iml;"/ 
"  rmatrix = {1 &rho., &rho. 1};"/
"  xvector = {&x1., &x2.};"/
"  df      = &df.;"/
"  call ExportMatrixToR(rmatrix, 'rmat');"/
"  call ExportMatrixToR(xvector, 'xvec');"/
"  call ExportMatrixToR(df, 'dfs');"/
"  submit / R;"/
"    library('mvtnorm');"/
"    ll <- c(-Inf,-Inf);"/
"    ul <- as.numeric(xvec);"/
"    df <- as.numeric(dfs);"/
"    probt <- pmvt(lower=ll, upper=ul, df=df, corr=rmat, algorithm = GenzBretz());"/
"  endsubmit;"/
"  run ImportMatrixFromR(prob_t, 'probt');"/
"  call symput ('probt',char(prob_t));"/
"quit;"/
"run;"
;
run;
%include "&workpath.\Rcall.sas";
%mend;

*** FCMP FUNCTION TO RUN MACRO FROM WITHIN DATASTEP OR MCMC ***;
proc fcmp outlib = work.stats.functions;

  function r_pmvt(x1,x2,rho,df);
    rc = run_macro("r_mvt",x1,x2,rho,df,probt);
    if rc = 0 then return(probt);
    else return(.);
  endsub;

quit;
run;

*** CALL FUNCTIONS FROM DATASTEP ***;
options cmplib = (work.stats);
data results;
  do i = -1 to 1 by 1;
  do j = -1 to 1 by 1;
  do k = -1 to 1 by 0.1;
    v=2;
    r_probt  = r_pmvt(i,j,k,v);
    output;
  end;
  end;
  end;
run;

 

 

 

Rick_SAS
SAS Super FREQ

1. You say "finite differencing of the joint CDF is used for the discrete portions of the likelihood." If I understand this sentence correctly, you might not need the joint CDF because the derivative of the joint CDF is simply the conditional density. For example, dF(x,y)/dy = F(x | y), which reduces it to a 1D problem. If you have a gradient direction, I believe you can still reduce this problem to a 1D integration of the MVT PDF, which you can then transform into a univariate CDF of the t distribution by using a change of variables. Think about it.

 

2. Your "calling R" code is very inefficient. Your code starts and stops PROC IML and R for every scalar value of (i,j,k). There is no need to wrap PROC IML in a macro and call it from the DATA step. You can do this all from IML. First use the EXPANDGRID function to create a grid of (i,j,k) values as an Nx3 matrix.  Transfer that matrix to R. Call whatever R functions you want on the vectors. Transfer back the vectors. It should require starting IML and R only one time. For details and examples of the correct way to call R from SAS/IML, see 

The article "Twelve advantages to calling R from teh SAS/IML language" also addresses reasons why it is better to call R directly from a SAS/IML program than from a macro.

 

Rick_SAS
SAS Super FREQ

Are you planning to evaluate the likelihood function in PROC MCMC? If so, I can move this discussion to the Statistical Procedures community where the MCMC experts are more likely to see it. 

_TomDrury
Fluorite | Level 6
If that is possible - that would be great! Thanks!
ROBIZ7
Fluorite | Level 6

Prof Frank Bretz wrote several programs for computation of probabilities from multivariate normal and t distributions:

https://www.biostat.uni-hannover.de/89.html#c137

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2720 views
  • 0 likes
  • 4 in conversation