Programming the statistical procedures from SAS

Re: The Re: Calculating the CDF of a multivariate T

Reply
New Contributor
Posts: 4

Re: The Re: Calculating the CDF of a multivariate T

Thanks for the quick response!

 

1. Apologies, I probably wasn't explicit enough, yes for 2D problems with continous and discrete the conditional can be used, but when you have a 3D situation (say with responses Continous X1, Discrete X2 and Discrete X3), I think you would need the CDF for the X2 and X3 evaluation. 

 

2. I am not very strong in IML, so thanks for explaining a better way to do it - I will investigate. However, if I am aiming to have a function to help evaluate the likelihood in MCMC, I think I would need something to work out the likelihood for each line of data (or possibly all the data in a matrix using the JOINTMODEL option). So I think that would mean a call to R for each set of parameter values generated in MCMC, which as you say will not be efficient. 

 

Thanks for your useful input!

SAS Super FREQ
Posts: 3,319

Re: The Re: Calculating the CDF of a multivariate T

Are you planning to evaluate the likelihood function in PROC MCMC? If so, I can move this discussion to the Statistical Procedures community where the MCMC experts are more likely to see it. 

New Contributor
Posts: 4

Re: The Re: Calculating the CDF of a multivariate T

If that is possible - that would be great! Thanks!
New Contributor
Posts: 4

Calculating the CDF of a multivariate T

Hi,

 

I would like to be able to calculate the CDF of the Bivariate (and multivariate) T distribution. Has anyone got a reliable and accurate way to do this in SAS? I suspect IML would be needed hence posting here. R has a package mvtnorm that can do it, and I think it is based methods in the paper by Genz & Bretz (attached). Rather than try to digest what looks to be very complicated separation algebra in the paper, I hoped someone may have a solution already?

 

A very simple way to do it would be a MC integration algorithm; generating large numbers of MVT values from IML and averaging them. I have tried this for the bivariate T (see embedded SAS program) but I don’t think this would be accurate and quick enough (I compared the accuracy of the process for the MVN in the code with the SAS function PROBBNRM and it isn’t great).

Any ideas or solutions would be welcome!

 

Thanks,

Tom.

Using SAS 9.3 and 9.4.

 

 

**********************************************;
***    TEST BASIC MONTE CARLO INEGRATION   ***;
**********************************************;

ods html close;
ods listing;

*** MACRO TO GENERATE MONTE CARLO INTEGRATED MVT CDF ESTIMATE IN IML ***;
%macro mc_mvt;
  proc iml;
    call randseed(&seed.);
    mvector = {0 0};
    smatrix = {1 &rho., &rho. 1};
    yvector = randmvt(10000, &df., mvector, smatrix);
    ind = (yvector[,1] <= &x1.) & (yvector[,2] <= &x2.);
    probt = ind[:,];
    call symput ("probt",char(probt));
  quit;
  run;
%mend;

*** MACRO TO GENERATE MONTE CARLO INTEGRATED MVN CDF ESTIMATE IN IML ***;
%macro mc_mvn;
  proc iml;
    call randseed(&seed.);
    mvector = {0 0};
    smatrix = {1 &rho., &rho. 1};
    yvector = randnormal(10000, mvector, smatrix);
    ind = (yvector[,1] <= &x1.) & (yvector[,2] <= &x2.);
    probn = ind[:,];
    call symput ("probn",char(probn));
  quit;
  run;
%mend;

*** FCMP FUNCTION TO RUN MACRO FROM WITHIN DATASTEP ***;
proc fcmp outlib = work.stats.functions;

  function cdf_mvt(x1,x2,rho,df,seed);
    rc = run_macro("mc_mvt",x1,x2,rho,df,seed,probt);
    if rc = 0 then return(probt);
    else return(.);
  endsub;

quit;
run;

*** FCMP FUNCTION TO RUN MACRO FROM WITHIN DATASTEP ***;
proc fcmp outlib = work.stats.functions;

  function cdf_mvn(x1,x2,rho,seed);
    rc = run_macro("mc_mvn",x1,x2,rho,seed,probn);
    if rc = 0 then return(probn);
    else return(.);
  endsub;

quit;
run;

*** CALL FUNCTIONS FROM DATASTEP ***;
options cmplib = (work.stats);
data results;
  do i = -1 to 1 by 0.25;
  do j = -1 to 1 by 0.25;
    r=0.8;
    v=2;
    seeds=12348765;
    mc_probt  = cdf_mvt(i,j,r,v,seeds);
    mc_probn  = cdf_mvn(i,j,r,seeds);
    sas_probn = probbnrm(i,j,r);
    output;
  end;
  end;
run;
Grand Advisor
Posts: 9,466

Re: Calculating the CDF of a multivariate T

You found the right place to post. Here are some paper for help.

http://blogs.sas.com/content/iml/2013/09/25/compute-contours-of-the-bivariate-normal-cdf.html

http://blogs.sas.com/content/iml/2013/09/20/gradient-of-the-bivariate-normal-cumulative-distribution.html

http://blogs.sas.com/content/iml/2012/07/11/visualize-the-bivariate-normal-cumulative-distribution.html


or you could use PROC COPULA to simulate multivariate T distribution and further get the CDF ?


SAS Super FREQ
Posts: 3,319

Re: Calculating the CDF of a multivariate T

There's nothing intrinsically wrong with your simulation code, except that the sample size is too small for this computation. The Monte Carlo error is proportional to 1/sqrt(n), so I'd use at least 100,000 simulated observations. Depending on the application, you might want to simulate 1 million obs, especially if you are interested in behavior in the tails.

 

Change your simulation code to use 100,000 like this:

yvector = randnormal(100000, mvector, smatrix);

and then run this PROC SGPLOT to compare the MC results with the PROBBNRM computations. You'll see that the curves line up.

 

proc sgplot data=results;
series x=j y=mc_probn / group=i curvelabel;
series x=j y=sas_probn / group=i curvelabel;
run;

If you are going to use many points of the CDF,  you might want to pre-compute the CDF values on a fine grid and save the results to a permanent data set.  Then when you want a particular value of x you can look up the nearest grid points and interpolate from the pre-computed values. That approach is only feasible in low dimensions, but you said "bivariate", so it will certainly work for dim=2.  

 

Notice that if you intend to compute many values, you don't need to generate a new random sample for every call. Make a single (huge) random sample and use it to estimate the CDF on a fine grid.

Grand Advisor
Posts: 9,466

Re: Calculating the CDF of a multivariate T

@Rick,

Can you use Markov chain Monte Carlo (MCMC) simulation  to get it ,just as PROC MCMC did ?
That would be very interesting.



New Contributor
Posts: 4

Re: Calculating the CDF of a multivariate T

@Ksharp & @Rick_SAS

 

Thanks for your help and ideas! I will look at increasing the number of simulations and see if there is a way to use this for what I want. The reason for asking for a quick function to evaluate the bivariate T CDF is because I would like to fit joint bayesian t-copula models for mixed continous and discrete data in SAS. To do this finite differencing of the joint CDF is used for the discrete portions of the likelihood. I have successfully done this for the gaussian copula, as I can use PROBBNRM.

 

For completeness I also tried cheating by calling the function from R using IML (code below) however this is not very quick either.

 

*** OPTION TO SET UP WHERE R SOFTWARE IS STORED ON COMPUTER ***;
options set=R_HOME='C:\Program Files\R\R-2.15.3';

*** MACRO TO GENERATE R CALL ***;
%macro r_mvt;
%let workpath = %sysfunc(getoption(work));
filename Rcall "&workpath.\Rcall.sas";
data _null_;
   file Rcall;
   put @1 "*** IML CODE TO SUBMIT TO R ***;" /;
   put @1
"proc iml;"/ 
"  rmatrix = {1 &rho., &rho. 1};"/
"  xvector = {&x1., &x2.};"/
"  df      = &df.;"/
"  call ExportMatrixToR(rmatrix, 'rmat');"/
"  call ExportMatrixToR(xvector, 'xvec');"/
"  call ExportMatrixToR(df, 'dfs');"/
"  submit / R;"/
"    library('mvtnorm');"/
"    ll <- c(-Inf,-Inf);"/
"    ul <- as.numeric(xvec);"/
"    df <- as.numeric(dfs);"/
"    probt <- pmvt(lower=ll, upper=ul, df=df, corr=rmat, algorithm = GenzBretz());"/
"  endsubmit;"/
"  run ImportMatrixFromR(prob_t, 'probt');"/
"  call symput ('probt',char(prob_t));"/
"quit;"/
"run;"
;
run;
%include "&workpath.\Rcall.sas";
%mend;

*** FCMP FUNCTION TO RUN MACRO FROM WITHIN DATASTEP OR MCMC ***;
proc fcmp outlib = work.stats.functions;

  function r_pmvt(x1,x2,rho,df);
    rc = run_macro("r_mvt",x1,x2,rho,df,probt);
    if rc = 0 then return(probt);
    else return(.);
  endsub;

quit;
run;

*** CALL FUNCTIONS FROM DATASTEP ***;
options cmplib = (work.stats);
data results;
  do i = -1 to 1 by 1;
  do j = -1 to 1 by 1;
  do k = -1 to 1 by 0.1;
    v=2;
    r_probt  = r_pmvt(i,j,k,v);
    output;
  end;
  end;
  end;
run;

 

 

 

SAS Super FREQ
Posts: 3,319

The Re: Calculating the CDF of a multivariate T

1. You say "finite differencing of the joint CDF is used for the discrete portions of the likelihood." If I understand this sentence correctly, you might not need the joint CDF because the derivative of the joint CDF is simply the conditional density. For example, dF(x,y)/dy = F(x | y), which reduces it to a 1D problem. If you have a gradient direction, I believe you can still reduce this problem to a 1D integration of the MVT PDF, which you can then transform into a univariate CDF of the t distribution by using a change of variables. Think about it.

 

2. Your "calling R" code is very inefficient. Your code starts and stops PROC IML and R for every scalar value of (i,j,k). There is no need to wrap PROC IML in a macro and call it from the DATA step. You can do this all from IML. First use the EXPANDGRID function to create a grid of (i,j,k) values as an Nx3 matrix.  Transfer that matrix to R. Call whatever R functions you want on the vectors. Transfer back the vectors. It should require starting IML and R only one time. For details and examples of the correct way to call R from SAS/IML, see 

The article "Twelve advantages to calling R from teh SAS/IML language" also addresses reasons why it is better to call R directly from a SAS/IML program than from a macro.

 

Ask a Question
Discussion stats
  • 8 replies
  • 135 views
  • 0 likes
  • 3 in conversation