Solved: simulating MV categorical data, two major errors occurred

wang267 · Posted 07-31-2018 03:10 PM

I am trying to simulate some MV categorical data using the method introduced by Dr. Wicklin...but the procedure does not run out smoothly as expected. My code are as below, almost copied from the book and changed the function names:

proc iml;
P={.5 .5 .4 .3 .3,
.5 .5 .2 .5 .3,
. . .4 .2 .3,
. . . . .1};
corr = {1 0.5 0.5 0.3 0.2,
0.5 1 0.4 0.3 0.3,
0.5 0.4 1 0.4 0.2,
0.3 0.3 0.4 1 0.2,
0.2 0.3 0.2 0.2 1 };
eig = eigval(corr);
print eig;
/*define the functions. function name "Cat*" are self defined*/

start CatN(p);
return( countn(P, "col") );
finish;
start CatMean(p); /*mean is not meaningful in this sense*/
x = T(1:nrow(p));
return((x#p)[+,]);
finish;
start CatVar(p); /*Var is not meaningful in this sense*/
d = ncol(p);
m = CatMean(p);
x = T(1:nrow(p));
var = j(1, d, 0);
do i = 1 to d;
var[i] = sum((x-m[i])##2 # p[,i]);
end;
return(var);
finish;
start CatCDF(p);
cdf = J(nrow(p), ncol(p));
do i = 1 to ncol(p);
cdf[, i] = cusum(p[,i]);
end;
return(choose(p=., ., cdf));
finish;

descriptive = CatN(p)//CatMean(p)//CatVar(p);
varName = "X0":"X4";
print descriptive[r={"N" "Mean" "Var"} c=varName];

start expand2dgrid(_x, _y);
x = colvec(_x);
y = colvec(_y);
Nx = nrow(_x);
Ny = nrow(_y);
x = repeat(x, Ny);
y = shape(repeat(y, 1, Nx), 0, 1);
return(x||y);
finish;

start CatQuant(p);
n = CatN(p);
CDF = CatCDF(p);
CDF = choose(CDF > 1-2e-6, ., CDF);
quant = quantile("Normal", CDF);
do j = 1 to ncol(p); /* set upper quantile to .I = infinity */
quant[N[j],j] = .I; /* .I has special meaning to BIN func */
end;
return(quant);
finish;

start CatFindroot(p1, p2, target);
n1 = countn(p1);
n2 = countn(p2);
q1 = Catquant(p1);
q2 = Catquant(p2);
v1 = q1[1:n1-1];
v2 = q2[1:n2-1];
g = expand2dgrid(v1, v2); /* find value of rho so that sum(probbnrm(g[,1], g[,2], rho))=target *//* Bisection: find root on bracketing interval [a,b] */
a = -1; b = 1; /* look for correlation in [-1,1] */
dx = le-8;
dy = le-5;
do i = 1 to 100; /* iterate until convergence */
c = (a+b)/2;
Fc = sum(probbnrm(g[,1], g[,2], c)) - target;
if (abs(Fc)<dy) | (b-a)/2<dx then return(c);
Fa = sum(probbnrm(g[,1], g[,2], a)) - target;
if Fa#Fc >0 then a=c;
else b = c;
end;
return(.);
finish;

start CatMVCorr(p, corr);
d = ncol(p);
n = CatN(p);
mean = CatMean(p);
var = CatVar(p);
cdf = CatCDF(p);
r = I(d);
do i = 1 to d-1;
sumCDFi = sum(cdf[1:n[i]-1, i]);
do j = i+1 to d;
sumCDFj = sum(cdf[1:n[j]-1, j]);
hStar = corr[i, j] * sqrt(var[i] * var[j]) + mean[i] * mean[j] - n[i] * n[j] + n[i]*sumCDFj+n[j]*sumCDFi;
r[i,j] = Catfindroot(p[,i], p[j], hStar);
r[j,i] = r[i,j];
end;
end;
return(r);
finish;
R = CatMVCorr(p, corr);
print R;

Here is the log:

1598 proc iml;
NOTE: IML Ready
1599 P={.5 .5 .4 .3 .3,
1600 .5 .5 .2 .5 .3,
1601 . . .4 .2 .3,
1602 . . . . .1};
1603 corr = {1 0.5 0.5 0.3 0.2,
1604 0.5 1 0.4 0.3 0.3,
1605 0.5 0.4 1 0.4 0.2,
1606 0.3 0.3 0.4 1 0.2,
1607 0.2 0.3 0.2 0.2 1 };
1608 eig = eigval(corr);
1609 print eig;
1610 /*define the functions. function name "Cat*" are self defined*/
1611
1612 start CatN(p);
1613 return( countn(P, "col") );
1614 finish;
NOTE: Module CATN defined.
1615 start CatMean(p);
1615! /*mean is not meaningful in this sense*/
1616 x = T(1:nrow(p));
1617 return((x#p)[+,]);
1618 finish;
NOTE: Module CATMEAN defined.
1619 start CatVar(p);
1619! /*Var is not meaningful in this sense*/
1620 d = ncol(p);
1621 m = CatMean(p);
1622 x = T(1:nrow(p));
1623 var = j(1, d, 0);
1624 do i = 1 to d;
1625 var[i] = sum((x-m[i])##2 # p[,i]);
1626 end;
1627 return(var);
1628 finish;
NOTE: Module CATVAR defined.
1629 start CatCDF(p);
1630 cdf = J(nrow(p), ncol(p));
1631 do i = 1 to ncol(p);
1632 cdf[, i] = cusum(p[,i]);
1633 end;
1634 return(choose(p=., ., cdf));
1635 finish;
NOTE: Module CATCDF defined.
1636
1637 descriptive = CatN(p)//CatMean(p)//CatVar(p);
1638 varName = "X0":"X4";
1639 print descriptive[r={"N" "Mean" "Var"} c=varName];
1640
1641 start expand2dgrid(_x, _y);
1642 x = colvec(_x);
1643 y = colvec(_y);
1644 Nx = nrow(_x);
1645 Ny = nrow(_y);
1646 x = repeat(x, Ny);
1647 y = shape(repeat(y, 1, Nx), 0, 1);
1648 return(x||y);
1649 finish;
NOTE: Module EXPAND2DGRID defined.
1650
1651 start CatQuant(p);
1652 n = CatN(p);
1653 CDF = CatCDF(p);
1654 CDF = choose(CDF > 1-2e-6, ., CDF);
1655 quant = quantile("Normal", CDF);
1656 do j = 1 to ncol(p);
1656! /* set upper quantile to .I = infinity */
1657 quant[N[j],j] = .I;
1657! /* .I has special meaning to BIN func */
1658 end;
1659 return(quant);
1660 finish;
NOTE: Module CATQUANT defined.
1661
1662 start CatFindroot(p1, p2, target);
1663 n1 = countn(p1);
1664 n2 = countn(p2);
1665 q1 = Catquant(p1);
1666 q2 = Catquant(p2);
1667 v1 = q1[1:n1-1];
1668 v2 = q2[1:n2-1];
1669 g = expand2dgrid(v1, v2);
1669! /* find value of rho so that sum(probbnrm(g[,1], g[,2],
1669! rho))=target *//* Bisection: find root on bracketing interval [a,b] */
1670 a = -1;
1670! b = 1;
1670! /* look for correlation in [-1,1] */
1671 dx = le-8;
1672 dy = le-5;
1673 do i = 1 to 100;
1673! /* iterate until convergence */
1674 c = (a+b)/2;
1675 Fc = sum(probbnrm(g[,1], g[,2], c)) - target;
1676 if (abs(Fc)<dy) | (b-a)/2<dx then return(c);
1677 Fa = sum(probbnrm(g[,1], g[,2], a)) - target;
1678 if Fa#Fc >0 then a=c;
1679 else b = c;
1680 end;
1681 return(.);
1682 finish;
NOTE: Module CATFINDROOT defined.
1683
1684 start CatMVCorr(p, corr);
1685 d = ncol(p);
1686 n = CatN(p);
1687 mean = CatMean(p);
1688 var = CatVar(p);
1689 cdf = CatCDF(p);
1690 r = I(d);
1691 do i = 1 to d-1;
1692 sumCDFi = sum(cdf[1:n[i]-1, i]);
1693 do j = i+1 to d;
1694 sumCDFj = sum(cdf[1:n[j]-1, j]);
1695 hStar = corr[i, j] * sqrt(var[i] * var[j]) + mean[i] * mean[j] - n[i] * n[j]
1695! + n[i]*sumCDFj+n[j]*sumCDFi;
1696 r[i,j] = Catfindroot(p[,i], p[j], hStar);
1697 r[j,i] = r[i,j];
1698 end;
1699 end;
1700 return(r);
1701 finish;
NOTE: Module CATMVCORR defined.
1702 R = CatMVCorr(p, corr);
ERROR: (execution) Invalid subscript or subscript out of range.

operation : [ at line 1668 column 16
operands : q2, *LIT1027, _TEM1001

q2 1 row 1 col (numeric)

I

*LIT1027 1 row 1 col (numeric)

1

_TEM1001 1 row 1 col (numeric)

0

statement : ASSIGN at line 1668 column 9
traceback : module CATFINDROOT at line 1668 column 9
module CATMVCORR at line 1696 column 17

NOTE: Paused in module CATFINDROOT.
1703 print R;
ERROR: Matrix R has not been set to a value.

statement : PRINT at line 1703 column 5
1704
1705 start RandMVCateg(N, P, corr);
1705! /*the only one that need you to call explicitly to
1705! simulate categorical data*/
1706 /*N Number of desired observations from MV ordinal distribution,
1707 P Matrix of PMF for ordinal vars. The j_th col is the j_th PMF. Use missing vals if some
1707! vars have fewer values than others.
1708 Corr Desired correlation matrix for ordinal variables. Not everymatrix is a valid as the
1708! correlation of ordinal variables. */
1709 d = ncol(p);
1710 c = CatMVCorr(p, corr);
1711 mu = j(1, d, 0);
1712 X = RandNormal(N, mu, c);
1713 N = CatN(p);
1714 quant = CatQuant(p);
1715 do j = 1 to d;
1716 X[,j] = bin(X[,j], .M//quant[1:N[j], j]);
1717 end;
1718 return(X);
1719 finish;
NOTE: Module RANDMVCATEG defined.
1720
1721 /*test the code: simulate 100 observations from MV(associated) Categorical distribution*/
1722 call randseed(1);
1723 X = RandMVCateg(1000, P, corr);
ERROR: Module CATMVCORR called again before exit from prior call.

statement : ASSIGN at line 1710 column 9
traceback : module RANDMVCATEG at line 1710 column 9

NOTE: Paused in module RANDMVCATEG.
1724 print X;
ERROR: Matrix X has not been set to a value.

statement : PRINT at line 1724 column 5

The error messages are in bold so that you can easily find them. I tried with several Corr matrix but none of them worked. I am also confusing why it said module called again before exit. Some extra codes needed in between?

I also have a question about the N in the last module. It is the number of observations to be simulated, but is defined as N = CatN(p); . I am confused.

Rick_SAS · Posted 07-31-2018 03:31 PM

I suggest that you use the code EXACTLY as it is supplied by the book. It works fine on your example. When modifying the code, you must have introduced an error.

proc iml;
load module=_all_;
P={.5 .5 .4 .3 .3,
   .5 .5 .2 .5 .3,
   .  .  .4 .2 .3,
   .  .  .  .  .1};
corr = {1 0.5 0.5 0.3 0.2,
        0.5 1 0.4 0.3 0.3,
        0.5 0.4 1 0.4 0.2,
        0.3 0.3 0.4 1 0.2,
        0.2 0.3 0.2 0.2 1 };
eig = eigval(corr);
print eig;

R = OrdMVCorr(p, corr);
print R; 

call randseed(1);
X = RandMVOrdinal(10, P, corr);
print X;

SAS Output

R
1	0.7070923	0.6673584	0.4129639	0.2682495
0.7070923	1	0.5503235	0.4129639	0.3989868
0.6673584	0.5503235	1	0.5054321	0.2480774
0.4129639	0.4129639	0.5054321	1	0.2392273
0.2682495	0.3989868	0.2480774	0.2392273	1

X
2	2	2	2	3
1	2	3	2	4
2	1	3	2	2
2	2	3	3	2
2	2	3	2	3
2	2	1	1	2
1	2	3	3	1
1	1	1	2	1
1	2	3	2	2
1	2	3	3	4

View solution in original post

Rick_SAS · Posted 07-31-2018 03:31 PM

I suggest that you use the code EXACTLY as it is supplied by the book. It works fine on your example. When modifying the code, you must have introduced an error.

proc iml;
load module=_all_;
P={.5 .5 .4 .3 .3,
   .5 .5 .2 .5 .3,
   .  .  .4 .2 .3,
   .  .  .  .  .1};
corr = {1 0.5 0.5 0.3 0.2,
        0.5 1 0.4 0.3 0.3,
        0.5 0.4 1 0.4 0.2,
        0.3 0.3 0.4 1 0.2,
        0.2 0.3 0.2 0.2 1 };
eig = eigval(corr);
print eig;

R = OrdMVCorr(p, corr);
print R; 

call randseed(1);
X = RandMVOrdinal(10, P, corr);
print X;

SAS Output

R
1	0.7070923	0.6673584	0.4129639	0.2682495
0.7070923	1	0.5503235	0.4129639	0.3989868
0.6673584	0.5503235	1	0.5054321	0.2480774
0.4129639	0.4129639	0.5054321	1	0.2392273
0.2682495	0.3989868	0.2480774	0.2392273	1

X
2	2	2	2	3
1	2	3	2	4
2	1	3	2	2
2	2	3	3	2
2	2	3	2	3
2	2	1	1	2
1	2	3	3	1
1	1	1	2	1
1	2	3	2	2
1	2	3	3	4

wang267 · Posted 07-31-2018 03:52 PM

Thank you so much, Dr. Wicklin. I tried again with copy and paste, and it works well.

simulating MV categorical data, two major errors occurred

Re: simulating MV categorical data, two major errors occurred

Re: simulating MV categorical data, two major errors occurred

Re: simulating MV categorical data, two major errors occurred