BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
pap
Calcite | Level 5 pap
Calcite | Level 5

Hi - I need to use IML to create the "original" table ABC from the marginal tables AB (summed over C), AC (summed over B) and BC (summed over A). Below is the relevant part of the code. The full code is attached (adapted from code found at https://lost-contact.mit.edu/afs/enea.it/software/sas/samples/iml/ipf2.sas ). The code runs but the 2 way marginal tables from the computed 3way table do not match the original 2 way input tables. I need help correcting the red colored part of the code below. This should be pretty straight-forward for someone who is familiar with IML.

Thanks for helping. Pap

/****** A has 67 rows

          B has 9 rows

          C has 14 rows ******

*======================================================================;

*    STEP 1: INPUT THE DATA                                            ;

*======================================================================;

/*---TABLE OF A*B */

data ab; input a b wt @@; cards;

.........

.........

.........

;

proc freq; weight wt; table a*b / list; run;  /*** Check AB marginal table ***/

/*---TABLE OF A*C */

data ac; input a c wt @@; cards;

.........

.........

.........

;

proc freq; weight wt; table a*c / list; run;  /*** Check AC marginal table ***/

/*---TABLE OF B*C */

data bc; input b c wt @@; cards;

.........

.........

.........

;

proc freq; weight wt; table b*c / list; run;  /*** Check BC marginal table ***/

proc iml;

/*----COMPUTE JOINT DISTRIBUTION FOR A BY B----*/

use ab; read all into abyb;

ab = shape( abyb[,3], 67, 9);

/*----COMPUTE JOINT DISTRIBUTION FOR A BY C----*/

use ac; read all into abyc;

ac = shape( abyc[,3], 67, 14);

/*----COMPUTE JOINT DISTRIBUTION FOR B BY C----*/

use bc; read all into bbyc;

bc = shape( bbyc[,3], 9, 14);

/*************************************************** FOLLOWING IS WHERE I AM HAVING A PROBLEM MAKING THE 3-WAY TABLE TO TIE TO THE MARGINALS *****************/

/*----COMBINE THE 3 JOINT DISTRIBUTIONS INTO A 3-WAY TABLE----*/

i=1;   

tab = shape(0, 1 , 8442);

    do a = 1 to 67;      /* over a */

       do b = 1 to 9;     /* over b */

          do c = 1 to 14;  /* over c */

tab = min( ab[a,b], ac[a,c], bc[b,c] );

i = i + 1;

ab[a,b] = ab[a,b] - tab;

ac[a,c] = ac[a,c] - tab;

bc[b,c] = bc[b,c] - tab;

  end;

  end;

end;

/**************************************************************************************************************************************************************************************************/

/*---SET UP THE VARIABLE PROFILES AND THE VARIABLE NAMES---*/

r1 = { 1 1 1 1 ..... }; /*** too long to display here ***/

r2 = { 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 ........ }; /*** too long to display here ***/

r3 = { 1 2 3 4 5 6 7 8 9 10 11 12 13 14   1 2 3 4 5 6 7 8 9 10 11 12 13 14  1 2 3 4 5 6 7 8 9 10 11 12 13 14 ...........}; /*** too long to display here ***/

cname = { a b c wt };

matrix = r1 // r2 // r3 // tab;

matrix=matrix`;

/*---OUTPUT TO A SAS DATA SET---*/

create out from matrix[colname=cname];

append from matrix;

quit;

*======================================================================;

*   STEP 3: VERIFY THAT "ORIGINAL" TABLE HAS CORRECT 2-WAY MARGINS     ;

*======================================================================;

proc print data=out; run;

proc freq data=out; weight wt; table a*b a*c b*c / list; run;

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

Ah! Thanks for pointing that out. I did not notice that before.

I still don't understand how it all works, but I translated your Excel formulas into SAS/IML and reproduced your computations. The program is below. Maybe it will be useful for you.

Notice that I transposed the definition of the column marginal (blue) and I interlaced the even and odd columns of your 3D matrices because I think that is what the IPF and MARG calls expect. However, I am not an expert on using those calls, so I might have made some mistakes.   I am also not sure how to properly update the 3D matrix. I used the REPEAT and SHAPE functions to get the dimensions to work out, but the code might break if you change from this 3x2x4 example.

Anyway, HTH.

proc iml;
/* 3 x 2 x 4 example */
dim={3 2 4 };
A = {1 2 1,
     5 4 2,
     3 5 5,
     5 5 5,
     6 2 2,
     3 8 7,
     1 7 2,
     2 7 6};

start RowMarginal(dim, A);
   call marg(locmar, m, dim, A, {2,3});
   return( shape(m,dim[3],dim[2]) );
finish;
start ColMarginal(dim, A);
   call marg(locmar, m, dim, A, {1,2});
   return( shape(m,dim[2],dim[1]) );
finish;
start SliceMarginal(dim, A);
   call marg(locmar, m, dim, A, {1,3});
   return( shape(m,dim[3],dim[1]) );
finish;

/* define target marginals */
RowTarg = {9  11,
           17 13,
           19 16,
           7  8};
ColTarg = {22 20 10,
           13 20 15};
SliceTarg = {7  9  4,
             8 12 10,
            15 12  8,
             5  7  3};

B = A;
do iter = 1 to 4;
   print iter;
   /* row adjustment */
   RowFactor =  RowTarg / RowMarginal(dim, B);
   B = B # shape(RowFactor, dim[2]*dim[3]);
   print B[F=5.2];
   /* col adjustment */
   ColFactor =  ColTarg / ColMarginal(dim, B);
   B = B # shape(repeat(ColFactor, dim[3]), 0, dim[1]);
   print B[F=5.2];
   /* slice adjustment */
   SliceFactor =  SliceTarg / SliceMarginal(dim, B);
   B = B # shape(repeat(SliceFactor, 1, dim[2]), 0, dim[1]);
   print B[F=5.2];
end;

View solution in original post

25 REPLIES 25
Rick_SAS
SAS Super FREQ

The documentation for the IPF call has an example that looks similar. Look at the Education-Religion-SelfEsteem example in the section "Generating a Table with Given Marginals": SAS/IML(R) 13.1 User's Guide


pap
Calcite | Level 5 pap
Calcite | Level 5

Hi Rick - I did take a look at that example. It is not the same. If someone can explain the logic of the code in the nested do loops, I can pick it up from there. My problem is lack of familiarity with IML coding.

Thanks. -Pappe

Rick_SAS
SAS Super FREQ

The following two lines can replace the 25,000 lines in your program that you use to define the r1, r2, and r3 vectors:

m = expandgrid(1:67, 1:9, 1:14);              /* create 8442 x 3 matrix of ordered triplets */

r1 = T(m[,1]); r2 = T(m[,2]); r3 = T(m[,3]);

With that change, your PROC IML program fits on one screen.

Here's a reference for the EXPANDGRID function: How to generate a grid of points in SAS - The DO Loop

pap
Calcite | Level 5 pap
Calcite | Level 5

Hi Rick - Thanks for the tip. I really appreciate your help. I tested it got the error: ERROR: Invocation of unresolved module EXPANDGRID. Maybe this is a new function that is not available in my SAS version.

Can you explain the code in the nested do loops that "distributes" the data in the 2-way marginal tables into a 3-way table? That's my critical pain point.

Regards,

Pappe

Rick_SAS
SAS Super FREQ

As the article says, the EXPANDGRID function is in SAS/IML 12.3, which was released with SAS 9.4. However, you can use the DATA step in the same article to still save 25,000 lines of code.

Yes, I know you want someone to explain the nested loops. I don't have the time to look at it today, but if no one else answers I will try to look at it when I get some free time.

pap
Calcite | Level 5 pap
Calcite | Level 5

Thanks. I will look up the article.

Hope someone else answers. If nobody does, I would really appreciate your help today. Thanks for your time again. You are wonderful!

Regards,

Pappe

pap
Calcite | Level 5 pap
Calcite | Level 5

Anyone available to explain the nested do loops that "distributes" the data in the 2-way marginal tables into a 3-way table? It should not take too much time. I am just an IML novice to figure it out. Thank you. -Pappe

Ksharp
Super User

Yeah. It is not too difficult if you are familiar with data step.

tab = shape(0, 1 , 8442);

create a row vector(1*8442) which has 8442 columns and its value all are 0

You can take it as initialize the row vector TAB with value zero.

do a = 1 to 67;      /* over a */

       do b = 1 to 9;     /* over b */

          do c = 1 to 14;  /* over c */

tab = min( ab[a,b], ac[a,c], bc[b,c] );

i = i + 1;

ab[a,b] = ab[a,b] - tab;

ac[a,c] = ac[a,c] - tab;

bc[b,c] = bc[b,c] - tab;

  end;

  end;

end;

It pick up all the combinations of a b c ,and use them as array index.

First loop:  a=1 b=1 c=1 i=1

tab[1]=min(ab[1,1],ac[1,1],bc[1,1]); <--- pick up the minimize value from these three values.and set it into tab[1]

ab[a,b] = ab[a,b] - tab; <--- ab[1,1] is changed by its original value subtract tab[1]. and so on.

Second loop:   a=1 b=1 c=2 i=2

Third :  a=1 b=1 c=3 i=3

..........

a=1 b=1 c=14 i=14

a=1 b=2 c=1 i=15

a=1 b=2 c=2 i=16

............

a=1 b=2 c=14 i=28

.........

and so on.

Xia Keshan

pap
Calcite | Level 5 pap
Calcite | Level 5

Hi Xia - I understood the part you just explained (process of filling values in the do loops). My problem is not getting the resulting 3-way table to yield the original 2-way tables. Are the formulas in the nested do loops correctly specified to generate the right 3-way table?

Regards,

Pappe

Rick_SAS
SAS Super FREQ

I know very little about IPF or the analysis of categorical data. However, it looks like your goal is to create a joint (3-way) distribution that agrees with the three 2-way marginal distributions.  It seems to me that this is a difficult thing to do, and it is not clear to me that the algorithm in the IPF documentation will generalize to three 2-way marginals.  Notice that the examples in the IPF doc all include a 1-way marginal.  The process of forming a 3-way joint distribution that agrees with 1-way marginal distributions is a much easier problem because there are fewer constraints on the joint distribution.  I think I can understand why the "greedy algorithm" in the documentation works for 1-way marginals, but I don't see why it would have to work with  higher-dimensional marginals.

It seems to me that this is a statistical question, not a programming question. Given three 2-way marginal tables, is there an algorithm that can construct a 3-way frequency table whose 2-way marginals are EXACTLY as given?  I did an internet search but I was not able to determine the answer.

Ksharp
Super User

Hi pap,

As Rick point out , It seems that greedy algorithm (i.e. enumerate) could give you an answer. I was trying to solve a system of linear equations . But the matrix is singular . So it could not be worked out .

For example :

we have a two-way matrix :

x11 x12

x21 x22

x31 x32

and we know its row sum {5,5,7} and column sum {8 9} . and want know the value of this matrix .I brute  force to enumerate every value of x11,x12,x21,x22,x31,x32  . and got this .

proc iml;
a={1 1 0 0 0 0,
   0 0 1 1 0 0,
   0 0 0 0 1 1,
   1 0 1 0 1 0,
   0 1 0 1 0 1};
row_sum={5,5,7};
col_sum={8 9};
x=j(6,1,0) ;
do x11=1 to min(row_sum[1],col_sum[1]) ;
 do x12=1 to  min(row_sum[1],col_sum[2]) ;
  do x21=1 to  min(row_sum[2],col_sum[1]) ;
   do x22=1 to  min(row_sum[2],col_sum[2]) ;
     do x31=1 to  min(row_sum[3],col_sum[1]) ;
        do x32=1 to  min(row_sum[3],col_sum[2]) ;
         x[1]=x11;x[2]=x12;x[3]=x21;x[4]=x22;x[5]=x31;x[6]=x32;
         if a*x={5,5,7,8,9} then do;
            names={x11,x12,x21,x22,x31,x32} ;
          print x[rowname=names]; stop;
        end;
       end;
     end;
   end;
  end;
 end;
end;
quit;

1 4
1 4
6 1



But I also find another two matrix can suit the same conditions.

1 4
2 3
5 2
8 9


2 3
4 1
2 5
8 9

That means that could be more than one solutions.




Xia Keshan

Message was edited by: xia keshan

pap
Calcite | Level 5 pap
Calcite | Level 5

Thanks Rick and Xia - There is a programming solution (http://www.demog.berkeley.edu/~eddieh/datafitting.html). It consists of iteratively adjusting the cells to equal to marginal totals (one dimension at a time: divide cells by current marginal total and multiply by target marginal total). Below is the description for a 2D and 3D problem. I also including the SAS Macro solution for the 2D problem. Can you help adjust it to work for the 3D problem?

Description of 2D solution

  • Step 1: Each row of seed cells was proportionally adjusted to  equal the marginal row totals (specifically, each cell was divided  by the actual sum of the row of cells, then multiplied by the marginal row total).
  • Step 2: Each column of (already row-adjusted) cells was proportionally adjusted to equal the marginal column totals.

This is the end of the first ‘Iteration’.

  • Steps 3 and higher: The above steps were repeated until the selected level of convergence was reached

SAS macro code for the 2D problem:

/*** -*-mode:simple-sas-*- **************************************************************

  

    DESCRIPTION: IPF2WAY rakes the data in dataset &DSIN. to the row and column controls contained in datasets &ROWCTRLDS. and &COLCTRLDS., respectively.  It does so using the two-way rake also known as Iterative Proportional Fitting and the RAS algorithm.

  

    PROGRAMMERS: webb.sprague@ofm.wa.gov, inspired by Chuck Taylor at the US Census Bureau  ---  DATE STARTED: 2011-08-30

    INPUT (DATASETS, NAMES, ETC):

  DSIN:        Input dataset

  VAR:         Variables to be raked

  ROWCTRLDS:   Row control dataset

  ROWCTRLVAR:  Row control variable (single variable: all controls are in one column)

  COLCTRLDS:   Column control dataset

  COLCTRLVAR:  Column control variables

    OUTPUT  (DATASETS, NAMES, ETC):

  DSOUT:       Output dataset  

**********************************************************************************/

%macro IPF2WAY(DSIN=,DSOUT=,ROWCTRLDS=,VAR=,ROWCTRLVAR=,COLCTRLDS=,COLCTRLVAR=);

  proc iml;

  /* tolerance at which to stop iterating -- half a person ... */

  TOL=0.5;

  /* Read in data, row controls, and column controls.  xxxVAR parameters control which columns get fed into the various matrices */


  use &DSIN.;

  read ALL var {&VAR.} into MATRIX;

  use &ROWCTRLDS.;

  read ALL var {&ROWCTRLVAR.} into ROWCTRL;

  use &COLCTRLDS.;

  read current var {&COLCTRLVAR.} into COLCTRL;

  /* use _MATRIX for intermediate and final results, keep original in MATRIX */

  _MATRIX = MATRIX;

  /* Main loop ... */

  /* ...initialize DIFF to large value to force execution of do loop */

  DIFF = 1000;

  /* ... loop until finished. */

  do until (DIFF < TOL);

  _OLDMAT  =  _MATRIX;

  ROWPR    =  _MATRIX / _MATRIX[,+]; /* divide matrix row-wise by rowsum to get controlled proportions*/

  _MATRIX  =  ROWCTRL # ROWPR;   /* multiply controls by proportion to get numbers*/

  COLPR    =  _MATRIX / _MATRIX[+,]; /* ditto with columns */

  _MATRIX  =  COLCTRL # COLPR;

  DIFF     =  max(abs(_MATRIX - _OLDMAT)); /* check if want to end loop */

  end;

  /* output to a new dataset (XXX need to retain key vars for merging but dont - see note.) */

  create &DSOUT. from _MATRIX [colname={&VAR}];

  append from _MATRIX;

  quit;

%mend;

Description of 3D solution


First, just as in two-dimensional IPF, you will need all of the marginals (e.g. Age by Sex, Age by Race, and Sex by Race), and each marginal must sum to the same value. You must also be sure that the dimensions of the marginals are consistent (e.g. that the values for the age groups are consistent). Second, you will need a three-dimensional (e.g. Age by Sex by Race) seed. You will then perform steps that are very similar to the two-dimensional case...

  • Step 1: Proportionally adjust each (two-dimensional) row of cells to equal the pre-determined totals of Marginal 1.
  • Step 2: Proportionally adjust each column of cells to equal the pre-determined totals of Marginal 2.
  • Steps 3: Proportionally adjust each slice of cells to equal the pre-determined totals of Marginal 3.

This is the end of the first ‘Iteration’.

  • Steps 4 and higher: Repeat the above steps until the desired level of convergence is reached.
Ksharp
Super User

I also can do it for three-way table . But you should know there could be more than one solution.

pap
Calcite | Level 5 pap
Calcite | Level 5

Hi Xia - That's fine. I just want any one solution, as long as it is consistent with the known data in the 3 two-way tables. Thanks for all your help. Would it be possible to send it to me today?

Regards,

-Pappe

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 25 replies
  • 2487 views
  • 0 likes
  • 3 in conversation