BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ksharp
Super User

Ou, Sorry. I just review the link you post. It appeared that is totally different with my thought. I would think about it when I am free.Can't you use the code you find to get what you want ?

/*** -*-mode:simple-sas-*- **************************************************************  DESCRIPTION: IPF2WAY rakes the data in dataset &DSIN. to the row and column controls contained in datasets &ROWCTRLDS. and &COLCTRLDS., respectively.  It does so using the two-way rake also known as Iterative Proportional Fitting and the RAS algorithm.  PROGRAMMERS: webb.sprague@ofm.wa.gov, inspired by Chuck Taylor at the US Census Bureau  DATE STARTED: 2011-08-30  INPUT (DATASETS, NAMES, ETC): DSIN:        Input dataset VAR:         Variables to be raked ROWCTRLDS:   Row control dataset ROWCTRLVAR:  Row control variable (single variable: all controls are in one column) COLCTRLDS:   Column control dataset COLCTRLVAR:  Column control variables  OUTPUT  (DATASETS, NAMES, ETC): DSOUT:       Output dataset  NOTES:  2011-08-31 WS (WARNING): This code has only been tested on a single set of dummy data (below).  Be careful when applying to real data!  2011-08-30 WS (TODO): This macro doesn't have the ID keys functionality of the original one.  Have to merge horizontally, should add in the future...  2011-08-30 WS (TODO): Needs functionality to set zeros to really small numbers (internally defined, I think).  2011-08-30 WS: The algorithm here was coded using Eddie H's presentation on IPF. The original inspiration was from the Census Bureau (2004-ish), but it didn't work except for the stuff to interact with outside datasets.  **********************************************************************************/ %macro IPF2WAY(DSIN=,DSOUT=,ROWCTRLDS=,VAR=,ROWCTRLVAR=,COLCTRLDS=,COLCTRLVAR=); proc iml;  /* tolerance at which to stop iterating -- half a person ... */ TOL=0.5;  /* Read in data, row controls, and column controls.  xxxVAR parameters control which columns get fed into the various matrices */  use &DSIN.; read ALL var {&VAR.} into MATRIX;  use &ROWCTRLDS.; read ALL var {&ROWCTRLVAR.} into ROWCTRL;  use &COLCTRLDS.; read current var {&COLCTRLVAR.} into COLCTRL;  /* use _MATRIX for intermediate and final results, keep original in MATRIX */ _MATRIX = MATRIX;  /* Main loop ... */ /* ...initialize DIFF to large value to force execution of do loop */ DIFF = 1000; /* ... loop until finished. */ do until (DIFF < TOL); _OLDMAT  =  _MATRIX; ROWPR    =  _MATRIX / _MATRIX[,+]; /* divide matrix row-wise by rowsum to get controlled proportions*/ _MATRIX  =  ROWCTRL # ROWPR;        /* multiply controls by proportion to get numbers*/ COLPR    =  _MATRIX / _MATRIX[+,]; /* ditto with columns */ _MATRIX  =  COLCTRL # COLPR; DIFF     =  max(abs(_MATRIX - _OLDMAT)); /* check if want to end loop */ end;  /* output to a new dataset (XXX need to retain key vars for merging but dont - see note.) */ create &DSOUT. from _MATRIX [colname={&VAR}]; append from _MATRIX; quit;  %mend;   /********************************************************************************* Test code. **********************************************************************************/ %MACRO _TEST;  proc sql; create table test_dsin (idstr varchar, x1 numeric, x2 numeric, x3 numeric); insert into test_dsin values ('foo1', 1, 2, 1); insert into test_dsin values ('foo2', 3, 5, 5); insert into test_dsin values ('foo3',  6, 2, 2);  create table test_rowctrlds (rt numeric); insert into test_rowctrlds values (5); insert into test_rowctrlds values (15); insert into test_rowctrlds values (8);  create table test_colctrlds (ct1 numeric, ct2 numeric, ct3 numeric); insert into test_colctrlds values (11, 9, 8); quit;  %_RAKE2WAYS(DSIN=test_dsin, DSOUT=test_dsout, VAR=x1 x2 x3, ROWCTRLDS=test_rowctrlds, ROWCTRLVAR=rt, COLCTRLDS=test_colctrlds, COLCTRLVAR=ct1 ct2 ct3);  %MEND; 
pap
Calcite | Level 5 pap
Calcite | Level 5

Hi Xia - I am new to IML and need some help with the coding. I am trying. Thanks. -Pappe

Ksharp
Super User

Hi. pap

Actually I am also a rookie of IML . If I have some time, I will review it .But now Rick could help you , once he saw the code you posted .

pap
Calcite | Level 5 pap
Calcite | Level 5

Attached is an Excel solution for the 3D problem (based on an example). I would use Excel if I did not have to solve this problem for 50 three-way tables (50 US states) and the levels of the variables being large and different across states. I think a SAS macro using IML is the best tool for this.

Thanks. -Pappe

Rick_SAS
SAS Super FREQ

Maybe I am confused, but I thought your problem was that you want to construct a 3D table that has given 2D marginals? It looks like the Excel spreadsheet starts with a 3D table (the red numbers) and then applies IPF in order to adjust the frequencies to fit the 2D marginal distributions.  Thus the spreadsheet merely demonstrates the IPF algorithm.  The spreadsheet doesn't demonstrate how to oobtain the red numbers in the first place.

Similarly, the Hunsinger PPT and macro that you link to has the statement "you will need a three-dimensional (e.g. Age by Sex by

Race) seed."  The "seed" means that you have to START with a 3D matrix. The IPF algoithm then adjusts the matrix.

In short, I still don't know how to create a 3D frequency table that has given 2D marginals.

pap
Calcite | Level 5 pap
Calcite | Level 5

Hi Rick - In the Excel spreadsheet, the starting 3D table is merely a seed. It can be populated with 1s and 0s (where marginal sum to 0). I actually built the solution in Excel. A bit tedious, but I had to. Thanks again to you and Xia for spending your valuable time to help me. I really appreciate it.

Regards,

Pappe

Rick_SAS
SAS Super FREQ

Ah! Thanks for pointing that out. I did not notice that before.

I still don't understand how it all works, but I translated your Excel formulas into SAS/IML and reproduced your computations. The program is below. Maybe it will be useful for you.

Notice that I transposed the definition of the column marginal (blue) and I interlaced the even and odd columns of your 3D matrices because I think that is what the IPF and MARG calls expect. However, I am not an expert on using those calls, so I might have made some mistakes.   I am also not sure how to properly update the 3D matrix. I used the REPEAT and SHAPE functions to get the dimensions to work out, but the code might break if you change from this 3x2x4 example.

Anyway, HTH.

proc iml;
/* 3 x 2 x 4 example */
dim={3 2 4 };
A = {1 2 1,
     5 4 2,
     3 5 5,
     5 5 5,
     6 2 2,
     3 8 7,
     1 7 2,
     2 7 6};

start RowMarginal(dim, A);
   call marg(locmar, m, dim, A, {2,3});
   return( shape(m,dim[3],dim[2]) );
finish;
start ColMarginal(dim, A);
   call marg(locmar, m, dim, A, {1,2});
   return( shape(m,dim[2],dim[1]) );
finish;
start SliceMarginal(dim, A);
   call marg(locmar, m, dim, A, {1,3});
   return( shape(m,dim[3],dim[1]) );
finish;

/* define target marginals */
RowTarg = {9  11,
           17 13,
           19 16,
           7  8};
ColTarg = {22 20 10,
           13 20 15};
SliceTarg = {7  9  4,
             8 12 10,
            15 12  8,
             5  7  3};

B = A;
do iter = 1 to 4;
   print iter;
   /* row adjustment */
   RowFactor =  RowTarg / RowMarginal(dim, B);
   B = B # shape(RowFactor, dim[2]*dim[3]);
   print B[F=5.2];
   /* col adjustment */
   ColFactor =  ColTarg / ColMarginal(dim, B);
   B = B # shape(repeat(ColFactor, dim[3]), 0, dim[1]);
   print B[F=5.2];
   /* slice adjustment */
   SliceFactor =  SliceTarg / SliceMarginal(dim, B);
   B = B # shape(repeat(SliceFactor, 1, dim[2]), 0, dim[1]);
   print B[F=5.2];
end;

pap
Calcite | Level 5 pap
Calcite | Level 5

Hi Rick - This is great! I will adapt your code to my problem. Thank you so much for your help and for teaching us to program in IML elegantly.

Regards,

-Pappe

Rick_SAS
SAS Super FREQ

You are welcome! Sorry this took so long.  I had wrongly assumed that you wanted a 3D frequency matrix that consisted of integers. I did not think to ask whether a decimal matrix would be sufficient. That greatly simplifies the problem.

pap
Calcite | Level 5 pap
Calcite | Level 5

Thanks Rick - I am doing the rounding afterwards. Not optimal, but I took too much of time already. Thanks again. -Pappe

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 25 replies
  • 2648 views
  • 0 likes
  • 3 in conversation