- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks to Danny for this wonderful resource on how to score a fixed effects logistic Bayesian model:
https://communities.sas.com/t5/SAS-Communities-Library/Bayesian-Analysis-and-PROC-IML-What-s-the-Sco....
I need to take this one step further and add random intercepts to the process. My IML skills need serious work so I had to take the output of the fixed effect scoring out of PROC IML, transpose them to create one tall file and then merge it with the Iteration by RandomIntercept matrix in the OutPost file from MCMC/BGLIMMIX. I use vValueX() function to match each Level 2 ID to the corresponding column in the Random Intercept dataset.
This workflow is straightforward with relatively light code. Maybe someone can use it as is. For my purposes it's too SLOW! I am working with over 1,000 Level 2 units containing over 500 Level 1 Units each and 5,000 iterations seems like a minimum for Bayesian output. Marginal Linear Predictor (mLP) takes seconds to compute even for this large dataset in PROC IML. Adding Random intercepts to each mLP estimate to obtain Conditional Linear Predictors takes hours. I am in an evironment where I need to do this on a regular basis for dozens of models so the time adds up very quickly.
I am hoping some good soul can translate what I have done with transpose/merge into PROC IML and that matrix addition can shave serious time off the process. Thanks in advance.
Simulated source data files and the workflow to add ConditionalLP = mLP + RI_L2 are in the code below. L12 dataset is what comes out from the scoring effort in the blog reference above (minus the L2 unit IDs).
%let nL2 = 10; * Level 2 units ;
%let nL1 = 20; * L1 units per L2 unit on average ;
%let nIter = 50; * MCMC iterations ;
data L12;
do L2ID=1 to &nL2;
do L1ID=1 to ceil(RanUni(1)*&nL1*2); * simulate variable number of L1 units per L2 unit ;
array mLR_Iter[&nIter]; do i=1 to &nIter; mLR_Iter[i] = RanNor(1); end;
output;
end;
end;
drop i;
data OutPost; * Random Intercepts: one row per MCMC Iteration, one column per L2 unit ;
do Iteration=1 to &nIter;
array RI_L2[&nL2]; do i=1 to &nL2; RI_L2[i] = RanNor(1);end;
output;
end;
drop i;
run;
*** 2. TRANSPOSE PT DATA to TALL AND MERGE WITH RANDOM INTERCEPTS ;
proc transpose data=L12 out=L12_Tall;
by L2ID L1ID;
var mLR_Iter:;
proc sql;
create table L12_Tall as
select input(substr(_Name_,9),best.) as Iteration, /* extract Iteration ID that starts with character 9 */
L2ID, L1ID, Col1 as mLP /* marginal linear predictor */
from L12_Tall
order by Iteration, L2ID, L1ID
;
quit;
data WantM;
merge L12_Tall
OutPost;
by Iteration;
ConditionalLP = mLP + vValueX(cats('RI_L2',L2ID));
drop RI_L2:;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I tried this with slightly higher Level 2, Level 1 and Iteration numbers. (400, 200, 700).
It works quite fast when I compared its cpu time, real-time, memory, etc. to the same numbers in the log of your provided version.
%let nL2 = 400; /* Number of level-2 units */
%let nL1 = 200; /* Maximum level-1 units per level-2 */
%let nIter = 700; /* Number of simulation iterations */
/* Simulate hierarchical data structure with random intercepts */
data L12;
do L2ID=1 to &nL2;
do L1ID=1 to ceil(RanUni(1)*&nL1*2);
array mLR_Iter[&nIter]; /* Array for storing marginal predictors */
do i=1 to &nIter; mLR_Iter[i] = RanNor(1); end;
output;
end;
end;
drop i;
run;
/* Simulate posterior distribution of random intercepts */
data OutPost;
do Iteration=1 to &nIter;
array RI_L2[&nL2]; /* Array for random intercept values */
do i=1 to &nL2; RI_L2[i] = RanNor(1); end;
output;
end;
drop i;
run;
proc iml;
/* 1. Import random intercept matrix */
use OutPost;
read all var _NUM_ into RI_Matrix[colname=varNames];
close OutPost;
RI_Matrix = RI_Matrix[, 2:ncol(RI_Matrix)]; /* Remove iteration counter column */
/* 2. Import observed data and predictors */
use L12;
read all var {'L2ID', 'L1ID'} into IDs; /* Group/unit identifiers */
read all var _NUM_ into L12_Data; /* Full dataset with predictors */
close L12;
/* 3. Initialize dimensions and ID vectors */
nIter = nrow(RI_Matrix); /* Total MCMC iterations */
nObs = nrow(IDs); /* Total observations in dataset */
L2ID = IDs[,1]; /* Level 2 group membership */
L1ID = IDs[,2]; /* Level 1 unit identifiers */
/* 4. Initialize output dataset */
create WantIML var {Iteration L2ID L1ID mLP ConditionalLP};
/* 5. Stream processing by iteration */
do i = 1 to nIter; /* Memory-efficient iteration loop */
mLP = L12_Data[, 2+i]; /* Extract predictors for current iteration */
RI_i = RI_Matrix[i, ]; /* Current iteration's random effects */
ConditionalLP = mLP + RI_i[L2ID]; /* Calculate conditional linear predictor */
Iteration = j(nObs, 1, i); /* Create iteration identifier vector */
append var {Iteration L2ID L1ID mLP ConditionalLP}; /* Write current iteration data */
end;
close WantIML;
quit;
However I tried to execute the code with the numbers you mentioned initially (over 1,000 Level 2 units containing over 500 Level 1 Units each and 5,000), and it failed, because of the lack of memory (I have tested this code in SAS OnDemand)
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 %let nL2 = 1000; * Level 2 units;
70 %let nL1 = 500; * L1 units per L2 unit on average ;
71 %let nIter = 5000; * MCMC iterations;
72 data L12;
73 do L2ID=1 to &nL2;
74 do L1ID=1 to ceil(RanUni(1)*&nL1*2); * simulate variable number of L1 units per L2 unit ;
75 array mLR_Iter[&nIter]; do i=1 to &nIter; mLR_Iter[i] = RanNor(1); end;
76 output;
77 end;
78 end;
79 drop i;
ERROR: Insufficient space in file WORK.L12.DATA.
ERROR: File WORK.L12.DATA is damaged. I/O processing did not complete.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.L12 may be incomplete. When this step was stopped there were 270315 observations and 5002 variables.
ERROR: Insufficient space in file WORK.L12.DATA.
NOTE: DATA statement used (Total process time):
real time 1:15.32
user cpu time 1:01.92
system cpu time 4.56 seconds
memory 5080.40k
OS Memory 25552.00k
Timestamp 02/02/2025 02:58:51 AM
Step Count 31 Switch Count 42
Page Faults 0
Page Reclaims 1113
Page Swaps 0
Voluntary Context Switches 24592
Involuntary Context Switches 1937
Block Input Operations 0
Block Output Operations 23066656
80 data OutPost; * Random Intercepts: one row per MCMC Iteration, one column per L2 unit ;
81 do Iteration=1 to &nIter;
82 array RI_L2[&nL2]; do i=1 to &nL2; RI_L2[i] = RanNor(1);end;
83 output;
84 end;
85 drop i;
86 run;
ERROR: An I/O error has occurred on file WORK.OUTPOST.DATA.
ERROR: UNIX errno = 122.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.OUTPOST was only partially opened and will not be saved.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1299.15k
OS Memory 22944.00k
Timestamp 02/02/2025 02:58:51 AM
Step Count 32 Switch Count 0
Page Faults 0
Page Reclaims 29
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
87
88 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
98
Hope I could be helpful.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I tried this with slightly higher Level 2, Level 1 and Iteration numbers. (400, 200, 700).
It works quite fast when I compared its cpu time, real-time, memory, etc. to the same numbers in the log of your provided version.
%let nL2 = 400; /* Number of level-2 units */
%let nL1 = 200; /* Maximum level-1 units per level-2 */
%let nIter = 700; /* Number of simulation iterations */
/* Simulate hierarchical data structure with random intercepts */
data L12;
do L2ID=1 to &nL2;
do L1ID=1 to ceil(RanUni(1)*&nL1*2);
array mLR_Iter[&nIter]; /* Array for storing marginal predictors */
do i=1 to &nIter; mLR_Iter[i] = RanNor(1); end;
output;
end;
end;
drop i;
run;
/* Simulate posterior distribution of random intercepts */
data OutPost;
do Iteration=1 to &nIter;
array RI_L2[&nL2]; /* Array for random intercept values */
do i=1 to &nL2; RI_L2[i] = RanNor(1); end;
output;
end;
drop i;
run;
proc iml;
/* 1. Import random intercept matrix */
use OutPost;
read all var _NUM_ into RI_Matrix[colname=varNames];
close OutPost;
RI_Matrix = RI_Matrix[, 2:ncol(RI_Matrix)]; /* Remove iteration counter column */
/* 2. Import observed data and predictors */
use L12;
read all var {'L2ID', 'L1ID'} into IDs; /* Group/unit identifiers */
read all var _NUM_ into L12_Data; /* Full dataset with predictors */
close L12;
/* 3. Initialize dimensions and ID vectors */
nIter = nrow(RI_Matrix); /* Total MCMC iterations */
nObs = nrow(IDs); /* Total observations in dataset */
L2ID = IDs[,1]; /* Level 2 group membership */
L1ID = IDs[,2]; /* Level 1 unit identifiers */
/* 4. Initialize output dataset */
create WantIML var {Iteration L2ID L1ID mLP ConditionalLP};
/* 5. Stream processing by iteration */
do i = 1 to nIter; /* Memory-efficient iteration loop */
mLP = L12_Data[, 2+i]; /* Extract predictors for current iteration */
RI_i = RI_Matrix[i, ]; /* Current iteration's random effects */
ConditionalLP = mLP + RI_i[L2ID]; /* Calculate conditional linear predictor */
Iteration = j(nObs, 1, i); /* Create iteration identifier vector */
append var {Iteration L2ID L1ID mLP ConditionalLP}; /* Write current iteration data */
end;
close WantIML;
quit;
However I tried to execute the code with the numbers you mentioned initially (over 1,000 Level 2 units containing over 500 Level 1 Units each and 5,000), and it failed, because of the lack of memory (I have tested this code in SAS OnDemand)
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 %let nL2 = 1000; * Level 2 units;
70 %let nL1 = 500; * L1 units per L2 unit on average ;
71 %let nIter = 5000; * MCMC iterations;
72 data L12;
73 do L2ID=1 to &nL2;
74 do L1ID=1 to ceil(RanUni(1)*&nL1*2); * simulate variable number of L1 units per L2 unit ;
75 array mLR_Iter[&nIter]; do i=1 to &nIter; mLR_Iter[i] = RanNor(1); end;
76 output;
77 end;
78 end;
79 drop i;
ERROR: Insufficient space in file WORK.L12.DATA.
ERROR: File WORK.L12.DATA is damaged. I/O processing did not complete.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.L12 may be incomplete. When this step was stopped there were 270315 observations and 5002 variables.
ERROR: Insufficient space in file WORK.L12.DATA.
NOTE: DATA statement used (Total process time):
real time 1:15.32
user cpu time 1:01.92
system cpu time 4.56 seconds
memory 5080.40k
OS Memory 25552.00k
Timestamp 02/02/2025 02:58:51 AM
Step Count 31 Switch Count 42
Page Faults 0
Page Reclaims 1113
Page Swaps 0
Voluntary Context Switches 24592
Involuntary Context Switches 1937
Block Input Operations 0
Block Output Operations 23066656
80 data OutPost; * Random Intercepts: one row per MCMC Iteration, one column per L2 unit ;
81 do Iteration=1 to &nIter;
82 array RI_L2[&nL2]; do i=1 to &nL2; RI_L2[i] = RanNor(1);end;
83 output;
84 end;
85 drop i;
86 run;
ERROR: An I/O error has occurred on file WORK.OUTPOST.DATA.
ERROR: UNIX errno = 122.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.OUTPOST was only partially opened and will not be saved.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 1299.15k
OS Memory 22944.00k
Timestamp 02/02/2025 02:58:51 AM
Step Count 32 Switch Count 0
Page Faults 0
Page Reclaims 29
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
87
88 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
98
Hope I could be helpful.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Brilliant! Thanks so much. Will test the performance of this code.
In the meantime, I was able to shave off more than 30% of execution time by (1) removing the use of vValueX() lookup and (2) eliminating interim PROC SORTs. I replaced the former with TRANSPOSE step for Random Intercepts and the latter with a PROC SQL step for merging instead of DATA step. A lot less code too:
*** 2.a TRANSPOSE PT DATA to TALL AND MERGE WITH FLAT RANDOM INTERCEPTS ;
proc transpose data=Pt out=Pt_T( rename=(Col1=mLP) ); by ParticID PtID; var mLR_Iter:;
*** 2.b TRANSPOSE RI DATA to TALL AND MERGE WITH TALL PATIENTS ;
proc transpose data=RI out=RI_T( rename=(Col1=RI) ); by Iteration; var RI:;
*** 3. NO-SORT SQL MEGE ;
proc sql; create table WantM3 as
select p.ParticID, PtID, Iteration, mLP, RI, mLP+RI as cLP
from Pt_T p
left join RI_T r ON input(substr(p._NAME_,9),best.) = r.Iteration
AND p.ParticID = input(substr(r._NAME_,7),best.)
order by p.ParticID, PtID, Iteration
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Benchmarking results are in:
1. My original code ran just a tad over two hours
2. My new and improved code without vValueX() or sorting improved significantly to 1.5 hours.
3. PROC IML method recommended above blew them both out of the water completing the task in 24.5 minutes!
Thanks for playing!