Proc BGLIMM, outpost data set question

gp4 · Posted 01-21-2024 06:18 PM

The dependent variable, Y, is a lab value; LID is a laboratory ID and SampleID is a specimen ID. A set of identical samples is sent to each lab for analysis. I am using PROC BGLIMM to calculate a Bayesian intraclass correlation (ICC) with an HPDR for the ICC. That part is working fine; the problem description is just for background.

My question is about the outpost data set. Here is the code.

proc bglimm data=CPT2 seed=202011 nthreads=-1 nbi=1000 nmc=5000 outpost=YPost;
   /* stats= all plots= (trace autocorr density) diagnostics=all ; */
   class SampleID LID;
   model Y = / dist=normal;
   random SampleID / covprior=uniform(lower=0,upper=10) nuts;
   random LID / covprior=uniform(lower=0,upper=10) nuts;
run;

The outpost data set, named YPost, has everything we expect: posterior values for each parameter at each iteration. It also has six additional columns named Y_82, Y_98, Y_118, Y_138, Y_149, and Y_158. The analysis is working OK and I am getting what I need out of it, but I would like to understand what these added columns are.

Rick_SAS · Posted 01-22-2024 10:30 AM

I talked to a colleague who has more experience with PROC BGLIMM. He pointed me to the section of the documentation about how the procedure handles missing response values: SAS Help Center: Missing Data

It appears that your data has missing responses for observations 82, 98, 118, 138, 149, and 158. The columns in the OUTPOST= data set show the generated values for those observations for each of the 5,000 Monte Carlo iterations (since NMC=5000).
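One way to check this against the input data is to list the rows whose response is missing. This is only a sketch: it assumes the numeric suffix in names such as Y_82 is the observation number in CPT2, and the names MissingY and ObsNum are made up for illustration.

```sas
/* List the row numbers of observations whose response Y is missing. */
/* MissingY and ObsNum are illustrative names, not from the thread.  */
data MissingY;
   set CPT2;
   ObsNum = _N_;          /* row number in the input data set */
   if missing(Y);         /* keep only rows with a missing response */
   keep ObsNum SampleID LID Y;
run;

proc print data=MissingY noobs;
run;
```

If the assumption holds, the ObsNum values should match the suffixes of the extra OUTPOST= columns.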

By default, the procedure fills in missing response values by using the model. If you don't want that, the MISSING=CC option tells the procedure to use complete cases and drop observations that have one or more missing values.

The following statements are from the Getting Started example, but I have changed two response values to missing values (Obs=11 and Obs=28). When I run the program, the OUTPOST= data set contains the variables SideEffect_11 and SideEffect_28. You can draw histograms of those variables to see the 5,000 imputed values (NMC=10000 with THIN=2 keeps every second draw).

data MultiCenter;
   input Center Group$ N SideEffect @@;
   datalines;
 1  A  32  14   1  B  33  18
 2  A  30   4   2  B  28   8
 3  A  23  14   3  B  24   9
 4  A  22   7   4  B  22  10
 5  A  20   6   5  B  21  12
 6  A  19   .   6  B  20   3
 7  A  17   2   7  B  17   6
 8  A  16   7   8  B  15   9
 9  A  13   1   9  B  14   5
10  A  13   3  10  B  13   1
11  A  11   1  11  B  12   2
12  A  10   1  12  B   9   0
13  A   9   2  13  B   9   6
14  A   8   1  14  B   8   .
15  A   7   1  15  B   8   0
;

proc bglimm data=MultiCenter nmc=10000 thin=2 seed=976352 plots=all outpost= YPost; 
   class Center Group;
   model SideEffect/N = / dist=binomial;   /* events/trials syntax is modeled as binomial */
   random int / subject = Center;
run;

/* plot the imputed values for missing responses */
title "Imputation of Missing Value for 11th Obs";
proc sgplot data=YPost;
   histogram SideEffect_11;
run;
title "Imputation of Missing Value for 28th Obs";
proc sgplot data=YPost;
   histogram SideEffect_28;
run;
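As a numeric companion to the histograms, the imputed draws could also be summarized directly. This is a minimal sketch that assumes the YPost data set from the run above contains the columns SideEffect_11 and SideEffect_28.

```sas
/* Summarize the posterior draws of the two imputed responses. */
proc means data=YPost n mean std p5 median p95;
   var SideEffect_11 SideEffect_28;
run;
```

The N column should equal the number of saved draws, which gives a quick check that every iteration produced an imputed value.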

Try adding the MISSING=CC option and rerunning. The procedure now uses only 28 observations instead of 30, and the OUTPOST= data set no longer includes the SideEffect_11 and SideEffect_28 columns.
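For reference, the rerun might look like the sketch below. The output name YPostCC is made up so the earlier YPost is not overwritten, and DIST=BINOMIAL is my assumption based on the events/trials response syntax.

```sas
/* Complete-case analysis: rows with a missing response are dropped. */
/* YPostCC is an illustrative name; dist=binomial is an assumption.  */
proc bglimm data=MultiCenter nmc=10000 thin=2 seed=976352
            missing=cc outpost=YPostCC;
   class Center Group;
   model SideEffect/N = / dist=binomial;
   random int / subject=Center;
run;

/* List the variable names to confirm no SideEffect_<obs> columns remain. */
proc contents data=YPostCC short;
run;
```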

gp4 · Posted 01-22-2024 03:25 PM

Now that I know, I feel like I should have seen that! Thank you.
