gp4
Fluorite | Level 6

The dependent variable, Y, is a lab value; LID is a laboratory ID; and SampleID is a specimen ID. A set of identical samples is sent to each lab for analysis. I am using PROC BGLIMM to calculate a Bayesian intraclass correlation (ICC) with an HPDR for the ICC. That is working fine, and the problem description is just for background.

 

My question is about the outpost data set. Here is the code. 

 

proc bglimm data=CPT2 seed=202011 nthreads=-1 nbi=1000 nmc=5000 outpost=YPost;

    /* stats= all plots= (trace autocorr density) diagnostics=all ; */
class SampleID LID;
model Y = / dist=normal;
random SampleID/ covprior=uniform(lower=0,upper=10) nuts ;
random LID/ covprior=uniform(lower=0,upper=10) nuts;
run;   

 

The OUTPOST= data set, named YPost, has everything we expect (posterior values for each parameter at each iteration), but it also has six additional columns named Y_82, Y_98, Y_118, Y_138, Y_149, and Y_158. The analysis is working OK and I am getting what I need out of it, but I would like to understand what these added columns are.

 

 

2 REPLIES
Rick_SAS
SAS Super FREQ

I talked to a colleague who has more experience with PROC BGLIMM. He pointed me to the section of the documentation about how the procedure handles missing response values: SAS Help Center: Missing Data

 

It appears that your data has missing responses for observations 82, 98, 118, 138, 149, and 158.  The columns in the OUTPOST= data set show the generated values for those observations for each of the 5,000 Monte Carlo iterations (since NMC=5000).
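
One way to check this is to list the observations whose response is missing. The following is a minimal sketch, assuming the suffixes in the column names correspond to row positions in CPT2; the data set name MissingY and the variable ObsNum are hypothetical names introduced only for illustration.

```sas
/* List the row numbers of observations with a missing response.  */
/* MissingY and ObsNum are illustrative names, not from the post. */
data MissingY;
   set CPT2;
   ObsNum = _N_;          /* row position in CPT2 */
   if missing(Y);         /* keep only rows where Y is missing */
   keep ObsNum SampleID LID;
run;

proc print data=MissingY noobs;
run;
```

If the explanation above is right, the ObsNum values should match the suffixes of the extra columns in YPost.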

 

By default, the procedure fills in missing response values by using the model. If you don't want that, the MISSING=CC option tells the procedure to use complete cases only and to drop observations that have one or more missing values.

 

The following statements are from the Getting Started example, but I have changed two response values to missing values (Obs=11 and Obs=28). When I run the program, the OUTPOST= data set contains the variables SideEffect_11 and SideEffect_28. You can draw histograms of those variables to see the 5000 imputed  values.

 

data MultiCenter;
   input Center Group$ N SideEffect @@;
   datalines;
 1  A  32  14   1  B  33  18
 2  A  30   4   2  B  28   8
 3  A  23  14   3  B  24   9
 4  A  22   7   4  B  22  10
 5  A  20   6   5  B  21  12
 6  A  19   .   6  B  20   3
 7  A  17   2   7  B  17   6
 8  A  16   7   8  B  15   9
 9  A  13   1   9  B  14   5
10  A  13   3  10  B  13   1
11  A  11   1  11  B  12   2
12  A  10   1  12  B   9   0
13  A   9   2  13  B   9   6
14  A   8   1  14  B   8   .
15  A   7   1  15  B   8   0
;

proc bglimm data=MultiCenter nmc=10000 thin=2 seed=976352 plots=all outpost= YPost; 
   class Center Group;
   model SideEffect/N = / dist=binomial;   /* events/trials syntax implies a binomial response */
   random int / subject = Center;
run;

/* plot the imputed values for missing responses */
title "Imputation of Missing Value for 11th Obs";
proc sgplot data=YPost;
   histogram SideEffect_11;
run;
title "Imputation of Missing Value for 28th Obs";
proc sgplot data=YPost;
   histogram SideEffect_28;
run;

Try adding the MISSING=CC option and rerunning. The procedure now uses only 28 observations instead of 30, and the OUTPOST= data set no longer includes the SideEffect_11 and SideEffect_28 columns.
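
For reference, the rerun might look like the following sketch. It assumes that MISSING=CC goes on the PROC BGLIMM statement and that the response is binomial (which matches the events/trials syntax); the output data set name YPostCC is a hypothetical name chosen so that the earlier YPost is not overwritten.

```sas
/* Rerun using complete cases only; YPostCC is an illustrative name. */
proc bglimm data=MultiCenter nmc=10000 thin=2 seed=976352
            missing=cc outpost=YPostCC;
   class Center Group;
   model SideEffect/N = / dist=binomial;
   random int / subject = Center;
run;

/* Inspect the variable list of the posterior sample data set. */
proc contents data=YPostCC varnum;
run;
```

Comparing the PROC CONTENTS listing with the one for YPost is a quick way to see whether the imputed-response columns are still present.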

gp4
Fluorite | Level 6

Now that I know, I feel like I should have seen that! Thank you.
