SAS Support Communities

Season

@SAS_Rob hits the nail on the head. The regression parameter estimates displayed by the MI procedure are those of the model fitted after standardization. To see this, run the following code:

/* Standardize all variables in the dataset */
proc stdize data=fish1 out=fish2;
   var Length1 Length2 Length3;
run;

/* Fit the regression model after standardization */
proc genmod data=fish2;
   model Length2 = Length1 / dist=normal link=identity;
run;

You will see that the results from the GENMOD procedure differ from those of the MI procedure only to an extent explainable by rounding error.

Season

I guess you are using two concepts interchangeably: component, a term from finite mixture models, and subgroup, a word you coined yourself that may suit your analysis better. According to the PROBMODEL Statement section of the SAS/STAT(R) 14.1 User's Guide, the PROBMODEL statement is used for building regression models (usually logistic regression) for component membership. Odds ratios and other common metrics reported in logistic regression can be calculated from the table named "Parameter Estimates for Mixing Probabilities" in the output.

Season

Part of the theoretical justification for adjusting confidence intervals in complex survey data analysis, on top of adjusting the estimated variances or standard errors, is that the degrees of freedom used to build the intervals are far lower than those used under simple random sampling. In fact, the degrees of freedom used for such interval estimation are usually a "rule of thumb degrees of freedom", equal to the number of unique primary sampling units minus the number of strata. In most sampling designs, this number is dramatically lower than the sample size minus one, which is the degrees of freedom used for building confidence intervals under simple random sampling.

Why do we employ such small degrees of freedom for complex surveys? I think part of the reason lies in the shrunken effective sample size caused by the correlation among survey subjects. To explain the concept of effective sample size, I would like to modify an example from Applied Survey Data Analysis by Heeringa, West, and Berglund (Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences, ISBN 9780367736118).

Suppose you are a researcher interested in the distribution of age and sex among the English teachers of a school. You enter a classroom with 50 students and ask them about their teacher's age and sex. Because the students share one English teacher, their answers are perfectly correlated and you receive the same answer from all of them. Therefore, even if you ask the students one by one, you do not have information on 50 teachers; you have information on only one. In other words, the effective sample size is 1 instead of 50.

From a statistical perspective, the shrinkage in effective sample size is attributable to the correlation among the observations. Under simple random sampling, the assumption that subjects are independent of each other is tenable, so the information you get from one subject is completely different from that of another. You really get something new every time you obtain one more sample subject. But once the observations are correlated, as in complex surveys, each new respondent gives you less information than an independent respondent would.
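The two quantities discussed above can be written down compactly. The design-effect formula below is not from the book example itself; it is the standard Kish approximation for clusters of equal size $m$ with intraclass correlation $\rho$, used here only as an illustrative sketch.

```latex
% Rule-of-thumb degrees of freedom for a complex survey design
\mathrm{df} = (\text{number of PSUs}) - (\text{number of strata})

% Kish approximation (assumed here): design effect and effective sample size
\mathrm{deff} = 1 + (m - 1)\,\rho, \qquad
n_{\mathrm{eff}} = \frac{n}{\mathrm{deff}}

% Classroom example: n = 50 students in one cluster (m = 50), identical answers (\rho = 1)
n_{\mathrm{eff}} = \frac{50}{1 + (50 - 1)\cdot 1} = 1
```

With $\rho = 0$ (independent subjects) the same formula gives $n_{\mathrm{eff}} = n$, which is the simple random sampling case.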

SAS_Rob

In general, the difference will be in the manner in which the variances are computed. The analysis of complex survey data uses methodology that adjusts for the design effect (clusters and strata). Data that are merely weighted use the usual formulas for weighted variances.

Ksharp · ‎03-25-2025

It looks like the order of the records differs between these two datasets. Here is an example.

/* Simulate 100 records with missing values in each variable */
data ds;
   call streaminit(123);
   do n=1 to 100;
      baseline=rand('normal');
      month1=rand('normal');
      month2=rand('normal');
      if n in (10:20 90 98:100) then call missing(baseline);
      if n in (1:4 9 28:30) then call missing(month1);
      if n in (40:50 80:85) then call missing(month2);
      output;
   end;
run;

/* Same records, reversed order */
proc sort data=ds out=ds2;
   by descending n;
run;

/* Impute both datasets with the same seed */
proc mi data=ds seed=123 nimpute=100 out=out1;
   var baseline month1 month2;
run;

proc mi data=ds2 seed=123 nimpute=100 out=out2;
   var baseline month1 month2;
run;

/* Restore the original record order within each imputation */
proc sort data=out2 out=out22;
   by _Imputation_ n;
run;
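To check whether the two runs agree, one option (a sketch that assumes the datasets out1 and out22 created by the code above) is to line them up with PROC COMPARE. Differences in the imputed values would indicate that the record order affects the result even when the seed is the same.

```sas
/* Compare the two imputed datasets record by record.            */
/* Both are ordered by _Imputation_ and n, as the ID statement   */
/* requires.                                                     */
proc compare base=out1 compare=out22;
   id _Imputation_ n;
   var baseline month1 month2;
run;
```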

fanuser · ‎03-12-2025

Thank you very much for providing such a wonderful solution!

Season · ‎03-11-2025

Actually, the STRATA statement in the SURVEY procedures is intended to specify the strata used in the sampling design, not the crosstabs. For your problem, it is appropriate to use syntax like gender*outcome or age group*gender*outcome in the TABLES statement, as you might frequently do in the FREQ procedure.
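As a rough sketch of that division of labor, with every dataset and variable name below (mysurvey, stratum, psu, sampwt, gender, outcome) being a hypothetical placeholder rather than something from your data:

```sas
/* Design information goes in STRATA, CLUSTER and WEIGHT;        */
/* the crosstab itself goes in TABLES. All names are placeholders. */
proc surveyfreq data=mysurvey;
   strata stratum;              /* strata used in the sampling design */
   cluster psu;                 /* primary sampling units, if any     */
   weight sampwt;               /* sampling weights                   */
   tables gender*outcome / row; /* the crosstab of interest           */
run;
```

Omit the STRATA or CLUSTER statement if the design has no strata or clusters.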

SAS_Rob · ‎02-21-2025

The long format is not usually compatible with performing multiple imputation, so restructuring the data from long to wide, or the reverse, is often needed. There is a good discussion in Raghunathan's Missing Data in Practice text (2016), pages 121-126, and in the 2018 paper "Using SAS for Multiple Imputation and Analysis of Longitudinal Data".
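A minimal sketch of the long-to-wide step, assuming a hypothetical dataset named long with one row per subject (id) per visit (visit = 1, 2, 3, ...) and a single outcome y:

```sas
/* PROC TRANSPOSE with a BY statement needs the data sorted by id */
proc sort data=long;
   by id visit;
run;

/* Long to wide: one row per subject with columns y1, y2, ...     */
proc transpose data=long out=wide(drop=_name_) prefix=y;
   by id;
   id visit;
   var y;
run;
```

The wide dataset can then be passed to PROC MI with y1, y2, and so on in the VAR statement.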

SAS_Rob · ‎02-19-2025

No, you are interpreting it correctly. I was wrong about what was contained in that data set. In any case, hypothesis testing in finite mixture models is not very well defined because it is difficult, if not impossible, to derive the asymptotic distribution for the mixture likelihood. There is a paper, "Hypothesis testing for finite mixture models" (ScienceDirect), that discusses the problem and makes a few suggestions related to goodness-of-fit tests that might work. Anything that they propose would not be available in SAS, but you might be able to program it yourself.

wwm · ‎02-15-2025

Thank you. I now know how to set edf.

Digicha1 · ‎02-11-2025

Hi Rob, Thank you for sharing! Would you be able to share an example of the dataset I need to create? I require the individual treatment means for each visit - would it be using the array terms?

Ksharp · ‎01-29-2025

Is this what you want?

/* Expand each summary row into two rows: responders (y=1) and non-responders (y=0) */
data c;
   input Visit Group Subgroup$ response COUNT;
   y=1; wt=response; output;
   y=0; wt=count-response; output;
datalines;
1 1 A 1 10
1 1 B 2 8
1 3 A 0 7
1 3 B 2 5
2 1 A 3 7
2 1 B 1 5
2 3 A 2 7
2 3 B 1 5
3 1 A 0 6
3 1 B 1 4
3 3 A 2 6
3 3 B 0 4
;

/* Exact binomial CI for the proportion with y=1; ZERO keeps rows with zero weight */
proc freq data=c noprint;
   by visit group subgroup;
   weight wt / zero;
   tables y / binomial(level='1' cl=exact);
   exact binomial;
   output out=want binomial;
run;

proc print; run;

SAS_Rob · ‎01-28-2025

The theory behind multiply imputed data only takes estimates and their standard errors and gives combined estimates. It does not lend itself to combining confidence intervals or p-values (adjusted or unadjusted). My suggestion would be to combine the estimates using MIANALYZE as shown below and then save the p-values to a SAS data set. You could then use one of the p-value adjustment methods available in PROC MULTTEST (see the PROC MULTTEST Statement page in the SAS Help Center). Note that not all of the p-value adjustment methods in GLIMMIX are available in MULTTEST.

/* Simulated example: 3 imputations, 4 treatments, 10 replicates each */
data outmi;
   do _imputation_=1 to 3;
      do trt='test','trt1','trt2','trt3';
         do rep=1 to 10;
            trtn+1;
            y=1.3+.18*trtn+rannor(123);
            output;
         end;
      end;
   end;
run;

/* Pairwise LS-mean differences within each imputation */
proc glimmix data=outmi;
   by _imputation_;
   class trt;
   model y=trt;
   lsmeans trt/diff;
   ods output diffs=diff;
run;

/* Label each pairwise comparison */
data diff2;
   set diff;
   comparison=trt||' vs '||left(_trt);
run;

proc sort data=diff2;
   by comparison _imputation_;
run;

/* Combine estimates across imputations and save the p-values */
proc mianalyze data=diff2;
   by comparison;
   modeleffects estimate;
   stderr stderr;
   ods output ParameterEstimates=estimate_ds(rename=(probt=raw_p comparison=test));
run;

/* Adjust the combined p-values */
proc multtest inpvalues=estimate_ds holm hoc fdr bon;
run;
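For reference, the combining step works on estimates and standard errors only, which is why it does not extend to p-values. A sketch of the standard combining rules (Rubin's rules), where $\hat{Q}_i$ and $\mathrm{SE}_i$ denote the estimate and its standard error from imputation $i = 1, \dots, m$ (notation chosen here for illustration):

```latex
% Combined point estimate: average of the m per-imputation estimates
\bar{Q} = \frac{1}{m} \sum_{i=1}^{m} \hat{Q}_i

% Within-imputation variance: average of the squared standard errors
\bar{W} = \frac{1}{m} \sum_{i=1}^{m} \mathrm{SE}_i^{2}

% Between-imputation variance
B = \frac{1}{m - 1} \sum_{i=1}^{m} \left( \hat{Q}_i - \bar{Q} \right)^{2}

% Total variance of the combined estimate
T = \bar{W} + \left( 1 + \frac{1}{m} \right) B
```

The combined standard error is $\sqrt{T}$, and the p-value is computed from the combined estimate and this standard error.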

YYK273 · ‎01-24-2025

Thank you so much! Very helpful!

SAS_Rob · ‎01-14-2025

Given the information you shared, it *might* be appropriate to use the CLUSTER statement. Not knowing the survey design makes it difficult to know for sure.

