About StatDave

StatDave

Following up on my earlier comment, I believe that you can use the original data to do a proper analysis that estimates the common sensitivity for each device across the studies, taking into account the correlation caused by subjects providing responses on both devices. The analysis, using a GEE model, can provide the common sensitivities along with standard errors, allowing for confidence intervals and a comparison.

The following DATA step generates some example data. This simple step just generates, for each subject (ID), random binary values on the gold standard and each device. While it doesn't try to build in specific sensitivity values or any correlation, it will serve to illustrate the analysis. It creates data for two devices and three studies, but the proposed analysis could be used for more of both. Note that two observations are created for each subject and that the variable DIAG holds the binary response from each device (DEV).

    data f;
      call streaminit(42633);
      do study=1 to 3;
        do rep=1 to 20;
          drop rep;
          id+1;
          gold=rand('bernoulli',.5);
          dev=1; diag=rand('bernoulli',.5); output;
          dev=2; diag=rand('bernoulli',.5); output;
        end;
      end;
    run;

Before doing the analysis, let's find the sensitivities in the tables from the generated data. The following produces the 2x2 table for each device against the gold standard within each study.

    proc sort data=f;
      by study dev;
    run;
    proc freq data=f;
      by study dev;
      table diag*gold;
    run;

As shown in this note, the sensitivity in each table is the column percent in the 1,1 cell. Since it is really only necessary to work with the gold=1 column, the following statements again obtain and save the sensitivity values. That is, the sensitivity is just the event probability of the binary DIAG variable among the gold=1 observations in each table. The PROC MEANS step then shows the simple averages of the observed sensitivities for each device.
    proc freq data=f;
      where gold=1;
      by study dev;
      table diag;
      ods output onewayfreqs=frqs(where=(diag=1));
    run;
    proc means data=frqs mean;
      class dev;
      var percent;
    run;

The following fits a logistic GEE model to the sensitivities. As in the second FREQ step above, only the gold=1 data are used.

    proc genmod data=f;
      where gold=1;
      class study dev;
      model diag(event='1')=dev / dist=bin;
      repeated subject=study;
      lsmeans dev / ilink diff cl plots=none;
    run;

You can see that the estimated common sensitivity for each device across the studies (provided by the ILINK option in the LSMEANS statement) is similar to the average of its observed sensitivities found by the FREQ and MEANS steps above. The DIFF and CL options provide confidence intervals and a comparison of the device sensitivities.

You can also see that GENMOD reproduces the observed sensitivities when it fits a saturated model (though that model leaves no variability with which to obtain standard errors):

    proc genmod data=f;
      where gold=1;
      class study dev;
      model diag(event='1')=study|dev / dist=bin;
      repeated subject=study;
      lsmeans study*dev / ilink plots=none;
    run;

I believe that a similar approach can be taken for other 2x2 table statistics, such as specificity.
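The descriptive part of this (the per-table sensitivity taken from the gold=1 column, then a simple average per device) can be mirrored outside SAS. The following is a minimal Python sketch, assuming each record is a hypothetical (study, dev, gold, diag) tuple; it reproduces only the FREQ and MEANS steps, not the GEE fit.

```python
from collections import defaultdict


def sensitivities(records):
    """Sensitivity for each (study, dev) table: share of diag=1 among gold=1 records."""
    hits = defaultdict(int)    # diag=1 count within the gold=1 column
    totals = defaultdict(int)  # size of the gold=1 column
    for study, dev, gold, diag in records:
        if gold == 1:          # plays the role of WHERE gold=1
            totals[(study, dev)] += 1
            hits[(study, dev)] += diag
    return {key: hits[key] / totals[key] for key in totals}


def mean_by_device(sens):
    """Simple (unweighted) average of the per-study sensitivities for each device."""
    by_dev = defaultdict(list)
    for (_study, dev), value in sens.items():
        by_dev[dev].append(value)
    return {dev: sum(values) / len(values) for dev, values in by_dev.items()}


# Hypothetical data: 3 subjects per study, each rated by both devices.
records = [
    (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 1),
    (1, 2, 1, 1), (1, 2, 1, 1), (1, 2, 0, 0),
    (2, 1, 1, 1), (2, 1, 1, 1), (2, 1, 0, 0),
    (2, 2, 1, 0), (2, 2, 1, 0), (2, 2, 0, 1),
]
sens = sensitivities(records)
print(sens)                  # {(1, 1): 0.5, (1, 2): 1.0, (2, 1): 1.0, (2, 2): 0.0}
print(mean_by_device(sens))  # {1: 0.75, 2: 0.5}
```

Like the PROC MEANS step, this gives each study equal weight regardless of how many gold=1 subjects it has.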

StatDave

Since there is only a single predictor (with 5 levels) and a binary response, the data can be summarized in a 5x2 table. An overall assessment of whether there are any differences among the 5 event probabilities can be obtained without a model by using PROC FREQ:

    proc freq;
      table trt*aaa / chisq;
    run;

If you want to take a modeling approach and examine residuals, use PROC LOGISTIC, since it is specialized for this model and provides various goodness-of-fit statistics and residuals. However, with only 5 levels of a single predictor, there are only 5 predicted values and therefore only 5 residuals, so examining residuals is of limited value. This code provides the goodness-of-fit statistics and plots of all of the diagnostic residuals. It also uses the LSMEANS statement to provide pairwise comparisons among the treatments.

    proc logistic;
      class trt / param=glm;
      model aaa(event='1')=trt / gof iplots;
      lsmeans trt / plots=none ilink diff;
    run;

For interpretation of the diagnostic plots, see the following:
- The example titled "Logistic Regression Diagnostics" in the PROC LOGISTIC documentation
- "Regression Diagnostics" in the Details section of the PROC LOGISTIC documentation
- This note on goodness of fit in generalized linear models (a class of models that includes the logistic model)

As noted in these sources, the diagnostics are mostly used by looking for extreme outlying values, which makes them more useful when your model contains continuous predictors or at least has many distinct predicted values. Cutoff values on any of the diagnostics are not really possible, but the usage note above gives some idea of how to decide whether values are extreme for some of the diagnostics.
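The CHISQ option reports, among other statistics, the Pearson chi-square test of whether the event probabilities differ across the rows. As a hedged illustration of what that statistic computes for a 5x2 table, here is a small Python sketch with hypothetical counts; the p-value helper uses the closed-form chi-square tail probability, which holds only for even DF (a 5x2 table has 4).

```python
import math


def pearson_chisq(table):
    """Pearson chi-square statistic and DF for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chisq_pvalue_even_df(stat, df):
    """Upper-tail chi-square probability; this closed form is valid only for even DF."""
    if df % 2:
        raise ValueError("the closed form used here needs an even DF")
    half = stat / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical counts: rows are TRT levels 1-5, columns are AAA=0 and AAA=1.
table = [[10, 10], [12, 8], [8, 12], [15, 5], [5, 15]]
stat, df = pearson_chisq(table)
print(round(stat, 4), df, round(chisq_pvalue_even_df(stat, df), 4))  # 11.6 4 0.0206
```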

StatDave

If you are analyzing survey data, only the SURVEY procedures (SURVEYFREQ, SURVEYLOGISTIC, etc.) can provide a proper analysis of survey sample data. A variable specified in the WEIGHT statement in other procedures may produce correct parameter estimates, but their variances will not be correct. Special variance estimators are needed in the analysis of survey data and only the SURVEY procedures have these estimators. There is currently no SURVEY procedure for Generalized Estimating Equations models.
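To make "special variance estimators" concrete: one such estimator is Taylor series linearization, the default variance method in procedures such as PROC SURVEYMEANS. The Python sketch below computes it for a weighted mean under stated simplifying assumptions (a single stratum, clusters treated as sampled with replacement, no finite population correction); the data are hypothetical and this is not a substitute for the SURVEY procedures. Note that the point estimate depends only on the weights, while the standard error also depends on the cluster structure, which an ordinary WEIGHT statement does not use.

```python
import math


def weighted_mean(y, w):
    """Weighted mean (ratio of weighted totals); this is the point estimate."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def taylor_se_mean(y, w, cluster):
    """Taylor-linearization SE of a weighted mean.

    Assumes one stratum, clusters treated as sampled with replacement,
    and no finite population correction.
    """
    total_w = sum(w)
    ybar = weighted_mean(y, w)
    resid = {}  # linearized residual total for each cluster
    for yi, wi, c in zip(y, w, cluster):
        resid[c] = resid.get(c, 0.0) + wi * (yi - ybar) / total_w
    n = len(resid)
    rbar = sum(resid.values()) / n  # zero apart from rounding error
    var = n / (n - 1) * sum((r - rbar) ** 2 for r in resid.values())
    return math.sqrt(var)


# Hypothetical sample: 3 clusters of 2 units with unequal weights.
y = [1, 0, 1, 1, 0, 0]
w = [2, 2, 1, 1, 3, 3]
cluster = [1, 1, 2, 2, 3, 3]
print(weighted_mean(y, w), taylor_se_mean(y, w, cluster))
```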

StatDave

I believe a model could be written to do this, assuming that:
1) you have the data for each of the studies on each of the diagnostics;
2) the two diagnostics are continuous variables;
3) for each diagnostic in each study, you fit a model to a binary response (such as a logistic model) that produced predicted probabilities, and you selected a cutoff value on those probabilities that classifies all observations as predicted events or nonevents, yielding the sensitivity for that cutoff.

If all of the above are true, then a question is whether an independent set of subjects was used for each study-diagnostic combination or whether subjects were repeatedly measured, either within each study or across all studies.
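Item 3 can be pictured with a small sketch: given predicted probabilities from some fitted binary-response model, a cutoff turns them into predicted events and nonevents, and the sensitivity is the fraction of actual events classified as events. The probabilities below are hypothetical, and treating a probability equal to the cutoff as a predicted event is an assumption.

```python
def sensitivity_at_cutoff(probs, actual, cutoff):
    """Sensitivity when observations with predicted probability >= cutoff are
    classified as predicted events (the >= convention is an assumption)."""
    event_probs = [p for p, a in zip(probs, actual) if a == 1]
    if not event_probs:
        raise ValueError("sensitivity is undefined without actual events")
    return sum(p >= cutoff for p in event_probs) / len(event_probs)


# Hypothetical predicted probabilities and observed 0/1 responses.
probs = [0.9, 0.6, 0.4, 0.2, 0.7]
actual = [1, 1, 1, 0, 0]
print(sensitivity_at_cutoff(probs, actual, 0.5))  # 2 of the 3 actual events are at or above 0.5
```

Changing the cutoff changes the sensitivity, which is why each study-diagnostic sensitivity is tied to the cutoff that was chosen.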

StatDave

See this note which shows various ways to compute these, and other, statistics and confidence intervals for a 2x2 table. In particular, see the "Other methods to estimate and test the statistics" section. As shown in the first PROC FREQ step for sensitivity, you need to include a WHERE statement to select the level of your response (column) variable that represents the outcome event of interest and then specify your other (row) variable in the TABLES statement with the BINOMIAL option. For specificity, select the nonevent level as shown in the next PROC FREQ step. Use either the EXACT BINOMIAL statement, as shown, or equivalently the CL=EXACT suboption in the BINOMIAL option in the TABLES statement.
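For reference, the exact (Clopper-Pearson) limits that PROC FREQ reports for the binomial proportion can be computed by inverting two binomial tail probabilities. The following Python sketch does this by bisection; it is illustrative only, and the counts in the usage line are hypothetical.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def _solve_increasing(f, target, tol=1e-12):
    """Bisection on [0, 1] for the p where f(p) = target, with f increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence limits for the proportion x/n."""
    # Lower limit solves P(X >= x | p) = alpha/2
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha/2, i.e. P(X > x | p) = 1 - alpha/2
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Hypothetical: 18 of 20 diseased subjects test positive (sensitivity 0.9).
print(clopper_pearson(18, 20))
```

For specificity, the same function applies to the count of negative results among the nondiseased subjects.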

StatDave

You can simultaneously fit the model to the training portion of your data and evaluate the fitted model on both the training and test portions by using the PARTITION statement in PROC HPLOGISTIC. The following is a simplified version of the example titled "" in the HPLOGISTIC documentation in the SAS/STAT User's Guide (https://support.sas.com/en/software/sas-stat-support.html). The ROLEVAR= option lets you specify the variable in your data set that distinguishes the training and test portions. The output shows fit statistics for both portions. Note also that instead of using the DESCENDING option, it is safer to always use the EVENT= option (in the LOGISTIC, HPLOGISTIC, or GENMOD procedure) to be sure that you are modeling the level of the response variable that you consider the event of interest.

    proc hplogistic data=Sashelp.JunkMail;
      model Class(event='1')=Make Address All _3d Our Over Remove Internet Order;
      partition rolevar=Test(train='0' test='1');
    run;
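The idea of fitting on one portion and scoring both can be sketched without SAS. The Python code below is a hypothetical, minimal stand-in, not what HPLOGISTIC does internally: a one-predictor logistic regression fit by plain gradient ascent, on rows whose role flag plays the part of the ROLEVAR= variable, with the misclassification rate as a simple fit statistic for each portion.

```python
import math


def _prob(b, xi):
    """Predicted event probability from intercept/slope b at predictor value xi."""
    return 1.0 / (1.0 + math.exp(-(b[0] + b[1] * xi)))


def fit_logistic(x, y, lr=0.1, iters=5000):
    """One-predictor logistic regression by batch gradient ascent on the log-likelihood."""
    b = [0.0, 0.0]
    n = len(x)
    for _ in range(iters):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            r = yi - _prob(b, xi)
            g0 += r
            g1 += r * xi
        b[0] += lr * g0 / n
        b[1] += lr * g1 / n
    return b


def misclassification(b, x, y, cutoff=0.5):
    """Share of observations whose predicted class (prob >= cutoff) differs from y."""
    wrong = sum((_prob(b, xi) >= cutoff) != (yi == 1) for xi, yi in zip(x, y))
    return wrong / len(x)


def fit_and_evaluate(rows):
    """rows are (x, y, role) with role 0 = training and 1 = test."""
    train = [(x, y) for x, y, role in rows if role == 0]
    test = [(x, y) for x, y, role in rows if role == 1]
    b = fit_logistic(*zip(*train))  # the fit uses the training portion only
    return b, {
        "train": misclassification(b, *zip(*train)),
        "test": misclassification(b, *zip(*test)),
    }


# Hypothetical data: (predictor, 0/1 response, role)
rows = [(-2, 0, 0), (-1, 0, 0), (0, 1, 0), (0.5, 0, 0), (1, 1, 0), (2, 1, 0),
        (-1.5, 0, 1), (-0.3, 1, 1), (0.2, 1, 1), (1.5, 1, 1)]
coef, rates = fit_and_evaluate(rows)
print(coef, rates)
```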

StatDave · 12-11-2024

See PROC CAUSALMED in the SAS/STAT User's Guide https://support.sas.com/en/software/sas-stat-support.html . In particular, read the Overview, Getting Started, and Examples sections.

StatDave · 12-07-2024

The GEE method, available in PROC GEE through the REPEATED statement (and in GENMOD, though GEE is the recommended procedure), effectively combines absorption and clustering. It allows estimation of the effects specified in the MODEL statement, adjusted for the repeated (clustered) measurements within subjects, and avoids the need to estimate parameters for the individual subjects. Using the example from the Details: Absorption section of the GLM documentation, the following statements produce essentially the same results:

    proc glm;
      absorb herd cow;
      class treatment;
      model y = treatment / solution;
    run;

    proc gee;
      class herd cow treatment;
      model y = treatment / type3 wald;
      repeated subject=cow(herd);
    run;
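What "absorbing" a subject effect means can be illustrated with a continuous predictor (an assumption made for simplicity; the GLM example uses a CLASS treatment variable). Centering the response and predictor within each subject removes the subject effects, so the common slope is estimated without fitting a parameter for each subject. This Python sketch uses hypothetical data and is not how PROC GEE computes its estimates.

```python
from collections import defaultdict


def absorbed_slope(y, x, subject):
    """Slope of y on x after sweeping out subject effects by centering
    y and x within each subject (the one-way "within" estimator)."""
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # per subject: sum of y, sum of x, count
    for yi, xi, s in zip(y, x, subject):
        acc[s][0] += yi
        acc[s][1] += xi
        acc[s][2] += 1
    num = den = 0.0
    for yi, xi, s in zip(y, x, subject):
        ybar = acc[s][0] / acc[s][2]
        xbar = acc[s][1] / acc[s][2]
        num += (xi - xbar) * (yi - ybar)
        den += (xi - xbar) ** 2
    return num / den


# Hypothetical data: each subject has its own level, and the common slope is 2.
subject = ["a", "a", "a", "b", "b", "c", "c"]
x = [0, 1, 2, 0, 1, 1, 3]
level = {"a": 10.0, "b": -5.0, "c": 3.0}
y = [level[s] + 2 * xi for s, xi in zip(subject, x)]
print(absorbed_slope(y, x, subject))  # 2.0, with no subject parameter estimated
```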

StatDave · 12-05-2024

This is covered in the description of the DIFF option in the documentation of the LSMEANS statement. By default, the DIFF option produces all possible pairwise comparisons of the LS-means using simple t tests, with no multiplicity adjustment. You can specify optional values after DIFF= to request differences with the average LS-mean or differences with a control level. If you want the comparisons to be adjusted, specify the ADJUST= option with one of the available methods.
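As a rough illustration of the difference between the default and an adjusted analysis, the Python sketch below lists all pairwise differences (the set of comparisons that DIFF produces by default) and applies a Bonferroni adjustment, which is only one of the methods available through ADJUST=. The LS-means and p-values are hypothetical.

```python
from itertools import combinations


def pairwise_differences(lsmeans):
    """All k*(k-1)/2 pairwise differences of LS-means, keyed by (level_i, level_j)."""
    return {(a, b): lsmeans[a] - lsmeans[b] for a, b in combinations(lsmeans, 2)}


def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each raw p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return {key: min(1.0, p * m) for key, p in pvalues.items()}


# Hypothetical LS-means for four treatment levels.
lsmeans = {"A": 5.0, "B": 4.0, "C": 6.5, "D": 5.5}
diffs = pairwise_differences(lsmeans)
print(len(diffs), diffs[("A", "C")])  # 6 -1.5
# Hypothetical unadjusted p-values for three of the comparisons.
print(bonferroni({("A", "B"): 0.01, ("A", "C"): 0.2, ("B", "C"): 0.5}))
```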

StatDave · 11-27-2024

Because not all tests are single-DF tests on single parameters; for instance, Type 3 tests and the multi-DF tests you can construct in the CONTRAST or ESTIMATE statements are not. In general, for testing hypotheses about linear combinations of parameters with 1 or more DF, the Wald statistic has a limiting chi-square distribution. So, the chi-square form is used throughout for consistency.
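For reference, the general Wald statistic for a hypothesis H0: Lβ = 0, where L has r linearly independent rows and V̂ is the estimated covariance matrix of β̂, is shown below. When L has a single row that selects one parameter, the statistic reduces to the square of the usual z statistic, so the 1-DF chi-square test and the two-sided z test give the same p-value.

```latex
W = (L\hat{\beta})^{\top}\left(L\,\hat{V}\,L^{\top}\right)^{-1}(L\hat{\beta})
\;\overset{\text{approx.}}{\sim}\; \chi^{2}_{r}
\qquad\text{and, for } L = e_j^{\top}:\qquad
W = \left(\frac{\hat{\beta}_j}{\widehat{\mathrm{SE}}(\hat{\beta}_j)}\right)^{2} = z^{2} \sim \chi^{2}_{1}
```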

StatDave · 11-25-2024

I believe the problem here is that you want to make comparisons with your reference level, 1, and the REF='1' option makes it the LAST level in the parameter estimates table, but your ESTIMATE statements (in your original post) contrast each level against the FIRST level. Try moving all the "1" values from the first position to the last position. For example, the first ESTIMATE statement for race_ethnicity would become:

    estimate "PR for NHB vs NHW" race_ethnicity -1 0 0 0 1 / exp;

This should make all of the values in the L'Beta column just the negatives of the corresponding parameter estimates, since you are then estimating reference-minus-level_i differences rather than the other way around, which is what the parameters estimate. The values for an effect in the ESTIMATE statement are applied to the parameter estimates in the order in which they appear in the parameter estimates table.
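The ordering rule in the last sentence is just a dot product. In the hypothetical Python sketch below, the parameter vector lists levels 2 through 5 and then the reference level 1, whose estimate is assumed to be 0; the coefficients -1 0 0 0 1 then return the negative of the level-2 estimate, and the EXP option corresponds to exponentiating that value.

```python
import math


def estimate(coefficients, parameters):
    """L'Beta: coefficients are applied to parameter estimates in table order."""
    if len(coefficients) != len(parameters):
        raise ValueError("need exactly one coefficient per parameter")
    return sum(c * b for c, b in zip(coefficients, parameters))


# Hypothetical estimates in table order: levels 2, 3, 4, 5, then reference level 1
# (assumed here to have its estimate set to 0).
params = [0.40, -0.10, 0.25, 0.05, 0.0]
lbeta = estimate([-1, 0, 0, 0, 1], params)
print(lbeta, math.exp(lbeta))  # L'Beta is -0.40, the negative of the level-2 estimate
```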

StatDave · 11-25-2024

Moved: https://communities.sas.com/t5/Statistical-Procedures/PROC-GENMOD-conflicting-estimates/m-p/951833#U951833

StatDave · 11-22-2024

No worries, but actually GLIMMIX also runs fine with GLOGIT:

    proc glimmix data=dv2;
      model dv=income / dist=mult link=glogit s;
    run;
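With LINK=GLOGIT, each nonreference response category gets its own logit relative to a reference category. The Python sketch below (hypothetical linear-predictor values, with the reference category assumed to be the last one) shows how those generalized logits map back to category probabilities that sum to 1.

```python
import math


def glogit_probs(etas):
    """Category probabilities from generalized logits eta_j = log(p_j / p_ref);
    the reference category (assumed here to be last) has eta = 0."""
    weights = [math.exp(e) for e in etas] + [1.0]
    total = sum(weights)
    return [v / total for v in weights]


# Hypothetical linear predictors for a 3-category response at one INCOME value.
probs = glogit_probs([0.5, -0.2])
print([round(p, 4) for p in probs])
```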

StatDave · 11-22-2024

The REF= option does not apply to an ordinal response. There should be a note in the log to this effect, along with a note of which levels are being modeled. For an ordinal response, your choices are just to model the lower values (default) or the higher values (by specifying MODEL guide_levels(descending)= ...). Your results show the default. So, now you can see that, for Education=1, Pr(Guide level=1)=0.27, Pr(Guide level=1 or 2)=0.606, and Pr(Guide level=1 or 2 or 3)=0.786. If you want Pr(Guide level=2), it is the difference 0.606-0.27 = 0.336.
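The same differencing gives every individual category probability. The Python sketch below assumes four response levels (so three cumulative probabilities) and uses the Education=1 values quoted above; the last category receives whatever probability remains.

```python
def category_probs(cumulative):
    """Turn cumulative probabilities P(Y<=1), P(Y<=2), ... into P(Y=j);
    the last category gets the probability that remains."""
    probs, prev = [], 0.0
    for c in cumulative:
        probs.append(c - prev)
        prev = c
    probs.append(1.0 - prev)
    return probs


print([round(p, 3) for p in category_probs([0.27, 0.606, 0.786])])  # [0.27, 0.336, 0.18, 0.214]
```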

StatDave · 11-22-2024

The values shown in the Estimate column are estimated logit (log odds) values for the associated Education level. Logits can take any real value. Since your response variable (Guide levels) shows 3 values in the table, I assume it has 4 possible levels, resulting in 3 cumulative logits being modeled. The cumulative logits in an ordinal multinomial model with response levels 1, 2, and 3 are, by default, log(p1/(p2+p3)) and log((p1+p2)/p3). That is, they divide the ordered response levels in the two possible places. The same holds for your 4-level response, with 3 possible divisions. The first column in the table identifies each logit by one of the response levels appearing in its numerator. But I am concerned that your response levels do not sound like they are in logically ascending or descending order, as they must be for the results to have any meaning. You should examine the Response Profile table to verify this and then rename your levels as needed so that they are logically ordered. Since you used the ILINK option, your table should also have a Mean column, which gives estimated cumulative probabilities (p1, p1+p2, p1+p2+p3) for each Education level; these are usually more meaningful than the estimated logits in the Estimate column.
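The relationship between the Estimate and Mean columns can be sketched in a few lines of Python: the first function forms the cumulative logits described above from hypothetical category probabilities, and the second is the inverse logit, which is what ILINK applies to turn an estimated logit back into a cumulative probability.

```python
import math


def cumulative_logits(probs):
    """Cumulative logits log(P(Y<=j) / P(Y>j)), j = 1..k-1, from category probabilities."""
    logits, cum = [], 0.0
    for p in probs[:-1]:
        cum += p
        logits.append(math.log(cum / (1.0 - cum)))
    return logits


def inverse_logit(eta):
    """Inverse of the logit link: maps a logit to a (cumulative) probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical probabilities for a 4-level ordered response.
logits = cumulative_logits([0.25, 0.25, 0.25, 0.25])
print(logits)                              # log(1/3), 0, log(3)
print([inverse_logit(e) for e in logits])  # back to 0.25, 0.5, 0.75 (up to rounding)
```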

Re: A meta analysis for sensitivity comparing two diagnostic tools

Re: Checking residuals for binomial data that is being analyzed by PRO...

Re: Can proc genmod properly handle 'sampling' weights?

Re: A meta analysis for sensitivity comparing two diagnostic tools

Re: Pearson-Clopper confidence limits for sensitivity and specific usi...

Re: logististic regression- work on train data or all data??

Re: PROC CALIS Mediation Binary Outcome

Re: Model with Clustering, Absorption, Year/individual fixed effects, ...

Re: Question About Default Means Separation in PROC GLIMMIX

Re: Wald test Logistic regression

Re: Model for Correlated data

Re: Checking residuals for binomial data that is being analyzed by PRO...

Re: Can proc genmod properly handle 'sampling' weights?

Re: A meta analysis for sensitivity comparing two diagnostic tools

Re: Pearson-Clopper confidence limits for sensitivity and specific usi...

Re: PROC CALIS Mediation Binary Outcome

Re: A meta analysis for sensitivity comparing two diagnostic tools

Re: Checking residuals for binomial data that is being analyzed by PRO...

Re: Can proc genmod properly handle 'sampling' weights?

Re: A meta analysis for sensitivity comparing two diagnostic tools

Re: Pearson-Clopper confidence limits for sensitivity and specific usi...

Re: logististic regression- work on train data or all data??

Re: PROC CALIS Mediation Binary Outcome

Re: Model with Clustering, Absorption, Year/individual fixed effects, ...

Re: Question About Default Means Separation in PROC GLIMMIX

Re: Wald test Logistic regression

Re: PROC GENMOD conflicting estimates

PROC GENMOD conflicting estimates

Re: estimates in proc logistic when predictor is a continuous var

Re: Help Understanding Proc Genmod Least Squares Means Output

Re: Help Understanding Proc Genmod Least Squares Means Output