About StatDave

StatDave · ‎08-01-2023

Have you tried fitting your joint model using PROC GLIMMIX? You might find that much easier. See the example titled "Joint Modeling of Binary and Count Data" in the GLIMMIX documentation.

StatDave · ‎08-01-2023

The INCLUDE= option goes in the MODEL statement, not the SELECTION statement. See the HPGENSELECT documentation.
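As a rough sketch of where the option goes (the data set MYDATA, the response Y, the control variables CTRL1 and CTRL2, and the candidates X1-X20 are all hypothetical names, and the parenthesized effect list is the form I am assuming here):

```sas
/* Sketch only: mydata, y, ctrl1, ctrl2 and x1-x20 are hypothetical names */
proc hpgenselect data=mydata;
   /* INCLUDE= is a MODEL statement option: the listed effects are kept in every candidate model */
   model y(event='1') = ctrl1 ctrl2 x1-x20 / dist=binary include=(ctrl1 ctrl2);
   /* The SELECTION statement only chooses the selection method */
   selection method=lasso;
run;
```

Check the HPGENSELECT documentation for the forms of INCLUDE= that your release supports.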

StatDave · ‎07-31-2023

It doesn't really matter that the treatment isn't applied at the same time for each subject. The model just needs an indicator of pre- vs. post-treatment, whenever it actually occurs. See the repeated measures sections of this note.
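For instance, such an indicator could be built in a DATA step along these lines. This is only a sketch: MYDATA, TIME, and TRT_TIME (each subject's own treatment time, missing for subjects never treated) are hypothetical names.

```sas
/* Sketch only: mydata, time and trt_time are hypothetical names */
data did;
   set mydata;
   /* POST=1 at or after the subject's own treatment time, 0 otherwise.        */
   /* The MISSING check keeps never-treated subjects at POST=0, because a      */
   /* missing value compares as smaller than any number in SAS.                */
   post = (not missing(trt_time) and time >= trt_time);
run;
```

POST can then be used in the model in the same way as a pre/post indicator from a design with a common treatment time.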

StatDave · ‎07-29-2023

See the "Details: Classification Table" section of the LOGISTIC documentation. As described toward the end of that section, the CTABLE results are based on cross-validated ("leave one out") predicted probabilities, not on the probabilities obtained by directly applying the fitted model to the individual observations. Cross validation is used to reduce the optimistic overestimation of model fit that results from using the same data both to fit the model and to evaluate it.

The following code compares the results from the ordinary predicted probabilities with those from the cross-validated predicted probabilities. In the OUT= data set, _FROM_ is the observed response; _INTO_ is the predicted response using the ordinary predicted probabilities; CV_predY is the predicted response using the cross-validated predicted probabilities. Note that the proportion correctly classified is (5+3)/12=.67 using the ordinary values and (2+3)/12=.417 using the cross-validated values.

   PROC LOGISTIC DATA = TEST1 desc;
      MODEL Y = X1-X3 / CTABLE PPROB=0.5;
      output out=out predprob=(x i);
   run;
   data out;
      set out;
      CV_predY=(xp_1>=.5);
   run;
   PROC FREQ DATA = out;
      TABLES _from_*_into_;    *ordinary predicted probabilities;
      TABLES _from_*CV_predY;  *cross validated predicted probabilities;
   run;

StatDave · ‎07-29-2023

I suspect that the macro simply computes predictive margins for TREATMENT and its marginal effect (the difference in margins). This is supported by the following Margins macro call, which produces the same estimates and confidence limits (they match the Wald limits). See the Margins macro documentation.

   %margins(data=flies, class=treatment, classgref=first,
            response=death40, roptions=event='1',
            model=treatment thorax, dist=binomial,
            margins=treatment, diff=all, options=cl reverse)

Unfortunately, the Margins macro does not handle the multinomial model.

StatDave · ‎07-28-2023

Apply a format to your RACE levels so that they are in the desired order when sorted.

StatDave · ‎07-26-2023

Convergence problems are quite common with models fit by iterative optimization methods such as maximum likelihood or GEE (as in this case), and such problems can happen in many possible ways which depend on the data and model. Convergence of models fit by iterative methods can never be guaranteed. The cause in any particular case cannot typically be determined from examining the model specification or data. A solution often must be found by experimentation.

The most helpful strategy is usually to simplify the model (that is, reduce the number of model parameters) in some acceptable way, such as by removing higher-order effects like interactions, removing predictors, or dropping or merging categories of CLASS variables. In general, the more parameters there are in the model, the more likely convergence problems become. In categorical response models like this, sparseness of the data is common and can cause various fitting errors, though this is not the only possible cause. Generally, as model complexity increases and sample size decreases, the data become sparser and convergence problems become more likely. So again, model simplification is generally required. Starting with a simple model and adding variables as they can be supported is often a good strategy.

Note that GEE models can also be fit in PROC GEE using similar syntax. You could try it, since any variation in the fitting algorithm of an iterative method like GEE can potentially affect convergence. PROC GEE is a newer procedure specifically for fitting the GEE model and is the recommended procedure when fitting that model. It adds support for nominal multinomial response data and data that are missing at random, and it might perform better in some respects.
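A minimal PROC GEE sketch of the "start simple" approach might look like the following. The names MYDATA, ID, Y, and X1 are hypothetical, and the binary response and exchangeable working correlation are assumptions made only for illustration.

```sas
/* Sketch only: mydata, id, y and x1 are hypothetical names */
proc gee data=mydata;
   class id;
   /* Start with main effects only; add terms one at a time as long as the fit converges */
   model y(event='1') = x1 / dist=binomial;
   /* ID identifies the clusters of correlated observations */
   repeated subject=id / type=exch;
run;
```

If this simple model converges, further predictors or interactions can be added back gradually to see what the data can support.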

StatDave · ‎07-22-2023

If your final goal is to find an optimal cutoff, then note that there are statistics (like Youden's index and others) that are often used for that. These can be obtained using the ROCPLOT macro (or in PROC LOGISTIC if you have a recent version of SAS Viya). However, note that the unique predicted probabilities, which are the cutoffs used for the ROC curve, are computed using ALL of the predictor values. So, it is not possible to talk about cutoffs on just your LAB predictor with your model. Each cutoff is determined by both LAB and VISIT using your model.

If you remove VISIT from the MODEL statement, then you can add the OUTROC= option (OUTROC=OR in the code that follows) in the MODEL statement in your PROC LOGISTIC step and then merge that data set with your FITDAT data set.

   proc sort data=fitdat out=fitdat2(rename=(predprob=_PROB_));
      by predprob;
   run;
   proc sort data=or out=or2;
      by _prob_;
   run;
   data or3;
      merge fitdat2 or2;
      by _prob_;
   run;

This gives you a data set (OR3) that shows the LAB value corresponding to each cutoff. That data set also has the cell counts of the 2x2 table associated with each cutpoint and the sensitivity and 1-specificity statistics. Using those, you can easily compute the other statistics you want, as shown in this note on computing various 2x2 table statistics.
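As one illustration of computing such a statistic from the merged data, Youden's index (sensitivity + specificity - 1) can be obtained from the _SENSIT_ and _1MSPEC_ variables that OUTROC= provides. This is only a sketch: the data set OR4 and the variable YOUDEN are names made up here, and it assumes OR3 was created as described above.

```sas
/* Sketch only: OR4 and YOUDEN are hypothetical names; OR3 is the merged data set */
data or4;
   set or3;
   /* sensitivity + specificity - 1 = sensitivity - (1 - specificity) */
   youden = _sensit_ - _1mspec_;
run;

/* Put the cutoff with the largest Youden index first */
proc sort data=or4;
   by descending youden;
run;

proc print data=or4(obs=5);
   var _prob_ _sensit_ _1mspec_ youden;
run;
```

Add LAB to the VAR statement to see the LAB value that corresponds to each of the top cutoffs.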

StatDave · ‎07-11-2023

Add an LSMEANS statement with the ILINK option. The predicted probabilities are in the Mean column. Add the CL option if you want confidence limits:

   lsmeans y / ilink cl;

StatDave · ‎07-11-2023

A quick and easy way is to just save the CROSSLIST table and then plot it as a heat map in PROC SGPLOT. For example:

   proc freq data=mydata;
      table row*col / chisq crosslist(stdres);
      ods output crosslist=clist;
   run;
   proc sgplot data=clist;
      heatmapparm y=col x=row colorresponse=stdresidual;
   run;

StatDave · ‎07-07-2023

See the link to "this note" in the accepted answer. The note it links to shows the code, including the LSMEANS statement.

StatDave · ‎07-05-2023

The section of this note discussing non-identity link models might prove helpful.

StatDave · ‎07-03-2023

Use the STORE statement in PROC GLIMMIX to save the fitted model. Then use PROC PLM to read the saved model (using the RESTORE= option) and include the EFFECTPLOT statement.
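A sketch of those two steps, assuming a binary response Y, two continuous predictors X1 and X2 with their interaction, and a random intercept for a hypothetical SUBJECT variable (the item store name GLMXFIT is also made up):

```sas
/* Sketch only: mydata, subject, y, x1, x2 and glmxfit are hypothetical names */
proc glimmix data=mydata;
   class subject;
   model y(event='1') = x1 x2 x1*x2 / dist=binary solution;
   random intercept / subject=subject;
   store work.glmxfit;              /* save the fitted model in an item store */
run;

proc plm restore=work.glmxfit;
   /* ILINK requests the plot on the probability scale rather than the logit scale */
   effectplot contour(x=x1 y=x2) / ilink;
run;
```

Other EFFECTPLOT plot types, such as SLICEFIT, can be requested in the same PROC PLM step without refitting the model.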

StatDave · ‎06-28-2023

You didn't tell us what provided that list of apparent variable names, so I have to guess from them that you obtained this from PROC FREQ. If so, then none of them is what you want if you want the KS test that is often used to assess a logistic model. To get that test, you should instead use PROC NPAR1WAY on the predicted probabilities obtained from PROC LOGISTIC, as discussed and illustrated in this note. As described there, the KS test has issues that might make it less than the best assessment tool.
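In outline, that approach looks something like the following sketch, where MYDATA, Y, X1, X2, PREDS, and PHAT are hypothetical names:

```sas
/* Sketch only: mydata, y, x1, x2, preds and phat are hypothetical names */
proc logistic data=mydata;
   model y(event='1') = x1 x2;
   output out=preds p=phat;         /* save the predicted event probabilities */
run;

/* The EDF option requests the Kolmogorov-Smirnov two-sample test comparing  */
/* the distribution of PHAT between the two observed response levels         */
proc npar1way data=preds edf;
   class y;
   var phat;
run;
```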

StatDave · ‎06-26-2023

The purpose of the SUBJECT= option in the REPEATED statement of PROC GENMOD is simply to distinguish those observations that are correlated from those that aren't. That is, it defines the clusters of correlated observations. Observations with the same value of the SUBJECT= effect belong to the same cluster and are assumed to be correlated. So, if you feel that there is correlation among all of the observations in the same state, then you should specify SUBJECT=STATE.
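For example, a sketch with STATE as the cluster identifier could look like this. The names MYDATA, Y, and X1, the Poisson distribution, and the exchangeable working correlation are assumptions made only for illustration.

```sas
/* Sketch only: mydata, y and x1 are hypothetical names */
proc genmod data=mydata;
   class state;                     /* the SUBJECT= effect must be a CLASS variable */
   model y = x1 / dist=poisson;
   /* All observations sharing a STATE value form one cluster and are treated as correlated */
   repeated subject=state / type=exch;
run;
```

If you instead believe that only the repeated observations on the same county are correlated, the county identifier would be the SUBJECT= effect.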


Re: Association between nominal variable and ordinal variable

Re: Association between nominal variable and ordinal variable

Re: Association between nominal variable and ordinal variable

Re: Association between nominal variable and ordinal variable

Re: Odds ratio and relative risk from procedure causaltrt

Re: Confidence Interval for Percent Increase

Re: Confidence Interval for Percent Increase

Re: Confidence Interval for Percent Increase

Re: Undersampling in PROC PROC HPSPLIT / Adjusting for prior probablit...

Re: Appropriate Analysis for Score (Ordinal) Data - with example data

Re: Model for Correlated data

Re: Association between nominal variable and ordinal variable

Re: Association between nominal variable and ordinal variable

Re: Association between nominal variable and ordinal variable

Re: Association between nominal variable and ordinal variable

Re: Odds ratio and relative risk from procedure causaltrt

Re: ERROR: No valid parameter points were found in proc nlmixed, joint...

Re: lasso logistic regression control variables

Re: Staggered difference-in-differences

Re: PROC LOGISTIC: Error Rate from FITSTAT disagrees with Incorrect Pe...

Re: Marginal standardization of predicted probabilities for multinomia...

Re: Display Odds Ratio Plot in Descending Order

Re: PROC GENMOD Error Interpretation

Re: ROC analysis for repeated measures

Re: GLIMMIX Logistic Regression absolute % predicted value

Re: Two questions about ChiSquare: get STDRES into the column and colo...

Re: How to get estimates for categorical variables in a Modified Poiss...

Re: How do I preform a interrupted time series analysis using PROC GEN...

Re: Visualizing Continuous*Continuous variable Interaction of Binary O...

Re: Identification of Kolmogorov-Smirnov metric within the metrics of ...

Re: PROC GENMOD/PROC GEE for repeated County-level data