How to Use Popular PROCS in SAS/STAT® – the Sequel


How to Use Popular PROCS in SAS/STAT® – the Sequel: Q&A, Slides, Course Materials (including code) and On-Demand Recording

Watch this Ask the Expert session to learn about the comprehensive set of tools that SAS/STAT offers (more than 100 procedures for statistical analysis) and how it scales to meet your needs.

Watch the webinar

Join Mike Patetta as he demonstrates three statistical areas: power and sample size analysis, categorical data analysis, and quantile regression. You will learn:

How to perform power and sample size analysis using PROC POWER.
The categorical data analysis methods that are in SAS/STAT.
How to fit quantile regression models in PROC QUANTREG.
How these methods can help solve your research or business problems.

The questions from the Q&A segment held at the end of the webinar are listed below. The slides from the webinar and course notes, including the code, are attached.

Q&A

How do you easily obtain appropriate standard errors from a logit estimator?

The estimated standard error of the parameter estimate is computed as the square root of the corresponding diagonal element of the estimated covariance matrix. Estimates of the variances and covariances might be unstable if the sample size is small. If the sample size is small, I would recommend exact logistic regression or Firth’s penalized maximum likelihood estimation. I would also recommend the profile likelihood method to compute the confidence intervals for the parameter estimates and odds ratios.
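As a rough illustration of the first sentence of this answer, the sketch below takes the square root of each diagonal element of a covariance matrix and builds a simple Wald interval for an odds ratio. The estimates and covariance values are hypothetical and are not from the webinar. The Wald interval is shown only because it is the simplest to compute; the profile likelihood interval recommended above requires refitting the model and is not shown here.

```python
import math
from statistics import NormalDist

# Hypothetical values for illustration only (not from the webinar).
estimates = [-1.20, 0.85]        # intercept and slope on the logit scale
cov = [[0.0900, -0.0120],
       [-0.0120, 0.0400]]        # estimated covariance matrix

def standard_errors(cov_matrix):
    """Square root of each diagonal element of the covariance matrix."""
    return [math.sqrt(cov_matrix[i][i]) for i in range(len(cov_matrix))]

def wald_odds_ratio_ci(beta, se, level=0.95):
    """Wald confidence interval for the odds ratio exp(beta)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.exp(beta - z * se), math.exp(beta + z * se)

se = standard_errors(cov)
lower, upper = wald_odds_ratio_ci(estimates[1], se[1])
print("standard errors:", se)
print("95% Wald limits for the odds ratio:", lower, upper)
```

With small samples these Wald limits can be inaccurate, which is the reason the answer points to exact logistic regression, Firth's method, and profile likelihood intervals instead.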

What do you suggest if your logistic model doesn’t converge?

It could be a number of things. Model convergence problems can be caused by quasi-complete separation, which occurs when a level of a categorical input perfectly predicts the response variable. Another convergence problem is complete separation, which occurs when some linear combination of the predictor variables perfectly predicts the response variable. The first step in dealing with these convergence problems is to figure out which variable or variables are causing the problem. After you have figured out what is causing the problem, possible solutions are eliminating the problem variables (especially if they are not important), recoding the problem variables, using exact logistic regression, or using Firth’s penalized maximum likelihood estimation.
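One simple way to look for the problem variable is to cross-tabulate each categorical input against the response and flag any level whose observations all share one response value. The sketch below does this for a single input. The function name and the data are hypothetical, and it only covers the categorical-level case described above, not separation caused by a linear combination of predictors.

```python
from collections import defaultdict

def levels_with_single_outcome(levels, responses):
    """Return the levels of a categorical input whose observations all
    share the same response value (a sign of possible separation)."""
    seen = defaultdict(set)
    for level, y in zip(levels, responses):
        seen[level].add(y)
    return sorted(level for level, ys in seen.items() if len(ys) == 1)

# Hypothetical data: every observation in region "C" has response 1.
region = ["A", "A", "B", "B", "C", "C", "C"]
response = [0, 1, 1, 0, 1, 1, 1]

flagged = levels_with_single_outcome(region, response)
print("levels that perfectly predict the response:", flagged)
```

A flagged level is a candidate for recoding (for example, merging it with a neighboring level) or a reason to switch to exact or Firth estimation.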

Socio 1 vs. 3 confidence interval did not cross 1. Is it significant?

Yes. Because the confidence interval for the odds ratio does not include 1, the contrast between socioeconomic groups 1 and 3 is significant at the 0.05 significance level.

When should you use PROC SURVEYLOGISTIC vs. PROC LOGISTIC?

The SURVEYLOGISTIC procedure fits linear logistic regression models for discrete response survey data. For statistical inferences, PROC SURVEYLOGISTIC incorporates complex survey sample designs, including designs with stratification, clustering, and unequal weighting. PROC LOGISTIC should not be used when the data is collected with a complex survey design.

Some people say there are no residuals in logistic regression. I feel residuals in logistic regression are just either 0 or 1 minus the predicted probability, which is between 0 and 1. Is it that these residuals don’t have any parametric distributions? Would you agree with this statement?

I believe these people are saying there are no residuals in a generalized linear model when you formulate the model with a link function and a linear predictor. When the formulation of the model includes the expected value of Y, there is no error term. However, PROC LOGISTIC produces a variety of residuals such as Pearson chi-square residuals, deviance residuals, likelihood residuals, standardized Pearson chi-square residuals, and standardized deviance residuals. You are correct that there are no parametric distribution assumptions.

What’s the N at .9? Does the model explain N by quantile?

Quantile regression uses all the data for fitting quantiles conditional on the covariates. It does not segment the unconditional distribution of the response variable, so it does not report the sample size at the 90th percentile. The parameter estimates are computed by minimizing the sum of check losses. The check loss function gives asymmetric weights to the error depending on the quantile level and the sign of the error. For each quantile level, the solution to the minimization problem yields a distinct set of regression coefficients. These coefficients show the effect of the predictor variables on the conditional quantiles of the response variable.
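To make the check loss concrete, here is a minimal sketch for the simplest case: a model with no covariates, where minimizing the sum of check losses over a single constant recovers a sample quantile. The data and function names are hypothetical, and the brute-force search over observed values is for illustration only; it is not how PROC QUANTREG solves the problem. Note that every observation contributes to the loss at every quantile level, which is why there is no separate N for the 0.9 quantile.

```python
def check_loss(error, tau):
    """Check loss: tau * error when error >= 0, (tau - 1) * error otherwise."""
    return error * (tau - (1 if error < 0 else 0))

def total_check_loss(values, candidate, tau):
    """Sum of check losses when every value is predicted by one constant."""
    return sum(check_loss(v - candidate, tau) for v in values)

def best_constant(values, tau):
    """Search the observed values for the constant with the smallest loss."""
    return min(sorted(values), key=lambda c: total_check_loss(values, c, tau))

# Hypothetical response values with one large outlier.
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]

median_fit = best_constant(y, 0.5)   # all 11 values enter the loss
upper_fit = best_constant(y, 0.9)    # same 11 values, different weights
print(median_fit, upper_fit)
```

At a quantile level of 0.9, positive errors are weighted by 0.9 and negative errors by 0.1, so the fitted value is pulled toward the upper part of the data.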

How do you explain to the researcher what the results of the quantile regression mean to them?

I would report the parameter estimates for each quantile. I would also show a plot of the fitted conditional quantiles against the continuous predictor variable and a plot of the quantile process which shows how the coefficient of the predictor variable changes as a function of the quantile. This information should illustrate how quantile regression gives a more complete picture of the covariate effect since some coefficients change across quantiles.

In PROC QUANTREG, when looking at multiple quantile estimates (e.g. 0.1 to 0.9 by 0.1), do the confidence intervals still have nominal coverage for addressing false discovery risk stemming from generating many estimates?

You are correct that the multiple comparisons problem also applies to confidence intervals. However, quantile regression uses all the data for fitting quantiles conditional on the covariates. For each quantile level, the solution to the minimization problem yields a distinct set of regression coefficients. Therefore, there should be no multiple comparisons problem with the confidence intervals. PROC QUANTREG does not have a Bonferroni correction for confidence intervals.

Is randomization always an assumption for sample size estimation?

No, sample size computations can be done for non-randomized studies also.

When did these PROCS (POWER AND GLMPOWER) come into use?

PROC POWER and PROC GLMPOWER came out in version 9.1 which was released in December of 2003.

Does PROC POWER incorporate code for different types of random sampling e.g. stratified, cluster, systematic?

No, PROC POWER does not incorporate code for stratified, cluster, or systematic sampling.

What is your conclusion regarding the results of the quantile regression model on the trout example?

For the 10th and 50th percentiles, the results showed no relationship between trout density and width-to-depth ratio. However, the relationship is highly significant at the 90th percentile. This illustrates how quantile regression gives a more complete picture of the covariate effect.

What is your conclusion regarding the sample size computations on the aspirin and heart attack study?

The results show that the larger the effect size (a relative risk farther from 1.0 in this analysis), the smaller the required sample size. Here the relative risk is the proportion of heart attack deaths during a three-year period among treatment patients who had already suffered a heart attack, divided by the same proportion among placebo patients. For a relative risk of 0.3, the required sample size was 1598 to have a power of 0.9. For a relative risk of 0.6, the required sample size was 6236 to have a power of 0.9. The other parameters were a one-sided test, an alpha of 0.01, a reference proportion of 0.04, and a balanced design.
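The direction of this result can be checked with a textbook normal-approximation formula for comparing two proportions. The sketch below is only that approximation, written under the parameters listed above (one-sided test, alpha of 0.01, power of 0.9, reference proportion of 0.04, balanced design). It is not the computation PROC POWER performs, so its totals should not be expected to match 1598 and 6236 exactly. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(p_ref, relative_risk, alpha=0.01, power=0.9):
    """Approximate per-group sample size for a one-sided test of two
    proportions in a balanced design (normal approximation)."""
    p_trt = p_ref * relative_risk          # treatment-group proportion
    p_bar = (p_ref + p_trt) / 2            # pooled proportion under the null
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p_ref * (1 - p_ref) + p_trt * (1 - p_trt))
    ) ** 2
    return math.ceil(numerator / (p_ref - p_trt) ** 2)

for rr in (0.3, 0.6):
    n = n_per_group(0.04, rr)
    print(f"relative risk {rr}: about {n} per group, {2 * n} in total")
```

Even with this rough formula, the relative risk of 0.3 needs far fewer patients than the relative risk of 0.6, which is the pattern described in the answer.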

How do you do a power analysis for a dependent t-test vs an independent t-test?

In PROC POWER, the TWOSAMPLEMEANS statement performs power and sample size analyses for pooled and unpooled t tests, equivalence tests, and confidence interval precision involving two independent samples. The PAIREDMEANS statement performs power and sample size analyses for t tests, equivalence tests, and confidence interval precision involving paired samples.

Is there a multiplicity problem with quantile regression since you are doing more than one analysis?

Multiplicity refers to the potential inflation of the type I error rate as a result of multiple testing. Quantile regression uses all the data for fitting quantiles conditional on the covariates. For each quantile level, the solution to the minimization problem yields a distinct set of regression coefficients. Therefore, there should be no type I error inflation when requesting multiple quantiles.

Is there a meaningful difference between using CLODDS=PL and the FIRTH option when obtaining odds ratio estimates in smaller sample sizes?

Conventional maximum likelihood estimates might suffer from bias in small samples. The penalized likelihood method introduced by Firth is designed to reduce this bias for a wide range of applications of maximum likelihood. It has also been shown that the Firth method is particularly effective in dealing with cases of quasi-complete separation. When you use the FIRTH option in the MODEL statement of PROC LOGISTIC, you can also use the CLODDS=PL option which requests profile likelihood confidence intervals. I recommend that you always use the profile likelihood confidence intervals when using the Firth method because the Wald confidence intervals can be inaccurate with small sample sizes and when you have quasi-complete separation. Therefore, with small sample sizes there might be a meaningful difference between using CLODDS=PL and CLODDS=WALD even when using the FIRTH option.

Does SAS publish more background information on statistical techniques in general, and how to use SAS to work with them?

We have several resources listed in the resources section of this post. We also have extensive online documentation available. Just visit sas.com and search for the topic of interest. You can also find information on many topics in our SAS Support Communities. Again, just type your topic in the search box. SAS also offers free training for basic statistics.

Can I conduct a power analysis for Cox proportional hazards models in SAS?

Yes, the COXREG statement in PROC POWER performs power and sample size analyses for the score test of a single scalar predictor in Cox proportional hazards regression for survival data, possibly in the presence of one or more covariates that might be correlated with the tested predictor.

Can I fit a Poisson regression model in SAS?

Yes, you can fit a Poisson regression model in PROC GENMOD or PROC HPGENSELECT.

Are the quantile regression procedures scalable to large data sets in SAS?

PROC QUANTREG can be computationally intensive with very large data sets. If the data set has several hundred thousand observations, I recommend using PROC QTRSELECT in SAS Viya.

Recommended Resources

Surviving the Cox Proportional Hazards Model with the POWER Procedure

Statistical Power Calculations Using SAS Software

Power Analysis for Generalized Linear Models Using the New CUSTOM Statement in PROC POWER

Categorical Data Analysis Using SAS®, Third Edition

The Categorical Might of PROC FREQ

Five Things You Should Know about Quantile Regression

Quantile Regression in Pharmaceutical Marketing Research

How to Use Popular PROCS in SAS/STAT® Webinar

If you’d like to learn more about the three topics covered, please consider the following SAS Training courses:

Categorical Data Analysis Using Logistic Regression

Determining Power and Sample Size Using SAS/STAT® Software

Robust Regression Techniques in SAS/STAT®
