
Re: there are differences in the P values of SAS and SPSS when it is ...


☑ This topic is **solved**.


Posted 04-06-2023 11:27 PM
(1254 views)

Dear all,

We ran the same multivariable logistic regression on the same dataset in both SAS and SPSS. The odds ratios for all variables were identical between the two packages.

However, for variables with more than two categories, the p-values from SAS and SPSS differ. Notably, the p-values for continuous and binary variables are identical in both packages.

We are very confused by this. Your help would be greatly appreciated.

Thanks

1 ACCEPTED SOLUTION


The p-values are for different null hypotheses.

In the Parameter Estimates table, the p-value is for the null hypothesis beta=0. For example, in the image you posted, the estimate for the coefficient of (xgra=2) in the model is -0.2, which is not significantly different from 0 because the standard error is approximately 0.2.

In the Odds Ratio table, the null hypothesis is ratio=1. For example, in the image you posted, the estimate of the (xgra 2 vs 1) odds ratio is 2.7, with a 95% CI of [1.8, 3.9]. Because this interval does not include 1, we infer that the related ratio parameter is significantly different from 1.
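Both tests can be reproduced numerically. A minimal Python sketch using the rounded figures quoted above; the standard error of the log odds ratio is back-calculated from the quoted CI, so all values here are illustrative, not the exact software output:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function (stdlib only)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Wald test of H0: beta = 0 for the (xgra=2) coefficient
beta, se = -0.2, 0.2
p_beta = 2.0 * (1.0 - norm_cdf(abs(beta / se)))
print(f"p-value for beta=0: {p_beta:.3f}")   # ~0.317, not significant

# Test of H0: ratio = 1 for the (xgra 2 vs 1) odds ratio.
# The CI is built on the log scale and then exponentiated.
log_or, se_log = math.log(2.684), 0.191      # SE assumed to match the quoted CI
lo = math.exp(log_or - 1.96 * se_log)
hi = math.exp(log_or + 1.96 * se_log)
print(f"95% CI for the odds ratio: [{lo:.2f}, {hi:.2f}]")  # ~[1.85, 3.90], excludes 1
```

The coefficient test and the CI test answer different questions, so one can be significant while the other is not.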

11 Replies


Please show us your SAS code and explain which p-value you are looking at.

In general, a p-value depends on the distribution of a statistic under a null hypothesis. P-values are sometimes approximate because the true sampling distribution of the statistic is unknown, or is known only asymptotically for large samples.

Different software packages will produce the same p-value only when the null hypothesis and the distributional assumptions match. Clearly, that is not the case for this problem. But if you show the code, we can explain what H0, statistic, and distributional assumptions SAS is using.
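To illustrate that dependence: for a single coefficient, the Wald p-value is the tail probability of the squared z statistic under a 1-df chi-square (equivalently, a two-sided normal tail). A stdlib-only Python sketch with made-up numbers:

```python
import math

def wald_p(estimate, se):
    """Two-sided p-value for H0: parameter = 0 under the large-sample
    normal approximation (a 1-df Wald chi-square test)."""
    z = abs(estimate / se)
    # P(chi2_1 >= z**2) == 2 * (1 - Phi(z)), Phi the standard normal CDF
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

# Hypothetical estimate and standard error:
print(round(wald_p(0.5, 0.25), 4))  # 0.0455: z = 2, a borderline case
```

The same estimate and standard error always give the same statistic; differences between packages arise when the hypothesis being tested, or the reference distribution, is not the same.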


Thank you so much!

Thank you very much for the attention and guidance of every expert.

For example, **xgra** is a three-category variable with levels 1, 2, and 3, and **xindl** is a binary variable; these are the two independent variables in our logistic analysis. **y** is the dependent variable, with two categories, 0 and 1.

This is our SAS code for the example data:

```
proc logistic descending data=s1;
  class xgra(ref="1") xindl(ref="0");
  model y=xgra xindl;
run;
```

We have uploaded the results as figure 1 (SAS) and figure 2 (SPSS). The content in the red box is the p-value for the multi-category variable **xgra**.

Although the odds ratios agree between the two packages, the p-values do not; they are even contradictory (significant in one, not significant in the other).


Thank you for uploading the images and posting the PROC LOGISTIC statements. Your output shows that the parameter estimates from SAS and SPSS are different. Therefore, you should not be wondering why the p-values are different but why the *coefficient estimates* are different.

As StatDave says, the most likely difference is the coding (parameterization) for the categorical independent variables. PROC LOGISTIC uses effect coding by default. From the SPSS output, it looks like they are using "dummy encoding," which SAS calls GLM encoding.

Try modifying your CLASS statement to be

```
class xgra(ref="1") xindl(ref="0") / param=GLM;
```

and let us know whether that provides parameter estimates that match your SPSS output.
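For reference, the two parameterizations differ only in the design-matrix rows they generate for the class levels; the fitted model is the same either way. A hypothetical Python sketch (the coefficient values b2 and b3 are made up to mimic the numbers in this thread):

```python
import math

levels = [1, 2, 3]

def dummy_row(level, ref=1):
    # GLM / dummy (reference) coding: one 0/1 indicator per non-reference level
    return [1 if level == k else 0 for k in levels if k != ref]

def effect_row(level, ref=1):
    # Effect coding (the PROC LOGISTIC default): the reference level gets -1s
    if level == ref:
        return [-1] * (len(levels) - 1)
    return [1 if level == k else 0 for k in levels if k != ref]

for lvl in levels:
    print(lvl, dummy_row(lvl), effect_row(lvl))

# Under effect coding the level effects sum to zero, so the effect for the
# reference level is implied.  With hypothetical effect-coded coefficients:
b2, b3 = -0.2, 1.387
b1 = -(b2 + b3)                # implied effect for level 1
or_2_vs_1 = math.exp(b2 - b1)  # equals exp(dummy coefficient for level 2)
print(round(or_2_vs_1, 2))     # ~2.68 even though b2 itself is near 0
```

This is why the odds ratios match across packages while the individual coefficients (and their p-values) do not: the odds ratio is a contrast between levels, which is invariant to the coding.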


Thanks!

We modified our CLASS statement as you and StatDave suggested, and that resolved the issue completely.

However, we still have a question: why does the SAS output show that the odds ratio for **xgra** (2 vs 1: OR 2.684, 95% CI 1.845-3.904) is significant, while the p-value for the corresponding coefficient in the Parameter Estimates table is not?




Thank you for your suggestion; it has given us a deeper understanding of the principle.


The question I would ask is how often an observed difference has a practical impact on the analysis.

Even running the same version and release of software on different computers can produce different results, due to differences in math co-processors (back when those were separate chips) or in the main processor itself.

Between different packages, the algorithms chosen to implement a given calculation can yield different results because of limits of precision in internal storage. On top of that, decimal values often cannot be stored exactly in binary, so some amount of rounding difference can accumulate.
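The binary-representation point is easy to demonstrate in any language; a quick Python illustration:

```python
import math

# Most decimal fractions have no exact binary representation, so two
# mathematically identical computations can disagree at machine precision.
a = 0.1 + 0.2
print(a == 0.3)  # False: a is actually 0.30000000000000004

# Rounding error also accumulates, and even the order of summation matters:
xs = [0.1] * 10
print(sum(xs) == 1.0)        # False: naive left-to-right summation drifts
print(math.fsum(xs) == 1.0)  # True: compensated summation removes the drift
```

Different packages make different choices at every one of these steps, which is one source of last-digit disagreement.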

I once had reason to compare SAS with SUDAAN, another package used for statistics with complex survey weighting. I could detect differences in the confidence limits between the two outputs, but the differences were usually at the 0.001 position in percentages. For the data I was using, that meant the limits, projected onto the population of interest, might vary by at most about 0.3 persons (yes, three-tenths of a person), which we deemed to have no practical impact on the decisions that would be made from the results.

Or consider house prices, which typically run well over $100,000 these days. Would a pricing analysis that varied by $0.57 (57 cents) make much difference in any practical sense?


@Dennisky, how different are your p-values for SAS and SPSS? Could you please give an example?

