Re: Different ADF test results between SAS and Python

asiahty · Posted 03-18-2026 02:32 AM

Hi,

I'm a new SAS user. I have been trying to replicate in SAS the tests I conducted in Python (e.g. normality, Dickey-Fuller test). However, I have noticed that the ADF statistic and its p-value (Pr < Tau) differ between SAS and Python. For clarity, I used PROC ARIMA for the ADF test in SAS and the adfuller function from statsmodels in Python. Why do the results differ?

I'd appreciate it if anyone could respond to this.

Thank you.

ballardw · Posted 03-18-2026 03:59 AM

Generic question gets a generic response: differences arise from different details in the programming and often from the machines the code executes on. Things like the number of decimal places maintained internally for computations can easily affect results, even when the same algorithm is used. And just because a statistical test has the same name in different packages, it is very unlikely that the algorithm was programmed the same way at every step.

HOW much difference might be the question that you need an answer to. So perhaps sharing the results in question is the place to start.

Also, some programs report results using different defaults. For instance, in a logistic regression with a true/false outcome, one program may default to modeling the "true" value and another the "false" value. So the same data would produce results that look like the complement of each other (70% true versus 30% false, for example).
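To make the complement point concrete, here is a tiny sketch with a hypothetical 70%/30% split (not from any real model): modeling the other outcome level flips the sign of the log-odds, which is why coefficients from two programs can look like mirror images of each other.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


p_true = 0.7             # hypothetical probability of the "true" level
p_false = 1.0 - p_true   # the complementary "false" level

# Modeling "false" instead of "true" negates the log-odds
print(round(logit(p_true), 4), round(logit(p_false), 4))  # 0.8473 -0.8473
```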

For serious details you might need to include:

Your data

Your code for both approaches

The output

The research question you want answered, so we can verify that (at least) the SAS approach uses appropriate options.

Some SAS procedures have a section in the online help called Details that describes how the computations are done.

asiahty · Posted 03-18-2026 04:26 AM

Thank you for your response.

Further details on this:

1. Data (as per Excel attachment)

2. You can find the code below:

a. SAS code:

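/* Stationarity test: ADF with 0 augmenting lags (plain Dickey-Fuller). */
/* &&var&i resolves to the i-th analysis variable; this presumably runs */
/* inside a macro %DO loop over i.                                       */
proc arima data=WORK.QUARTERLY;
   identify var=&&var&i stationarity=(adf=(0));
   ods output stationaritytests=stationary_data;
run;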

b. Python code:
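from statsmodels.tsa.stattools import adfuller

# raw_ln_hfa and combo are defined elsewhere in the script (presumably a pandas
# DataFrame and a pair of its column names).
# maxlag=0 with autolag=None fixes zero augmenting lags; regression='c' adds a constant.
adf_result_var1 = adfuller(raw_ln_hfa[combo[0]], maxlag=0, regression='c', autolag=None)
adf_result_var2 = adfuller(raw_ln_hfa[combo[1]], maxlag=0, regression='c', autolag=None)
adf_var1 = adf_result_var1[0]       # ADF (tau) statistic
adf_var2 = adf_result_var2[0]
adf_pval_var1 = adf_result_var1[1]  # p-value
adf_pval_var2 = adf_result_var2[1]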

3. Output:

a. SAS p-value (from pr < tau): 0.51376756197933

b. Python p-value: 0.521899648105788

4. Research question: whether to reject the null hypothesis of a unit root, i.e. whether the data are stationary.

sbxkoenk · Posted 03-18-2026 04:15 AM

You used the adfuller function (statsmodels) in Python, right?

Link to the Python documentation:

https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html

What's your setting for the regression parameter?
Note that SAS does not support the ctt setting (“ctt” : constant, and linear and quadratic trend).
What's your setting for the maxlag parameter?
Note: the number of augmenting lags in the underlying regression model is probably different between SAS PROC ARIMA and Python.
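As a rough guide to matching the settings, here is a minimal sketch of a hypothetical helper (not part of SAS or statsmodels) that maps an adfuller call to the PROC ARIMA option and to the row of the ADF table to read. PROC ARIMA prints Zero Mean, Single Mean, and Trend rows for each requested lag order, so regression='c' corresponds to the Single Mean row.

```python
# Hypothetical mapping from adfuller(regression=...) to the "Type" row that
# PROC ARIMA prints in its ADF unit root test table.
SAS_TYPE_FOR_REGRESSION = {
    "n": "Zero Mean",    # no constant, no trend
    "nc": "Zero Mean",   # spelling of "n" used by older statsmodels releases
    "c": "Single Mean",  # constant only
    "ct": "Trend",       # constant and linear trend
    "ctt": None,         # constant, linear and quadratic trend: no SAS row
}


def sas_equivalent(regression: str, maxlag: int, autolag=None) -> tuple:
    """Return (STATIONARITY= option, Type row) matching an adfuller() call."""
    if autolag is not None:
        # With autolag set, adfuller picks the lag count from 0..maxlag by an
        # information criterion, so it is not a fixed adf=(k) specification.
        raise ValueError("use autolag=None so that maxlag is the exact lag count")
    sas_type = SAS_TYPE_FOR_REGRESSION[regression]
    if sas_type is None:
        raise ValueError(f"regression={regression!r} has no PROC ARIMA counterpart")
    return f"stationarity=(adf=({maxlag}))", sas_type


print(sas_equivalent("c", 0))  # ('stationarity=(adf=(0))', 'Single Mean')
```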

I think it's best you put your Python code AND your SAS code in the reply.

For your Python code, you can use the "Insert Code" icon ( </> ) in the header with icons.
For your SAS code, you can use the "Insert SAS Code" icon ( the little running man to the right of </> ) in the header with icons.

BR, Koen

asiahty · Posted 03-18-2026 04:30 AM

adf_result_var1 = adfuller(raw_ln_hfa[combo[0]], maxlag=0, regression='c', autolag=None)
adf_result_var2 = adfuller(raw_ln_hfa[combo[1]], maxlag=0, regression='c', autolag=None)
adf_var1 = adf_result_var1[0] # ADF statistic
adf_var2 = adf_result_var2[0]
adf_pval_var1 = adf_result_var1[1] # p-value
adf_pval_var2 = adf_result_var2[1]

/*stationarity test*/
proc arima data=WORK.QUARTERLY;
identify var= &&var&i stationarity=(adf=(0));
ods output stationaritytests=stationary_data;
run;

Hi,

Here is my code for your reference.

sbxkoenk · Posted 03-18-2026 07:06 AM

An Augmented Dickey-Fuller (ADF) test with zero augmenting lags is equivalent to the original Dickey-Fuller (DF) test.
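For reference, a sketch of the regression behind the test, written in the single-mean form (constant, no trend) that corresponds to adf=(0) in PROC ARIMA and to regression='c', maxlag=0 in adfuller. The symbols are my own notation, not taken from either package's documentation; here ρ equals φ − 1 in the levels form y_t = μ + φ y_{t−1} + ε_t, so the null ρ = 0 is the same as φ = 1.

```latex
% ADF regression, single-mean case, with p augmenting lags
\Delta y_t = \mu + \rho\, y_{t-1} + \sum_{j=1}^{p} \gamma_j\, \Delta y_{t-j} + \varepsilon_t,
\qquad H_0:\ \rho = 0 \ \text{(unit root)}

% With p = 0 the lag sum vanishes, leaving the original Dickey-Fuller regression
\Delta y_t = \mu + \rho\, y_{t-1} + \varepsilon_t,
\qquad \hat{\tau} = \frac{\hat{\rho}}{\operatorname{se}(\hat{\rho})}
```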

The ADF test performed in PROC ARIMA is based on the description of this test in Hamilton (1994).

Hamilton, J. D. (1994). Time Series Analysis. Princeton, NJ: Princeton University Press.

Note that there are several ways to perform an ADF test in SAS:

PROC ARIMA
PROC AUTOREG
SAS Help Center: PROBDF Function for Dickey-Fuller Tests
SAS Help Center: DFTEST Macro

I guess they all provide the same p-values.

I have no idea why you see a difference between the SAS and Python p-values for the DF test.

a. SAS p-value (from pr < tau): 0.51376756197933

b. Python p-value: 0.521899648105788

Both p-values are very close to each other and there is no doubt about the conclusion (it is the same for both), but they are still far enough apart to be odd.
It could be due to all sorts of things. Please note that the p-values are derived from a huge number of simulation replications.

Maybe @SASCom1 can help further?

If you want to open a Technical Support ticket,
here is a link for your convenience: https://support.sas.com/en/technical-support.html#contact

(SAS Technical Support)

BR, Koen

SASCom1 · Posted 03-27-2026 07:21 PM

Sorry for my late response.

Here is the documentation on the computation of ADF test p values in SAS:

SAS Help Center: PROBDF Function for Dickey-Fuller Tests

**********************************************************************************************************

The PROBDF function is calculated from approximating functions fit to empirical quantiles that are produced by a Monte Carlo simulation that employs 10^8 replications for each simulation. Separate simulations were performed for selected values of n and for d = 1, 2, 4, 6, 12 (where n and d are the second and third arguments to the PROBDF function).

The maximum error of the PROBDF function is approximately ±10^-3 for d in the set (1, 2, 4, 6, 12) and can be slightly larger for other d values. Because the number of simulation replications used to produce the PROBDF function is much greater than the 60,000 replications used by Dickey and colleagues (Dickey and Fuller 1979; Dickey, Hasza, and Fuller 1984), the PROBDF function can be expected to produce results that are substantially more accurate than the critical values reported in those papers.

**************************************************************************************************************
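For a sense of scale, here is a small sketch of the pure simulation noise at the two replication counts quoted above, assuming each simulated tail probability behaves like a simple binomial proportion. This ignores the error from fitting the approximating functions, which the ±10^-3 figure above also includes.

```python
import math


def mc_standard_error(p: float, replications: int) -> float:
    """Binomial standard error of a tail probability estimated as the
    fraction of `replications` independent Monte Carlo draws."""
    return math.sqrt(p * (1.0 - p) / replications)


# Replication counts quoted in the documentation excerpt above
for reps in (60_000, 10**8):
    for p in (0.05, 0.5):
        se = mc_standard_error(p, reps)
        print(f"reps={reps:>11,d}  p={p:<4}  SE={se:.1e}")
```

This gives roughly 9e-04 to 2e-03 for 60,000 replications and 2e-05 to 5e-05 for 10^8. Both are well below the gap of about 0.008 between 0.5219 and 0.5138 reported in this thread, so simulation noise alone seems an unlikely explanation.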

Different software packages may implement different algorithms, and even with similar algorithms there can still be minor differences in implementation details, so the computed p-values may not be identical, as in the case you observe.

I hope this helps.

asiahty · Posted 03-29-2026 08:06 PM

I see. If we want to verify the SAS calculation independently through other means (e.g. R, Python, or Excel), how can we do that? Is it through the tau statistic?

sbxkoenk · Posted 03-29-2026 11:53 PM

The ADF tests come in three kinds: the rho test, the tau test, and the F test.

The rho test is the regression coefficient test, which is also called the normalized bias test.
The tau test is the studentized test.
The F test is a joint test for unit root.
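One way to check SAS independently is to recompute the statistics themselves rather than the p-values. Below is a minimal pure-Python sketch (the helper name df_statistics is hypothetical) for the zero-lag, single-mean case used in this thread. Two conventions are my assumptions and are flagged in the comments: the rho statistic is taken as n times the estimated coefficient, with n the number of observations in the regression, and the F statistic as the joint test of mu = 0 and rho = 0 (Dickey and Fuller's Phi_1). Check both against the SAS Stationarity Tests documentation. Tau is the safest comparison: the Python value (-2.8849) and SAS value (-2.88) reported later in this thread agree to the precision SAS prints.

```python
import math
import random


def df_statistics(y):
    """Rho, tau and F statistics for the zero-lag, single-mean regression
    dy_t = mu + rho * y_{t-1} + e_t (a sketch, not SAS's or statsmodels' code)."""
    x = y[:-1]                                  # lagged level y_{t-1}
    z = [b - a for a, b in zip(y[:-1], y[1:])]  # first difference dy_t
    n = len(z)                                  # observations in the regression
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    rho_hat = sxz / sxx                         # OLS slope on y_{t-1}
    mu_hat = z_bar - rho_hat * x_bar            # OLS intercept
    ssr = sum((zi - mu_hat - rho_hat * xi) ** 2 for xi, zi in zip(x, z))
    s2 = ssr / (n - 2)                          # residual variance, 2 parameters
    tau = rho_hat / math.sqrt(s2 / sxx)         # studentized (tau) statistic
    rho_stat = n * rho_hat                      # normalized bias (assumed n convention)
    ssr_restricted = sum(zi ** 2 for zi in z)   # restricted model: mu = 0, rho = 0
    f_stat = ((ssr_restricted - ssr) / 2) / s2  # joint test (assumed Phi_1 form)
    return rho_stat, tau, f_stat


# Usage on a simulated random walk (hypothetical data, not the thread's series)
rng = random.Random(2026)
walk = [0.0]
for _ in range(99):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))
print(df_statistics(walk))
```

Pass the same column, in time order and without missing values, that goes to adfuller or PROC ARIMA. The p-values still have to come from PROBDF or statsmodels, since this sketch does not approximate the Dickey-Fuller distribution.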

For more information about test statistics under the ADF tests, see the section

SAS Help Center: Stationarity Tests

And here's a paper by David A. Dickey himself.

>> SAS Global Forum 2016 proceedings
>> Paper 7080-2016

>> What’s the Difference?

>> David A. Dickey, NC State University

>> https://support.sas.com/resources/papers/proceedings16/7080-2016.pdf

Prof. Dickey claims on p.14: "The taus and their associated pvalues are the most commonly used of these tests."

BR, Koen

asiahty · Posted 03-30-2026 03:22 AM

Thank you for your response. Since we cannot compare p-values directly between the two platforms, how can we at least check that they agree on whether the series passes or fails the stationarity test? Is there a way to extract critical values from SAS, so I can check whether the tau statistic passes or fails the 1%, 5%, and 10% critical-value thresholds? And can I extract from SAS the coefficients used to approximate the p-value?

SASCom1 · Posted 03-30-2026 06:45 PM

SAS only outputs the rho, tau, and F test statistics together with their corresponding p-values; it does not print the 1%, 5%, or 10% critical values. You decide whether to reject the null by comparing the computed p-values with your desired significance level.
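On the Python side, statsmodels does return critical values: with autolag=None, the fifth element of the adfuller result (index 4) is a dict keyed '1%', '5%', and '10%'. A minimal sketch with a hypothetical helper name: because the unit-root test is left-tailed, you reject when tau falls below the critical value. This usually agrees with the p-value rule, though near the boundary the two can differ slightly since each comes from an approximation.

```python
def reject_by_critical_value(tau: float, critical_values: dict, level: str = "5%") -> bool:
    """Left-tailed unit-root test: reject H0 when tau lies below the critical value."""
    return tau < critical_values[level]


# With statsmodels (not run here):
#   from statsmodels.tsa.stattools import adfuller
#   result = adfuller(series, maxlag=0, regression='c', autolag=None)
#   tau, crit = result[0], result[4]
#   for level in ("1%", "5%", "10%"):
#       print(level, reject_by_critical_value(tau, crit, level))
```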

asiahty · Posted 03-30-2026 08:06 PM

Noted, but my issue arises when the p-values lead to different conclusions (e.g. Python rejects the null but SAS does not). How can I reconcile these differences?

sbxkoenk · Posted 03-31-2026 03:39 AM

I hope it happens extremely rarely, but of course – at some point – such situations (different conclusions) will arise.

You could say that you’re in a sort of meta-analysis scenario (a bit of a stretch, I agree) and you can opt for a weighted p-value. With equal weights, you get an average p-value.
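A minimal sketch of that weighted p-value, using a hypothetical helper and the two p-values reported later in this thread for the case where SAS and Python disagree. This is the informal averaging described above, not a formal meta-analytic combination method.

```python
def weighted_p_value(p_values, weights=None):
    """Weighted average of p-values; equal weights give the plain mean."""
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, p_values)) / total


p_python = 0.0471308321494495  # adfuller
p_sas = 0.0551742457504444     # PROC ARIMA
avg = weighted_p_value([p_python, p_sas])
print(f"equal-weight average: {avg:.4f}")  # 0.0512
```

With equal weights the average (about 0.0512) lands just above 0.05, so in this case the averaged result would side with SAS.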

It’s a tricky topic, ... but you’ll have to choose a reconciliation scenario if you continue to work with both SAS and Python to investigate stationarity.

Good luck,

Koen

SASCom1 · Posted 03-31-2026 08:41 PM

What p-values did you get from the two software packages in this case, where they lead to different conclusions? Are you using the same code you provided earlier in this thread? Can you provide more details on the output and the example data?

asiahty · Posted 04-01-2026 12:41 AM

Yes, I used the same code as shown earlier in the thread. I have attached the data and added the output for your reference.

Result

Tau (Python): -2.88491230787933

Tau (SAS): -2.88

p-values (to compare with the 0.05 significance level):

Python: 0.0471308321494495

SAS: 0.0551742457504444
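Putting the two cases from this thread side by side (a small sketch; the case labels are mine): the absolute gap is about 0.008 in both, its sign differs, and only in the second case does the gap straddle 0.05. statsmodels documents that adfuller's p-values come from MacKinnon's regression-surface approximation, while SAS uses the PROBDF fit quoted earlier, so a gap of this size is consistent with the two approximations differing rather than with either tool being wrong.

```python
# p-values reported in this thread as (Python adfuller, SAS PROC ARIMA)
cases = {
    "earlier case": (0.521899648105788, 0.51376756197933),
    "this case": (0.0471308321494495, 0.0551742457504444),
}
alpha = 0.05  # significance level used in this thread

for name, (p_python, p_sas) in cases.items():
    gap = p_sas - p_python
    same_decision = (p_python < alpha) == (p_sas < alpha)
    print(f"{name}: SAS - Python = {gap:+.4f}, same decision at {alpha}: {same_decision}")
```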
