Multipla99
Quartz | Level 8

Hi,

Could someone please explain the formula "(ystat-xstat)/x*100" that appears in the PROC COMPARE documentation under Results?

In particular, I would like to know what the single "x" in the denominator and the "xstat" and "ystat" in the numerator represent.

Cheers,

Multipla99

6 replies
Quentin
Super User

Interesting. I had assumed that formula meant %Diff is calculated from the difference between the statistics of the BASE dataset and the COMPARE dataset. But after running a small example, I can't make much sense of the results. For example, why does it show a difference of 1 for the MAX statistic when the MAX is the same in both datasets?

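/* Copy SASHELP.CLASS and change one value: Alfred's Height goes from 69 to 70 */
data class ;
  set sashelp.class ;
  if name='Alfred' then height=70 ;
run ;

/* Compare the original (BASE) with the modified copy (COMPARE) */
proc compare base=sashelp.class compare=class allstats ;
run ;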

[Screenshot: PROC COMPARE output from the code above]

Kathryn_SAS
SAS Employee

The following paper may help, beginning with Example 6 on page 13:

https://support.sas.com/resources/papers/proceedings10/149-2010.pdf 

It says:

Under the Diff and %Diff columns these statistics refer to the paired differences

Multipla99
Quartz | Level 8
Thank you, Kathryn! I will read the paper and see if I get an explanation there.
Multipla99
Quartz | Level 8

Thank you, Kathryn!

I have now read the recommended parts of the paper, and it is obvious that the author knows what this is about. However, I am still missing a complete, exact definition of how DIFF and %DIFF are calculated for the different statistics.

Best regards,

Multipla99

Quentin
Super User

My understanding (after reading the paper and Tom's explanation) is that PROC COMPARE calculates Diff for each record as the difference between the two values (Compare minus Base), and DiffPct as that difference divided by the value in the Base dataset, times 100. So these are the same Diff and %Diff values you see in the usual Value Comparison Results.

The summary statistics are then summaries of those two variables, Diff and DiffPct.

The code below reuses the PROC COMPARE example I posted, then uses a DATA step to calculate Diff and DiffPct and PROC MEANS to calculate the summary statistics. This is the simple case, where all rows match.

data class ;
  set sashelp.class ;
  if name='Alfred' then height=70 ;
run ;

proc compare base=sashelp.class compare=class allstats ;
run ;

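/* Pair each Base row with its Compare row by Name and compute */
/* the per-row difference and percent difference               */
data want ;
  merge sashelp.class (keep=Name Height rename=(Height=Height_base))
        class (keep=Name Height rename=(Height=Height_comp))
  ;
  by name ;
  diff=Height_comp-Height_base ;     /* Compare minus Base            */
  diffpct=100*diff/Height_base ;     /* as a percent of the Base value */
run ;

/* Summarize the per-row differences */
proc means data=want n mean std max min stderr t probt;
  var diff diffpct ;
run ;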
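Written out as equations, the calculation that the DATA step and PROC MEANS above reproduce looks roughly like this. It is a sketch of the reading described in this reply, not text from the SAS documentation: x_i is assumed to be the Base value and y_i the Compare value for the i-th matched observation, and n is the number of matched observations.

```latex
\begin{aligned}
d_i &= y_i - x_i && \text{(Diff for observation } i\text{)}\\
p_i &= 100\,\frac{y_i - x_i}{x_i} && \text{(\%Diff for observation } i\text{)}\\
\bar{d} &= \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}\\
\mathrm{StdErr} &= \frac{s_d}{\sqrt{n}}, \qquad
t = \frac{\bar{d}}{\mathrm{StdErr}}
\end{aligned}
```

The %Diff column would apply the same summary formulas to p_i instead of d_i. Under this reading, the "x" in "(ystat-xstat)/x*100" is the Base value, and the statistics describe the per-observation differences rather than differences between per-dataset statistics. In the example above, only Alfred's Height changes (69 to 70), so d_i = 1 for Alfred and 0 for the other 18 rows. The MAX of Diff is therefore 1 even though the MAX of Height is the same in both datasets, the mean of Diff is 1/19 ≈ 0.0526, and the MAX of %Diff is 100/69 ≈ 1.449.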

I'm a big fan of PROC COMPARE, but I have never used these stats. I can see how N, MIN and MAX could be useful summary information, especially for a big dataset that has lots of small differences due to numeric precision or the like.


Discussion stats
  • 6 replies
  • 563 views
  • 5 likes
  • 4 in conversation