
While PROC COMPARE technically supports ODS OUTPUT, the resulting datasets are basically just lines of text, which must be parsed to extract any values. I suggest that PROC COMPARE produce useful ODS OUTPUT datasets with values stored in meaningful variables.

 

This code:

 

data class ;
  set sashelp.class(drop=weight) ;
  if name='Mary' then height=33 ;
run ;

ods trace on ;
ods output CompareDatasets=CompareDatasets CompareSummary=CompareSummary CompareDifferences=CompareDifferences ;
proc compare base=sashelp.class compare=class ;
run ;
ods output close ;
ods trace off ;


produces three output datasets, all of them just free text:

 

                                      CompareDatasets

Obs    type    batch

  1     h                                  The COMPARE Procedure
  2     h                       Comparison of SASHELP.CLASS with WORK.CLASS
  3     h                                      (Method=EXACT)
  4     h
  5     h                                    Data Set Summary
  6     h
  7     h      Dataset                 Created          Modified  NVar    NObs  Label
  8     d
  9     d      SASHELP.CLASS  05AUG20:21:16:10  05AUG20:21:16:10     5      19  Student Data
 10     d      WORK.CLASS     05NOV22:18:58:12  05NOV22:18:58:12     4      19
 11     d
 12     d
 13     h                                    Variables Summary
 14     h
 15     d              Number of Variables in Common: 4.
 16     d              Number of Variables in SASHELP.CLASS but not in WORK.CLASS: 1.


                                CompareSummary

Obs    type                                 batch

  1     d
  2     d
  3     h                            Observation Summary
  4     h
  5     h                       Observation      Base  Compare
  6     d
  7     d                       First Obs           1        1
  8     d                       First Unequal      14       14
  9     d                       Last  Unequal      14       14
 10     d                       Last  Obs          19       19
 11     d
 12     d      Number of Observations in Common: 19.
 13     d      Total Number of Observations Read from SASHELP.CLASS: 19.
 14     d      Total Number of Observations Read from WORK.CLASS: 19.
 15     d
 16     d      Number of Observations with Some Compared Variables Unequal: 1.
 17     d      Number of Observations with All Compared Variables Equal: 18.
 18     d
 19     d
 20     h                         Values Comparison Summary
 21     h
 22     d      Number of Variables Compared with All Observations Equal: 3.
 23     d      Number of Variables Compared with Some Observations Unequal: 1.
 24     d      Total Number of Values which Compare Unequal: 1.
 25     d      Maximum Difference: 33.5.
 26     d
 27     h                           The COMPARE Procedure
 28     h                Comparison of SASHELP.CLASS with WORK.CLASS
 29     h                               (Method=EXACT)
 30     h
 31     h                       Variables with Unequal Values
 32     h
 33     h                    Variable  Type  Len  Ndif   MaxDif
 34     d
 35     d                    Height    NUM     8     1   33.500
 36     d


                            CompareDifferences

Obs    type    batch

  1     d
  2     d
  3     h                Value Comparison Results for Variables
  4     h
  5     d      __________________________________________________________
  6     d                 ||       Base    Compare
  7     h             Obs ||     Height     Height      Diff.     % Diff
  8     d       ________  ||  _________  _________  _________  _________
  9     d                 ||
 10     d             14  ||    66.5000    33.0000   -33.5000   -50.3759
 11     d      __________________________________________________________


Consider the CompareDifferences table. It would be much more useful if it had the variables Obs, Variable, BaseValue, CompareValue, and Difference.
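To illustrate the parsing the current table requires, here is a minimal sketch of a DATA step that pulls those values out of the CompareDifferences dataset shown above. It assumes the layout in that listing (a single numeric variable per block, with the variables named type and batch); the output dataset name diffs_parsed is made up for the example, and a different LINESIZE or character variables would likely break it.

```sas
/* Sketch only: parse the free-text CompareDifferences rows into variables. */
/* Assumes the layout shown above; diffs_parsed is a hypothetical name.     */
data diffs_parsed ;
  set CompareDifferences ;
  length Variable $32 ;
  retain Variable ;

  /* The header row that starts with Obs carries the variable name */
  if type = 'h' and scan(batch, 1, ' |') = 'Obs' then
    Variable = scan(batch, 2, ' |') ;

  /* Data rows of interest start with a numeric observation number; */
  /* ?? suppresses notes for rows that are blank or only underscores */
  Obs = input(scan(batch, 1, ' |'), ?? best32.) ;

  if type = 'd' and not missing(Obs) then do ;
    BaseValue    = input(scan(batch, 2, ' |'), ?? best32.) ;
    CompareValue = input(scan(batch, 3, ' |'), ?? best32.) ;
    Difference   = input(scan(batch, 4, ' |'), ?? best32.) ;
    output ;
  end ;

  keep Obs Variable BaseValue CompareValue Difference ;
run ;
```

Having to write and maintain code like this for every report table is exactly the burden a structured ODS OUTPUT dataset would remove.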

 

The current output tables are basically as cumbersome as the pre-ODS approach to reading values from output, where you had to use PROC PRINTTO to send the results to a .lst file, and then input that .lst file as data and parse out the values you want.
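For comparison, the pre-ODS approach might look roughly like the sketch below. The fileref cmp and the dataset name lst_values are made up for the example, and it assumes the LISTING destination is open (hence the ODS LISTING statement), since PROC PRINTTO redirects only listing output.

```sas
/* Sketch only: capture PROC COMPARE listing output, then read it back as text. */
/* cmp and lst_values are hypothetical names used for illustration.             */
filename cmp temp ;
ods listing ;

proc printto print=cmp new ;
run ;

proc compare base=sashelp.class compare=class ;
run ;

/* Restore the default print destination */
proc printto ;
run ;

data lst_values ;
  infile cmp truncover ;
  input line $char200. ;
  /* Pick out a single value by matching the report text */
  if index(line, 'Total Number of Values which Compare Unequal:') then do ;
    n_unequal = input(scan(line, -1, ' :.'), ?? best32.) ;
    output ;
  end ;
  keep n_unequal ;
run ;
```

The parsing step is essentially the same work needed for the batch variable in the ODS OUTPUT datasets, which is the point of the complaint.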

 

PROC COMPARE is one of my favorite PROCs, but the lack of useful output datasets is a big weakness.

 

(Yes, I realize PROC COMPARE has an out= option, but I don't love that format either : )
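For readers who have not used it, here is a minimal sketch of the out= approach applied to the same example. The dataset name CompareOut is made up; the options shown are standard PROC COMPARE options.

```sas
/* Sketch only: write differences to a dataset instead of the report. */
/* CompareOut is a hypothetical dataset name.                         */
proc compare base=sashelp.class compare=class
             out=CompareOut   /* output dataset                       */
             outnoequal       /* keep only observations that differ   */
             outbase          /* include the base values              */
             outcomp          /* include the compare values           */
             outdif           /* include the differences              */
             noprint ;
run ;

proc print data=CompareOut ;
run ;
```

For this example the result should contain BASE, COMPARE, and DIF rows (identified by _TYPE_ and _OBS_) for observation 14 only. The values stay in one wide row per type, with one column per compared variable, rather than the long Obs / Variable / BaseValue / CompareValue / Difference shape suggested above.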

2 Comments
SASKiwi
PROC Star

This is a great idea and has my full support.

Harry
Obsidian | Level 7

How about expanding OUTSTATS to include N and NDIF for character columns?  Please see https://communities.sas.com/t5/SASware-Ballot-Ideas/include-character-variables-in-PROC-COMPARE-OUTS...