BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

Is there a way I can get PROC COMPARE to output a useful dataset of differences in variable attributes (type, length, format, label)?

 

You can use PROC COMPARE to compare variable attributes rather than data values, e.g.:

 

data class ;
  set sashelp.class (drop=name weight);
  length name $33 ;
  call missing(name) ;
  label sex='student sex' ;
  format height comma. ;
run ;

proc compare base=sashelp.class(obs=0) compare=class(obs=0) listvar;
run ;

You get a nice summary in the output:

Listing of Variables in SASHELP.CLASS but not in WORK.CLASS

                   Variable  Type  Length

                   Weight    Num        8


   Listing of Common Variables with Differing Attributes

 Variable  Dataset        Type  Length  Format  Label

 Name      SASHELP.CLASS  Char       8
           WORK.CLASS     Char      33
 Sex       SASHELP.CLASS  Char       1
           WORK.CLASS     Char       1          student sex
 Height    SASHELP.CLASS  Num        8
           WORK.CLASS     Num        8  COMMA.

Is there a way to get a reasonable output dataset with that information?

 

One would hope the answer would be to use ODS, but PROC COMPARE isn't that ODS-friendly.  I tried:

ods output CompareVariables=VarComp ;
proc compare base=sashelp.class(obs=0) compare=class(obs=0) listvar ;
run ;
ods output close ;

And the output VarComp dataset is ugly (for any use except reading):

Obs    type    batch

  1     d
  2     d
  3     h      Listing of Variables in SASHELP.CLASS but not in WORK.CLASS
  4     d
  5     h                         Variable  Type  Length
  6     h
  7     d                         Weight    Num        8
  8     d
  9     d
 10     h         Listing of Common Variables with Differing Attributes
 11     d
 12     h       Variable  Dataset        Type  Length  Format  Label
 13     h
 14     d       Name      SASHELP.CLASS  Char       8
 15     d                 WORK.CLASS     Char      33
 16     d       Sex       SASHELP.CLASS  Char       1
 17     d                 WORK.CLASS     Char       1          student sex
 18     d       Height    SASHELP.CLASS  Num        8
 19     d                 WORK.CLASS     Num        8  COMMA.

I realize I could just run PROC CONTENTS, output the variable metadata to a dataset and then compare that, something like:

 

proc contents data=sashelp.class out=basevars(keep=name type length label format) ;
run ;

proc contents data=class out=compvars(keep=name type length label format) ;
run ;

proc compare base=basevars compare=compvars out=want outnoequal outbase outcomp;
  id Name ;
run ;

proc print data=want ;
run ;

Which gives a more useful output format:

Obs    _TYPE_     _OBS_    NAME      TYPE    LENGTH       LABEL       FORMAT

 1     BASE         2      Height      1        8
 2     COMPARE      2      Height      1        8                     COMMA
 3     BASE         3      Name        2        8
 4     COMPARE      3      Name        2       33
 5     BASE         4      Sex         2        1
 6     COMPARE      4      Sex         2        1      student sex
 7     BASE         5      Weight      1        8

But I feel like since PROC COMPARE is comparing the variable attributes, there should be a way to get an output dataset from PROC COMPARE.

 

As a second question, I'm sure there are probably 100 utility macros for doing this sort of comparison of variable attributes. If you've got a favorite one, please share.

 

 

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

For just comparing attributes here is a 20 year old macro. 

It will comparing whole libraries (more than two libraries at once if you want).

Header:

%macro varcheck
/*---------------------------------------------------------------------
Check variable consistency across multiple libraries
---------------------------------------------------------------------*/
(librefs=    /* Space delimited list of defined LIBREFs */
,members=    /* Space delimited list of member names (blank is all) */
,xmembers=   /* Space delimited list of members to exclude */
,varnames=   /* Space delimited list of variables (blank is all) */
,xvarnames=  /* Space delimited list of variables to exclude */
,out=varcheck /* Output dataset name */
,report=0    /* Produce proc report of OUT dataset? */
,reportw=    /* Where clause for proc report */
 error1|error2|error3|error4|error5
);

Full macro:

Spoiler
%macro varcheck
/*---------------------------------------------------------------------
Check variable consistency across multiple libraries
---------------------------------------------------------------------*/
(librefs=    /* Space delimited list of defined LIBREFs */
,members=    /* Space delimited list of member names (blank is all) */
,xmembers=   /* Space delimited list of members to exclude */
,varnames=   /* Space delimited list of variables (blank is all) */
,xvarnames=  /* Space delimited list of variables to exclude */
,out=varcheck /* Output dataset name */
,report=0    /* Produce proc report of OUT dataset? */
,reportw=    /* Where clause for proc report */
 error1|error2|error3|error4|error5
);
/*----------------------------------------------------------------------
Revision information:
$Revision: 1.1 $ $Date: 2005/03/18 14:47:59 $
-----------------------------------------------------------------------
$Purpose: Check variable consistency across multiple libraries and
produce a summary dataset with error flags. It can also produce a report.

$Assumptions: Librefs are not checked for existence.
$Inputs: SASHELP.VCOLUMN and SASHELP.VSLIB internal SAS metadata views.
$Outputs: dataset and/or report
$Called by: user reporting programs
$Calls to: parmv.sas qlist.sas
-----------------------------------------------------------------------
$Usage notes:
The variable attributes from multiple libraries are compared for con-
sistency and a summary dataset is generated.  Optionally a report can
by produced using PROC REPORT.

The data is grouped by the MEMNAME and NAME.
  MEMNAME - Dataset name in uppercase.
  NAME    - Variable name in uppercase.

These variables will have the same values for all records with the
same MEMNAME/NAME group.

 *NEWLEN  - Maximum storage length over all libraries. When the type is
            inconsistent over the libraries then NEWLEN will always be
            character.
  NLIBREFS - Number of libraries that contain this variable.
 +ERROR1   - Indicates that type is inconsistent.
 +ERROR2   - Indicates that length is inconsistent.
 +ERROR3   - Indicates that format is inconsistent.
 +ERROR4   - Indicates that label is inconsistent, missing or too long.
 +ERROR5   - Indicates that case of NAME is inconsistent.

These variables will represent the value from the particular library.
  LIBNAME  - The libref used for this library (see PATH variable).
 *LENGTH   - Storage length for this variable.
 +TYPE_ERR - Type does not match overall type. (as stored in NEWLEN).
 +LEN_ERR  - Length does not match maximum length over all libraries.
 +FMT_ERR  - $nn. format applied to variable.
 +LAB_ERR  - Label is missing or longer than 40 characters.
 +NAME_ERR - Name is longer than 8 characters or not all uppercase.
  FORMAT   - Original format.
  INFORMAT - Original informat.
  ONAME    - Original NAME with case preserved.
  LABEL    - Original label.
  PATH     - The directory(s) for this LIBNAME from the SASHELP.VSLIB.

+ The indicator variables are boolean values (1=true 0=false).

* NEWLEN and LENGTH are character variables formatted to match the
syntax used by the LENGTH statement.  That is for a character variable
the length is prefixed with a $.

-----------------------------------------------------------------------

Use the REPORT parameter to indicate if you want to produce a report.
You can subset the records in the report by supplying a boolean
expression in the REPORTW parameter referencing any of the variables in
the output dataset.  By default all variables that have any of the
overall error flags set will be printed.  You can print all records by
setting REPORTW=1 or blank.

You can run a report on an existing dataset by setting OUT equal to the
name of the dataset and leaving LIBREFS blank.

If the combination of libraries, members, and variables requested do
not find any values in SASHELP.VCOLUMN then an error will be reported.

-----------------------------------------------------------------------
$Examples:
* Compare three libraries and list variables with conflicting type or length ;
 %varcheck(librefs=S0001 S0002 S0003,report=1,reportw=error1 or error2)

* Produce report of variables with format issues from existing dataset;
 %varcheck(report=1,out=varcheck,reportw=error3)

----------------------------------------------------------------------
Modification History
----------------------------------------------------------------------
$Log: varcheck.sas,v $

----------------------------------------------------------------------*/
%local macro parmerr;
%let macro=VARCHECK;

%parmv(librefs,_req=0,_words=1)
%parmv(out,_req=1)
%parmv(members,_words=1)
%parmv(xmembers,_words=1)
%parmv(varnames,_words=1)
%parmv(xvarnames,_words=1)
%parmv(report,_val=0 1)
%parmv(reportw,_case=n,_words=1,_def=1)

%if (&parmerr) %then %goto quit;
%if (%sysfunc(exist(&out)) and &report and ^%length(&librefs)) %then %do;
  %put Note: Generating report from existing &out dataset.;
  %goto report;
%end;
%else %if (^%length(&librefs)) %then %do;
  %parmv(librefs,_req=1)
  %goto quit;
%end;

*----------------------------------------------------------------------;
* Generate summary of selected variables with error flags ;
*----------------------------------------------------------------------;
proc sql ;
  create table &out as
    select
      upcase(memname) as memname label='Member name'
     ,upcase(name)    as name label='Variable name'
     ,case when min(type)='char' then
               '$'||compress(put(max(length),6.))
           else compress(put(max(length),6.))
      end as newlen length=7 label='Max type length'
     ,count(distinct c.libname) as nlibrefs label='Number of libraries'
     ,c.libname as libname label='Libref'
     ,(count(distinct type) ^= 1) as error1 label='Type Mismatch'
     ,(count(distinct length) ^= 1) as error2 label='Length Mismatch'
     ,((count(distinct format) + max(format=' ')) > 1)
        as error3 label='Format Mismatch'
     ,max(max(label=' '),count(distinct label) ^= 1
         ,max(length(label)) > 40)
        as error4 label='Label Too Long or Mismatch'
     ,(count(distinct name) ^= 1) as error5 label='Name case Mismatch'
     ,case when type='char' then
               '$'||compress(put(length,6.))
           else put(length,1.)
      end as length length=7 label='Type Length'
     ,(type ^= min(type)) as type_err label='Type error'
     ,(length ^= max(length)) as len_err label='Length error'
     ,(substr(format,1,1)='$' and compress(format,'$0123456789.')=' ')
        as fmt_err label='Format error'
     ,((label=' ') or (length(label)>40))
        as lab_err label='Missing or too long label'
     ,(length(name)>8 or (name ne upcase(name)))
        as name_err label='Name error'
     ,format label='Original format'
     ,informat label='Original informat'
     ,name as oname label='Original NAME (case preserved)'
     ,label label='Original label'
     ,l.path label='Libref pathname'
    from sashelp.vcolumn c, sashelp.vslib l
    where c.libname in %qlist(&librefs)
      and c.libname = l.libname
%if (%length(&members)) %then
      and memname in %qlist(&members)
;
%if (%length(&xmembers)) %then
      and memname not in %qlist(&xmembers)
;
%if (%length(&varnames)) %then
      and upcase(name) in %qlist(&varnames)
;
%if (%length(&xvarnames)) %then
      and upcase(name) in %qlist(&xvarnames)
;
    group by 1,2
    order by 1,2,3,4,5
  ;

quit;
%if (^&sqlobs) %then %do;
  %parmv(_msg=No variables selected)
  %goto quit;
%end;

%report:
%if (&report) %then %do;
*----------------------------------------------------------------------;
* Report of variable status ;
*----------------------------------------------------------------------;

proc report data=&out headline missing split='FF'X spacing=1 nofs;
 where &reportw;
 column
   ('- OVERALL -'
    memname name newlen
    ('-ERROR-'
     error1 error2 error3 error4 error5
    )
   )
   ('-LIBRARY-'
    nlibrefs
    libname
   )
   ('- INDIVIDUAL DATASET LEVEL -'
    ('-ERROR-' type_err len_err fmt_err lab_err name_err)
    length format informat oname label
   )
  ;
  define memname / order width=8 flow spacing=0 'MEMBER NAME';
  define name    / order width=8 flow 'VARIABLE NAME' ;
  define newlen  / order width=6 'TYPE/ LENGTH';
  define error1 / order width=1 'T';
  define error2 / order width=1 'L';
  define error3 / order width=1 'F';
  define error4 / order width=1 'D';
  define error5 / order width=1 'N';
  define nlibrefs / order width=3 '#';
  define libname / display width=8 'LIBNAME';
  define length / display width=6 'TYPE/ LENGTH';
  define format / display width=12 flow 'FORMAT';
  define informat / display width=12 flow 'INFORMAT';
  define type_err / display width=1 'T';
  define len_err  / display width=1 'L';
  define fmt_err  / display width=1 'F';
  define name_err  / display width=1 'N';
  define lab_err  / display width=1 'D';
  define oname    / display width=8 flow 'NAME';
  define label    / display width=40 flow 'LABEL';
  break after memname /page;
  break after name / skip;

*footnote1 'T=Type L=Length F=Format D=Label N=Name case';
run;

%end;
%quit:
%mend varcheck;

You can get a copy of %PARMV() from here: https://github.com/sasutils/macros/blob/master/parmv.sas 

But you can probably just comment out the parameter checking. Just make sure to pass in valid values (and in the proper case)!

You can get %QLIST() from here: https://github.com/sasutils/macros/blob/master/qlist.sas 

But you can probably just change that part of the code to use FINDW() instead.  There are probably other parts that could also use more modern SAS functions, like the CATS() function.

 

You might need to update the reporting aspect if your variable names are longer than 8 characters.

 

 

View solution in original post

10 REPLIES 10
Quentin
Super User

For a non-compare approach, maybe just two PROC CONTENTS steps, then a data step to use by-group processing to compare the values, e.g.:

 

proc contents data=sashelp.class out=basevars(keep=libname memname name type length label format formatl formatd) noprint ;
run ;

proc contents data=class out=compvars(keep=libname memname name type length label format formatl formatd) noprint ;
run ;

data varinfo ;
  set basevars compvars ;
  by name type length label format formatl formatd ;
  if first.Formatd and last.Formatd ; *keep any records that are unique;
run ;

proc print data=varinfo ;
by name ;
id name ;
run ;

Returns:

NAME      LIBNAME    MEMNAME    TYPE    LENGTH       LABEL       FORMAT    FORMATL    FORMATD

Height    SASHELP     CLASS       1        8                                  0          0
          WORK        CLASS       1        8                     COMMA        0          0


Name      SASHELP     CLASS       2        8                                  0          0
          WORK        CLASS       2       33                                  0          0


Sex       SASHELP     CLASS       2        1                                  0          0
          WORK        CLASS       2        1      student sex                 0          0


Weight    SASHELP     CLASS       1        8                                  0          0
The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Tom
Super User Tom
Super User

For just comparing attributes here is a 20 year old macro. 

It will comparing whole libraries (more than two libraries at once if you want).

Header:

%macro varcheck
/*---------------------------------------------------------------------
Check variable consistency across multiple libraries
---------------------------------------------------------------------*/
(librefs=    /* Space delimited list of defined LIBREFs */
,members=    /* Space delimited list of member names (blank is all) */
,xmembers=   /* Space delimited list of members to exclude */
,varnames=   /* Space delimited list of variables (blank is all) */
,xvarnames=  /* Space delimited list of variables to exclude */
,out=varcheck /* Output dataset name */
,report=0    /* Produce proc report of OUT dataset? */
,reportw=    /* Where clause for proc report */
 error1|error2|error3|error4|error5
);

Full macro:

Spoiler
%macro varcheck
/*---------------------------------------------------------------------
Check variable consistency across multiple libraries
---------------------------------------------------------------------*/
(librefs=    /* Space delimited list of defined LIBREFs */
,members=    /* Space delimited list of member names (blank is all) */
,xmembers=   /* Space delimited list of members to exclude */
,varnames=   /* Space delimited list of variables (blank is all) */
,xvarnames=  /* Space delimited list of variables to exclude */
,out=varcheck /* Output dataset name */
,report=0    /* Produce proc report of OUT dataset? */
,reportw=    /* Where clause for proc report */
 error1|error2|error3|error4|error5
);
/*----------------------------------------------------------------------
Revision information:
$Revision: 1.1 $ $Date: 2005/03/18 14:47:59 $
-----------------------------------------------------------------------
$Purpose: Check variable consistency across multiple libraries and
produce a summary dataset with error flags. It can also produce a report.

$Assumptions: Librefs are not checked for existence.
$Inputs: SASHELP.VCOLUMN and SASHELP.VSLIB internal SAS metadata views.
$Outputs: dataset and/or report
$Called by: user reporting programs
$Calls to: parmv.sas qlist.sas
-----------------------------------------------------------------------
$Usage notes:
The variable attributes from multiple libraries are compared for con-
sistency and a summary dataset is generated.  Optionally a report can
by produced using PROC REPORT.

The data is grouped by the MEMNAME and NAME.
  MEMNAME - Dataset name in uppercase.
  NAME    - Variable name in uppercase.

These variables will have the same values for all records with the
same MEMNAME/NAME group.

 *NEWLEN  - Maximum storage length over all libraries. When the type is
            inconsistent over the libraries then NEWLEN will always be
            character.
  NLIBREFS - Number of libraries that contain this variable.
 +ERROR1   - Indicates that type is inconsistent.
 +ERROR2   - Indicates that length is inconsistent.
 +ERROR3   - Indicates that format is inconsistent.
 +ERROR4   - Indicates that label is inconsistent, missing or too long.
 +ERROR5   - Indicates that case of NAME is inconsistent.

These variables will represent the value from the particular library.
  LIBNAME  - The libref used for this library (see PATH variable).
 *LENGTH   - Storage length for this variable.
 +TYPE_ERR - Type does not match overall type. (as stored in NEWLEN).
 +LEN_ERR  - Length does not match maximum length over all libraries.
 +FMT_ERR  - $nn. format applied to variable.
 +LAB_ERR  - Label is missing or longer than 40 characters.
 +NAME_ERR - Name is longer than 8 characters or not all uppercase.
  FORMAT   - Original format.
  INFORMAT - Original informat.
  ONAME    - Original NAME with case preserved.
  LABEL    - Original label.
  PATH     - The directory(s) for this LIBNAME from the SASHELP.VSLIB.

+ The indicator variables are boolean values (1=true 0=false).

* NEWLEN and LENGTH are character variables formatted to match the
syntax used by the LENGTH statement.  That is for a character variable
the length is prefixed with a $.

-----------------------------------------------------------------------

Use the REPORT parameter to indicate if you want to produce a report.
You can subset the records in the report by supplying a boolean
expression in the REPORTW parameter referencing any of the variables in
the output dataset.  By default all variables that have any of the
overall error flags set will be printed.  You can print all records by
setting REPORTW=1 or blank.

You can run a report on an existing dataset by setting OUT equal to the
name of the dataset and leaving LIBREFS blank.

If the combination of libraries, members, and variables requested do
not find any values in SASHELP.VCOLUMN then an error will be reported.

-----------------------------------------------------------------------
$Examples:
* Compare three libraries and list variables with conflicting type or length ;
 %varcheck(librefs=S0001 S0002 S0003,report=1,reportw=error1 or error2)

* Produce report of variables with format issues from existing dataset;
 %varcheck(report=1,out=varcheck,reportw=error3)

----------------------------------------------------------------------
Modification History
----------------------------------------------------------------------
$Log: varcheck.sas,v $

----------------------------------------------------------------------*/
%local macro parmerr;
%let macro=VARCHECK;

%parmv(librefs,_req=0,_words=1)
%parmv(out,_req=1)
%parmv(members,_words=1)
%parmv(xmembers,_words=1)
%parmv(varnames,_words=1)
%parmv(xvarnames,_words=1)
%parmv(report,_val=0 1)
%parmv(reportw,_case=n,_words=1,_def=1)

%if (&parmerr) %then %goto quit;
%if (%sysfunc(exist(&out)) and &report and ^%length(&librefs)) %then %do;
  %put Note: Generating report from existing &out dataset.;
  %goto report;
%end;
%else %if (^%length(&librefs)) %then %do;
  %parmv(librefs,_req=1)
  %goto quit;
%end;

*----------------------------------------------------------------------;
* Generate summary of selected variables with error flags ;
*----------------------------------------------------------------------;
proc sql ;
  create table &out as
    select
      upcase(memname) as memname label='Member name'
     ,upcase(name)    as name label='Variable name'
     ,case when min(type)='char' then
               '$'||compress(put(max(length),6.))
           else compress(put(max(length),6.))
      end as newlen length=7 label='Max type length'
     ,count(distinct c.libname) as nlibrefs label='Number of libraries'
     ,c.libname as libname label='Libref'
     ,(count(distinct type) ^= 1) as error1 label='Type Mismatch'
     ,(count(distinct length) ^= 1) as error2 label='Length Mismatch'
     ,((count(distinct format) + max(format=' ')) > 1)
        as error3 label='Format Mismatch'
     ,max(max(label=' '),count(distinct label) ^= 1
         ,max(length(label)) > 40)
        as error4 label='Label Too Long or Mismatch'
     ,(count(distinct name) ^= 1) as error5 label='Name case Mismatch'
     ,case when type='char' then
               '$'||compress(put(length,6.))
           else put(length,1.)
      end as length length=7 label='Type Length'
     ,(type ^= min(type)) as type_err label='Type error'
     ,(length ^= max(length)) as len_err label='Length error'
     ,(substr(format,1,1)='$' and compress(format,'$0123456789.')=' ')
        as fmt_err label='Format error'
     ,((label=' ') or (length(label)>40))
        as lab_err label='Missing or too long label'
     ,(length(name)>8 or (name ne upcase(name)))
        as name_err label='Name error'
     ,format label='Original format'
     ,informat label='Original informat'
     ,name as oname label='Original NAME (case preserved)'
     ,label label='Original label'
     ,l.path label='Libref pathname'
    from sashelp.vcolumn c, sashelp.vslib l
    where c.libname in %qlist(&librefs)
      and c.libname = l.libname
%if (%length(&members)) %then
      and memname in %qlist(&members)
;
%if (%length(&xmembers)) %then
      and memname not in %qlist(&xmembers)
;
%if (%length(&varnames)) %then
      and upcase(name) in %qlist(&varnames)
;
%if (%length(&xvarnames)) %then
      and upcase(name) in %qlist(&xvarnames)
;
    group by 1,2
    order by 1,2,3,4,5
  ;

quit;
%if (^&sqlobs) %then %do;
  %parmv(_msg=No variables selected)
  %goto quit;
%end;

%report:
%if (&report) %then %do;
*----------------------------------------------------------------------;
* Report of variable status ;
*----------------------------------------------------------------------;

proc report data=&out headline missing split='FF'X spacing=1 nofs;
 where &reportw;
 column
   ('- OVERALL -'
    memname name newlen
    ('-ERROR-'
     error1 error2 error3 error4 error5
    )
   )
   ('-LIBRARY-'
    nlibrefs
    libname
   )
   ('- INDIVIDUAL DATASET LEVEL -'
    ('-ERROR-' type_err len_err fmt_err lab_err name_err)
    length format informat oname label
   )
  ;
  define memname / order width=8 flow spacing=0 'MEMBER NAME';
  define name    / order width=8 flow 'VARIABLE NAME' ;
  define newlen  / order width=6 'TYPE/ LENGTH';
  define error1 / order width=1 'T';
  define error2 / order width=1 'L';
  define error3 / order width=1 'F';
  define error4 / order width=1 'D';
  define error5 / order width=1 'N';
  define nlibrefs / order width=3 '#';
  define libname / display width=8 'LIBNAME';
  define length / display width=6 'TYPE/ LENGTH';
  define format / display width=12 flow 'FORMAT';
  define informat / display width=12 flow 'INFORMAT';
  define type_err / display width=1 'T';
  define len_err  / display width=1 'L';
  define fmt_err  / display width=1 'F';
  define name_err  / display width=1 'N';
  define lab_err  / display width=1 'D';
  define oname    / display width=8 flow 'NAME';
  define label    / display width=40 flow 'LABEL';
  break after memname /page;
  break after name / skip;

*footnote1 'T=Type L=Length F=Format D=Label N=Name case';
run;

%end;
%quit:
%mend varcheck;

You can get a copy of %PARMV() from here: https://github.com/sasutils/macros/blob/master/parmv.sas 

But you can probably just comment out the parameter checking. Just make sure to pass in valid values (and in the proper case)!

You can get %QLIST() from here: https://github.com/sasutils/macros/blob/master/qlist.sas 

But you can probably just change that part of the code to use FINDW() instead.  There are probably other parts that could also use more modern SAS functions, like the CATS() function.

 

You might need to update the reporting aspect if your variable names are longer than 8 characters.

 

 

Quentin
Super User

Thanks Tom for sharing your macro.  As nobody has responded with a way to get what I want out of PROC COMPARE, I will assume the answer to my main question is "nope, can't get that (easily) from PROC COMPARE."  Thus accepting your answer as an alternative approach in a utility macro, which was part 2 of my question.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
SASKiwi
PROC Star

@Quentin  - I appreciate the post and concur regarding COMPARE's rudimentary and archaic reporting capabilities. I use COMPARE a lot, particularly for application testing and for operational data checking. I find it invaluable but its reporting capabilities are frustratingly limited and have not seen any major improvements in a long time.

Quentin
Super User

Thanks for the concurrence, @SASKiwi .  I'm working on a SAS ballot item to improve the ODS OUTPUT friendliness from PROC COMPARE.  I'm not optimistic it will happen, but it's worth submitting.  I truly believe PROC COMPARE is a pearl of the SAS language.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
SASKiwi
PROC Star

@Quentin  - "I truly believe PROC COMPARE is a pearl of the SAS language."

 

Great quote and so true! You have my full support, and please advise when you have added the SASware Ballot item so I can add my two cents worth.

Quentin
Super User

I'm still having problems getting the ballot item posted.  It keeps erroring and then when I resubmit gets flagged as a duplicate.  I emailed the moderators for help.  Here's what I'm planning to post.

----

 

While PROC COMPARE technically supports ODS OUTPUT, the actual data output is basically just lines of text, which would need to be parsed in order to extract values from the text.  Suggest to have PROC COMPARE produce useful ODS OUTPUT datasets with values stored in meaningful variables.

 

This code:

 

data class ;
  set sashelp.class(drop=weight) ;
  if name='Mary' then height=33 ;
run ;

ods trace on ;
ods output CompareDatasets=CompareDatasets CompareSummary=CompareSummary CompareDifferences=CompareDifferences ;
proc compare base=sashelp.class compare=class ;
run ;
ods output close ;
ods trace off ;


produces three output datasets, all of them just free text:

 

                                      CompareDatasets

Obs    type    batch

  1     h                                  The COMPARE Procedure
  2     h                       Comparison of SASHELP.CLASS with WORK.CLASS
  3     h                                      (Method=EXACT)
  4     h
  5     h                                    Data Set Summary
  6     h
  7     h      Dataset                 Created          Modified  NVar    NObs  Label
  8     d
  9     d      SASHELP.CLASS  05AUG20:21:16:10  05AUG20:21:16:10     5      19  Student Data
 10     d      WORK.CLASS     05NOV22:18:58:12  05NOV22:18:58:12     4      19
 11     d
 12     d
 13     h                                    Variables Summary
 14     h
 15     d              Number of Variables in Common: 4.
 16     d              Number of Variables in SASHELP.CLASS but not in WORK.CLASS: 1.


                                CompareSummary

Obs    type                                 batch

  1     d
  2     d
  3     h                            Observation Summary
  4     h
  5     h                       Observation      Base  Compare
  6     d
  7     d                       First Obs           1        1
  8     d                       First Unequal      14       14
  9     d                       Last  Unequal      14       14
 10     d                       Last  Obs          19       19
 11     d
 12     d      Number of Observations in Common: 19.
 13     d      Total Number of Observations Read from SASHELP.CLASS: 19.
 14     d      Total Number of Observations Read from WORK.CLASS: 19.
 15     d
 16     d      Number of Observations with Some Compared Variables Unequal: 1.
 17     d      Number of Observations with All Compared Variables Equal: 18.
 18     d
 19     d
 20     h                         Values Comparison Summary
 21     h
 22     d      Number of Variables Compared with All Observations Equal: 3.
 23     d      Number of Variables Compared with Some Observations Unequal: 1.
 24     d      Total Number of Values which Compare Unequal: 1.
 25     d      Maximum Difference: 33.5.
 26     d
 27     h                           The COMPARE Procedure
 28     h                Comparison of SASHELP.CLASS with WORK.CLASS
 29     h                               (Method=EXACT)
 30     h
 31     h                       Variables with Unequal Values
 32     h
 33     h                    Variable  Type  Len  Ndif   MaxDif
 34     d
 35     d                    Height    NUM     8     1   33.500
 36     d


                            CompareDifferences

Obs    type    batch

  1     d
  2     d
  3     h                Value Comparison Results for Variables
  4     h
  5     d      __________________________________________________________
  6     d                 ||       Base    Compare
  7     h             Obs ||     Height     Height      Diff.     % Diff
  8     d       ________  ||  _________  _________  _________  _________
  9     d                 ||
 10     d             14  ||    66.5000    33.0000   -33.5000   -50.3759
 11     d      __________________________________________________________


Consider the CompareDifferences table. This would be much for useful it had variables Obs, Variable, BaseValue, CompareValue, Difference.

 

The current output tables are basically as cumbersome as the pre-ODS approach to reading values from output, where you had to use PROC PRINTTO to send the results to a .lst file, and then input that .lst file as data and parse out the values you want.

 

PROC COMPARE is one of my favorite PROCS, but the lack of useful output datasets is a big weakness.

 

(Yes, I realize PROC COMPARE has an out= option, but I don't love that format either : )

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
SASKiwi
PROC Star

I get exactly the same output with your COMPARE test, so good luck on the SASware Ballot item.

Quentin
Super User

With some help from the moderators, the ballot item is posted:

https://communities.sas.com/t5/SASware-Ballot-Ideas/Enhance-PROC-COMPARE-ODS-Output-data-sets/idi-p/...

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Reeza
Super User
I could have sworn I had one in there on this exact topic before but cannot find it. Also think it predates this forum though 😄

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 4003 views
  • 12 likes
  • 4 in conversation