BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
daisy6
Quartz | Level 8

Hello SAS guys,

 

I usually use proc compare to compare two large data sets. I only care about the no equivalence, so I use outnoequal option. My question is how can I know the variable names in the output data set.

 

For example, the proc compare output data is the follows:

 

id    v1    v2............v30..............v50

I1    ..      .. .............  X .............  X

I2   .........................  X .............. X

I3

.......

......

I67894  ........

 

How can I know the variable V30 and V50 contians X which means different without opening the output dataset. 

 

Thank you very much

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You can use ODS commands to suppress the output.

ods exclude all;
ods output CompareSummary=summ;
proc compare data=class compare=sashelp.class listequalvar
 outnoequal out=dif
;
run;
ods exclude none;

Not sure why you posted a WORD document for TEXT.

 

Obs    type    batch

  1     h      The COMPARE Procedure
  2     h      Comparison of WORK.CLASS with SASHELP.CLASS
  3     h      (Method=EXACT)
  4     h
  5     h      Observation Summary
  6     h
  7     h      Observation      Base  Compare
  8     d
  9     d      First Obs           1        1
 10     d      First Unequal       1        1
 11     d      Last  Unequal       3        3
 12     d      Last  Obs          19       19
 13     d
 14     d      Number of Observations in Common: 19.
 15     d      Total Number of Observations Read from WORK.CLASS: 19.
 16     d      Total Number of Observations Read from SASHELP.CLASS: 19.
 17     d
 18     d      Number of Observations with Some Compared Variables Unequal: 3.
 19     d      Number of Observations with All Compared Variables Equal: 16.
 20     d
 21     d
 22     h      Values Comparison Summary
 23     h
 24     d      Number of Variables Compared with All Observations Equal: 2.
 25     d      Number of Variables Compared with Some Observations Unequal: 3.
 26     d      Number of Variables with Missing Value Differences: 1.
 27     d      Total Number of Values which Compare Unequal: 3.
 28     d      Maximum Difference: 12.
 29     d
 30     d
 31     h      Variables with All Equal Values
 32     h
 33     h      Variable  Type  Len
 34     d
 35     d      Sex       CHAR    1
 36     d      Height    NUM     8
 37     d
 38     h      Variables with Unequal Values
 39     h
 40     h      Variable  Type  Len  Ndif   MaxDif  MissDif
 41     d
 42     d      Name      CHAR    8     1                 0
 43     d      Age       NUM     8     1   12.000        0
 44     d      Weight    NUM     8     1        0        1
 45     d

View solution in original post

9 REPLIES 9
Tom
Super User Tom
Super User

Sounds like you actually don't want to generate that dataset.

Please explain more what you want.

The normal report shows which variables are different.

Also the procedure will set a macro variable that you can check individual bits to see what types of differences where found if you just want an overall YES/NO status check.

daisy6
Quartz | Level 8

I do need the compare output dataset, but I don't want to open it to check whether there are some discrepancy in individual variable or which variables have the discrepancy. I prefer to save the variable name as macro then export to excel file. When I open the excel file, I know the variables in two original datasets have discrepancies. 

Tom
Super User Tom
Super User

You can use the OUTNOEQUAL option on the PROC COMPARE statement to limit the observations to those with differences.

 

You can use the ODS output feature to get the summary of variable data which you could use to figure out which variables had any differences.

 

Here is an example:

data class;
 set sashelp.class;
 if _n_=1 then name='FRED';
 if _n_=2 then age=1;
 if _n_=3 then weight=.;
run;

ods output CompareSummary=summ;
proc compare data=class compare=sashelp.class listequalvar outnoequal out=dif;
run;

%put &=sysinfo -> %sysfunc(putn(&sysinfo,binary16));

data vars;
  length Name $32 Type $4 Len Ndif MaxDif MissDif 8;
  keep name -- missdif ;
  do p=1 by 1 until (found);
    set summ ;
    found = (batch=:'Variable ');
  end;
  if found then do p=p+1 to nobs ;
    set summ point=p nobs=nobs;
    if type='d' then do;
      name=scan(batch,1,' ');
      type=scan(batch,2,' ');
      len=input(scan(batch,3,' '),??32.);
      ndif=input(scan(batch,4,' '),??32.);
      maxdif=input(scan(batch,5,' '),??32.);
      missdif=input(scan(batch,6,' '),??32.);
      if not missing(name) then output;
    end;
  end;
  stop;
run;

proc print;
run;

Results:

                                        Max    Miss
Obs    Name      Type    Len    Ndif    Dif     Dif

 1     Sex       CHAR     1       .       .      .
 2     Height    NUM      8       .       .      .
 3     Name      CHAR     8       1       0      .
 4     Age       NUM      8       1      12      0
 5     Weight    NUM      8       1       0      1

You can then query that to get a list of variables to include in your report.

%let varlist=;
proc sql noprint;
  select name into :varlist separated by ' ' from vars where ndif;
quit;

proc print data=dif(keep=_type_ _obs_ &varlist) ;
run;

 

daisy6
Quartz | Level 8

Thank you for your reply. I tried running your code and what I got is the empty vars dataset. Is anything wrong with my SAS version? 

Tom
Super User Tom
Super User

Check the LOG to see if it generated the ODS output dataset.

If it did then look at the dataset generated (before the data step that tries to pick it apart) and see if it matches the format the data step is expecting.  Adjust the data step to read what you see in the result.  ODS output has nasty habit of being different based on the different results. For example if you remove the lines in my test program that modify the numeric variables then the report does not include the column with the MAXDIF variable. If there are no missing differences then that column does not exist.

daisy6
Quartz | Level 8

Hello Tom,

 

The log window shows summ dataset is created there are 46 observations and 2 variables, while the vars dataset has 0 observations. If the two datasets are small, your code : ods output CompareSummary=summ;
proc compare data=class compare=sashelp.class listequalvar outnoequal out=dif;
run;

 

may work well. However, for the large datasets with millions observations, I prefer noprint option. noprint option conflicts with listequalar, how do you solve it? 

 

I attached the capture of Summ dataset here, please tell me whether it is the same as yours. Thank you very much

Tom
Super User Tom
Super User

You can use ODS commands to suppress the output.

ods exclude all;
ods output CompareSummary=summ;
proc compare data=class compare=sashelp.class listequalvar
 outnoequal out=dif
;
run;
ods exclude none;

Not sure why you posted a WORD document for TEXT.

 

Obs    type    batch

  1     h      The COMPARE Procedure
  2     h      Comparison of WORK.CLASS with SASHELP.CLASS
  3     h      (Method=EXACT)
  4     h
  5     h      Observation Summary
  6     h
  7     h      Observation      Base  Compare
  8     d
  9     d      First Obs           1        1
 10     d      First Unequal       1        1
 11     d      Last  Unequal       3        3
 12     d      Last  Obs          19       19
 13     d
 14     d      Number of Observations in Common: 19.
 15     d      Total Number of Observations Read from WORK.CLASS: 19.
 16     d      Total Number of Observations Read from SASHELP.CLASS: 19.
 17     d
 18     d      Number of Observations with Some Compared Variables Unequal: 3.
 19     d      Number of Observations with All Compared Variables Equal: 16.
 20     d
 21     d
 22     h      Values Comparison Summary
 23     h
 24     d      Number of Variables Compared with All Observations Equal: 2.
 25     d      Number of Variables Compared with Some Observations Unequal: 3.
 26     d      Number of Variables with Missing Value Differences: 1.
 27     d      Total Number of Values which Compare Unequal: 3.
 28     d      Maximum Difference: 12.
 29     d
 30     d
 31     h      Variables with All Equal Values
 32     h
 33     h      Variable  Type  Len
 34     d
 35     d      Sex       CHAR    1
 36     d      Height    NUM     8
 37     d
 38     h      Variables with Unequal Values
 39     h
 40     h      Variable  Type  Len  Ndif   MaxDif  MissDif
 41     d
 42     d      Name      CHAR    8     1                 0
 43     d      Age       NUM     8     1   12.000        0
 44     d      Weight    NUM     8     1        0        1
 45     d
Tom
Super User Tom
Super User

Check the setting of the CENTER/NOCENTER option.  Perhaps that is what is confusing the data step that is trying to parse the ODS output dataset.

Try:

    found = (left(batch)=:'Variable ');
daisy6
Quartz | Level 8

It is the problem which the Vars is empty. Got it and appreciate your help. 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2496 views
  • 0 likes
  • 2 in conversation