ljschroeder
Calcite | Level 5

How do I get more detail about the lines in bold? When I use the LISTOBS option, it lists the observations that it finds different. However, when I view them I don't see any differences. How do I find out which variables are different when there are 95 variables?

(Method=RELATIVE(0.0000222), Criterion=1.0E-09)
Data Set Summary

Dataset              Created           Modified          NVar   NObs
ORDATA.ORACLETEMP    03MAY24:07:20:37  03MAY24:07:20:37    95  26553
SNOWDATA.SNOWTEMP    03MAY24:07:21:21  03MAY24:07:21:21    95  26553

Variables Summary
Number of Variables in Common: 95.
Number of ID Variables: 10.

Number of Observations in Common: 24450.
Number of Observations in ORDATA.ORACLETEMP but not in SNOWDATA.SNOWTEMP: 2103.
Number of Observations in SNOWDATA.SNOWTEMP but not in ORDATA.ORACLETEMP: 2103.
Total Number of Observations Read from ORDATA.ORACLETEMP: 26553.
Total Number of Observations Read from SNOWDATA.SNOWTEMP: 26553.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 24450.

 

Values Comparison Summary
Number of Variables Compared with All Observations Equal: 85.
Number of Variables Compared with Some Observations Unequal: 0.
Total Number of Values which Compare Unequal: 0.
Total Number of Values not EXACTLY Equal: 3628.
Maximum Difference Criterion Value: 2.2094E-16.

 

Here is my code:

title "ordata.&ordsn versus &snowdsn";
proc compare base=ordata.&ordsn
  compare=snowdata.&snowdsn
  Criterion=0.000000001 fuzz=.001
  out=result listobs outbase outcomp outdif outnoequal maxprint=50;
  id member medicaid_no claim_number line_number PAY_DT F_DOS L_DOS DOS AMT_REQ QTY
  ;
  attrib _all_ label='';
  format _all_;
  informat _all_;
run;

3 Replies
Kurt_Bremser
Super User

It means that those observations have different values in at least one of your ID variables (think of the ID statement as working like a BY statement).

If numeric values look identical, they may still have small differences far down in the fractional part, caused by the usual floating-point imprecision.

CRITERION= does not affect how the ID variables are matched.
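A small illustration of this point, written in Python rather than SAS. Python floats are IEEE 754 doubles, as SAS numeric variables are on Windows and Unix hosts. Two values can print identically under a short format and still differ in the last bits. Assuming the ID values are matched exactly, as described above, such a row then shows up as "only in base" and "only in compare". The key values below are made up.

```python
# Illustration only: the same business amount arrived at two different ways.
base_amt = 0.1 + 0.2   # e.g. computed by arithmetic in one database
comp_amt = 0.3         # e.g. stored directly in the other

# Under a 10-decimal format they look identical...
print(f"{base_amt:.10f}", f"{comp_amt:.10f}")  # 0.3000000000 0.3000000000

# ...but they are not exactly equal.
print(base_amt == comp_amt)          # False
print(abs(base_amt - comp_amt))      # 5.551115123125783e-17

# Matching rows on a key that includes the amount finds no common key,
# so each row ends up on the "not in the other data set" side.
base_keys = {("M001", "C123", base_amt)}
comp_keys = {("M001", "C123", comp_amt)}
print(base_keys & comp_keys)         # set()
print(base_keys - comp_keys)         # the base row is "only in base"
```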

Quentin
Super User

For the ID statement, you only want to include the variables that are necessary to make a unique ID, i.e. one that will not have any duplicates. From your variable names, I would guess:

id member medicaid_no claim_number line_number ;

and maybe you don't need line_number if your data only has one record for each claim_number.
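To check whether a candidate key is really unique before putting it on the ID statement, count how often each key combination occurs. In SAS, one common way is PROC SORT with NODUPKEY and DUPOUT= on the candidate key. The sketch below shows the same idea in Python. The column names follow the thread, but the rows are made-up sample data and `duplicate_keys` is a hypothetical helper.

```python
from collections import Counter


def duplicate_keys(rows, key_cols):
    """Return {key tuple: count} for key combinations that occur more than once."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical claim lines; values are invented for illustration.
rows = [
    {"member": 1, "medicaid_no": "A1", "claim_number": "C10", "line_number": 1, "QTY": 2},
    {"member": 1, "medicaid_no": "A1", "claim_number": "C10", "line_number": 2, "QTY": 1},
    {"member": 2, "medicaid_no": "B7", "claim_number": "C11", "line_number": 1, "QTY": 5},
]

full_key = ["member", "medicaid_no", "claim_number", "line_number"]
short_key = ["member", "medicaid_no", "claim_number"]

print(duplicate_keys(rows, full_key))   # {}  -> unique, usable as ID
print(duplicate_keys(rows, short_key))  # {(1, 'A1', 'C10'): 2}  -> line_number is needed
```

If the shorter key already returns no duplicates on your real data, the extra variables only add more ways for a row to fail to match.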

 

All of your values are being evaluated as equal:

Number of Variables Compared with All Observations Equal: 85.
Number of Variables Compared with Some Observations Unequal: 0.
Total Number of Values which Compare Unequal: 0.
Total Number of Values not EXACTLY Equal: 3628.
Maximum Difference Criterion Value: 2.2094E-16.

The largest difference criterion value it finds is 2.2094E-16 (.00000000000000022094), which is just numerical precision noise.
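For scale, 2.2094E-16 sits just under the machine epsilon of an IEEE 754 double (about 2.22E-16). That is the size of a relative difference of one step in the last bit. The Python sketch below assumes the reported criterion value behaves roughly like a relative difference |y - x| / |x|; PROC COMPARE's exact METHOD=RELATIVE formula also involves the delta shown as RELATIVE(0.0000222), so check its documentation for the precise definition. The amount is made up, and `math.nextafter` needs Python 3.9 or later.

```python
import math
import sys

eps = sys.float_info.epsilon       # machine epsilon for IEEE 754 doubles
x = 1234.56                        # an arbitrary made-up amount
y = math.nextafter(x, math.inf)    # the very next representable double above x

rel_diff = abs(y - x) / abs(x)     # relative difference of a one-bit change

print(eps)                         # 2.220446049250313e-16
print(rel_diff <= eps)             # True: a last-bit difference is at most about eps
print(rel_diff < 1e-9)             # True: far below Criterion=1.0E-09
```

Differences of this size are why the report counts 3628 values as not EXACTLY equal while still judging every compared variable equal.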

Patrick
Opal | Level 21

Based on your results, and as @Kurt_Bremser wrote, there are no differences for the rows with matching IDs, but there are 2103 rows that don't match on the ID variables. That is most likely caused by very small differences in at least one of these ID variables.
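To answer the original question of *which* variables differ, one approach is to pair each base-only row with a compare-only row whose ID values agree within a tolerance, then list the ID variables that are only approximately equal. The Python sketch below assumes you have already extracted the unmatched rows from each side (for example with a DATA step MERGE using IN= flags). `ID_VARS`, the helpers, and the sample rows are all hypothetical, and the tolerance of 1e-9 simply mirrors the CRITERION= value from the thread.

```python
import math

# Hypothetical subset of the ID variables; extend to the full ID list as needed.
ID_VARS = ["member", "claim_number", "AMT_REQ", "QTY"]


def differing_id_vars(base_row, comp_row, rel_tol=1e-9):
    """Compare two rows on ID_VARS.

    Returns (True, names) if every ID variable is exactly equal or is a number
    within rel_tol of its counterpart; names lists the approximately equal ones.
    Returns (False, []) as soon as any ID variable genuinely differs.
    """
    not_exact = []
    for var in ID_VARS:
        b, c = base_row[var], comp_row[var]
        if b == c:
            continue
        numeric = isinstance(b, (int, float)) and isinstance(c, (int, float))
        if numeric and math.isclose(b, c, rel_tol=rel_tol):
            not_exact.append(var)   # tiny numeric difference, e.g. last-bit noise
        else:
            return False, []        # a real difference: not the same record
    return True, not_exact


def explain_unmatched(only_base, only_comp):
    """Pair each base-only row with the first compare-only row that matches within tolerance."""
    report = []
    for b_row in only_base:
        for c_row in only_comp:
            matched, diff_vars = differing_id_vars(b_row, c_row)
            if matched:
                report.append((b_row["claim_number"], diff_vars))
                break
    return report


# Made-up rows standing in for the unmatched observations on each side.
only_base = [
    {"member": 1, "claim_number": "C10", "AMT_REQ": 0.1 + 0.2, "QTY": 2.0},
    {"member": 2, "claim_number": "C11", "AMT_REQ": 50.0, "QTY": 1.0},
]
only_comp = [
    {"member": 1, "claim_number": "C10", "AMT_REQ": 0.3, "QTY": 2.0},
    {"member": 2, "claim_number": "C11", "AMT_REQ": 55.0, "QTY": 1.0},
]

print(explain_unmatched(only_base, only_comp))  # [('C10', ['AMT_REQ'])]
```

Rows that find no partner (like claim C11 here) are left out of the report; those are real differences or rows that are genuinely missing from one side. The nested loop is quadratic, which is acceptable for a few thousand rows but not for much larger sets.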

 

You could first round the numeric variables to some insignificant decimal place before the comparison, using code like this:

data work.&ordsn;
  set ordata.&ordsn;
  array _numvars _numeric_;
  do over _numvars;
    _numvars=round(_numvars,0.00000000001);
  end;
run;
data work.&snowdsn;
  set snowdata.&snowdsn;
  array _numvars _numeric_;
  do over _numvars;
    _numvars=round(_numvars,0.00000000001);
  end;
run;

title "ordata.&ordsn versus &snowdsn";
proc compare base=work.&ordsn
  compare=work.&snowdsn
  Criterion=0.000000001 fuzz=.001
  out=result listobs outbase outcomp outdif outnoequal maxprint=50;
  id member medicaid_no claim_number line_number PAY_DT F_DOS L_DOS DOS AMT_REQ QTY
  ;
  attrib _all_ label='';
  format _all_;
  informat _all_;
run;
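Why rounding helps, and one caveat, shown as a Python sketch rather than SAS. Rounding both sides to the same increment (1e-11 in the code above) maps values that differ only by last-bit noise onto the same number. However, two values a hair apart can still fall on opposite sides of a rounding boundary and stay different after rounding. `round_to` is a hypothetical helper that only approximates SAS ROUND(x, unit). In particular, Python's `round()` breaks exact ties to the even integer, so tie handling can differ from SAS.

```python
import math


def round_to(x, unit=1e-11):
    """Round x to the nearest multiple of unit (similar in spirit to SAS ROUND(x, unit))."""
    return round(x / unit) * unit


a = 0.1 + 0.2   # last-bit noise: 0.30000000000000004
b = 0.3
print(a == b)                      # False
print(round_to(a) == round_to(b))  # True: both round to the same multiple of 1e-11

# Caveat: values a hair apart can straddle a rounding boundary.
lo = math.nextafter(2.5, 0.0)   # just below 2.5
hi = math.nextafter(2.5, 3.0)   # just above 2.5
print(hi - lo)                               # 8.881784197001252e-16
print(round_to(lo, 1.0), round_to(hi, 1.0))  # 2.0 3.0
```

With an increment as fine as 1e-11, such boundary cases should be rare. If a handful of rows still fail to match after rounding, this is a likely explanation.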
