ljschroeder
Calcite | Level 5

How do I get more detail about the lines in BOLD? When I use the LISTOBS option, it lists the observations that it finds different. However, when I view them, I don't see any differences. How do I find out which variables are different when there are 95 of them?

 

(Method=RELATIVE(0.0000222), Criterion=1.0E-09)
Data Set Summary
Dataset            Created           Modified          NVar   NObs
ORDATA.ORACLETEMP  03MAY24:07:20:37  03MAY24:07:20:37    95  26553
SNOWDATA.SNOWTEMP  03MAY24:07:21:21  03MAY24:07:21:21    95  26553
Variables Summary
Number of Variables in Common: 95.
Number of ID Variables: 10.

Number of Observations in Common: 24450.
Number of Observations in ORDATA.ORACLETEMP but not in SNOWDATA.SNOWTEMP: 2103.
Number of Observations in SNOWDATA.SNOWTEMP but not in ORDATA.ORACLETEMP: 2103.
Total Number of Observations Read from ORDATA.ORACLETEMP: 26553.
Total Number of Observations Read from SNOWDATA.SNOWTEMP: 26553.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 24450.

 

Values Comparison Summary
Number of Variables Compared with All Observations Equal: 85.
Number of Variables Compared with Some Observations Unequal: 0.
Total Number of Values which Compare Unequal: 0.
Total Number of Values not EXACTLY Equal: 3628.
Maximum Difference Criterion Value: 2.2094E-16.

 

Here is my code:

title "ordata.&ordsn versus &snowdsn";
proc compare base=ordata.&ordsn
  compare=snowdata.&snowdsn
  Criterion=0.000000001 fuzz=.001
  out=result listobs outbase outcomp outdif outnoequal maxprint=50;
  id member medicaid_no claim_number line_number PAY_DT F_DOS L_DOS DOS AMT_REQ QTY
  ;
  attrib _all_ label='';
  format _all_;
  informat _all_;
run;

Kurt_Bremser
Super User

It means that you have different values in at least one of your ID variables (consider that statement to be like a BY).

If there are numeric values that look identical, you may have small differences far down in the fractional part, caused by the usual numeric imprecision.

CRITERION does not influence the working of ID.
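
A small sketch of this behaviour, using made-up toy data. The data set names work.base_demo and work.comp_demo and all values are assumptions for illustration, not taken from the original tables. The first PROC COMPARE puts a numeric amount in the ID statement, so the row whose amount differs only in the low-order bits should be reported as present in just one data set. The second keeps the same key but compares the amount as an ordinary variable, where CRITERION applies.

```sas
/* Toy data (hypothetical): same key, AMT_REQ differs only by floating-point noise in claim 1 */
data work.base_demo;
  claim_number = 1; line_number = 1; AMT_REQ = 0.3; output;
  claim_number = 2; line_number = 1; AMT_REQ = 10;  output;
run;

data work.comp_demo;
  claim_number = 1; line_number = 1; AMT_REQ = 0.1 + 0.2; output;
  claim_number = 2; line_number = 1; AMT_REQ = 10;        output;
run;

/* AMT_REQ is part of the ID: ID values are matched exactly, CRITERION is not applied to them */
title "AMT_REQ used as an ID variable";
proc compare base=work.base_demo compare=work.comp_demo
  criterion=0.000000001 listobs;
  id claim_number line_number AMT_REQ;
run;

/* AMT_REQ is an ordinary compared variable: CRITERION decides whether the values count as equal */
title "AMT_REQ compared as an ordinary variable";
proc compare base=work.base_demo compare=work.comp_demo
  criterion=0.000000001 listobs;
  id claim_number line_number;
  var AMT_REQ;
run;
title;
```

Both toy data sets are already in ID order, so no PROC SORT is needed here. Real data would have to be sorted by the ID variables first.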

Quentin
Super User

For the ID statement, include only the variables needed to form a unique ID, i.e. one that will not have any duplicates. From your variable names, I would guess:

id member medicaid_no claim_number line_number ;

You may not need line_number either, if your data has only one record for each claim_number.
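
One way to check whether a candidate key really is unique is to sort with NODUPKEY and look at what lands in the DUPOUT= data set. This is only a sketch: it assumes &ordsn and &snowdsn are set as in the original post and that the four variables guessed above are the intended key. The output data set names (work.base_keys and so on) are made up for the example.

```sas
/* Base table: rows with a repeated key are written to work.base_dups */
proc sort data=ordata.&ordsn out=work.base_keys
          nodupkey dupout=work.base_dups;
  by member medicaid_no claim_number line_number;
run;

/* Comparison table: same check */
proc sort data=snowdata.&snowdsn out=work.comp_keys
          nodupkey dupout=work.comp_dups;
  by member medicaid_no claim_number line_number;
run;
```

If both DUPOUT= data sets come back with zero observations, the key is unique in both tables and is enough for the ID statement.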

 

All of your values are being evaluated as equal:

Number of Variables Compared with All Observations Equal: 85.
Number of Variables Compared with Some Observations Unequal: 0.
Total Number of Values which Compare Unequal: 0.
Total Number of Values not EXACTLY Equal: 3628.
Maximum Difference Criterion Value: 2.2094E-16.

The biggest difference it finds is .00000000000000022094, which is just numerical precision noise.
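
To see what this kind of noise looks like, here is a minimal sketch with toy values (not from the poster's data). Two numbers that print the same under a typical display format can still differ in their stored bits, which the HEX16. format exposes.

```sas
data _null_;
  a = 0.1 + 0.2;   /* computed value */
  b = 0.3;         /* literal value  */
  diff = a - b;
  put a= 8.4 b= 8.4;        /* typical display format: the two look identical */
  put a= hex16. b= hex16.;  /* stored floating-point representation           */
  put diff= e12.;           /* size of the difference                         */
  if a ne b then put 'a and b are not exactly equal';
run;
```

A difference of this size is what PROC COMPARE counts under "Values not EXACTLY Equal" while still treating the values as equal under the CRITERION setting.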

Patrick
Opal | Level 21

Based on your results and what @Kurt_Bremser wrote, there are no differences in the rows with matching IDs. However, 2103 rows don't match on the ID variables, most likely because of very small differences in at least one of those variables.

 

You could first round the numeric variables to some non-significant decimal place before the comparison, using code like the following.

/* Round every numeric variable in the base table to 11 decimal places */
data work.&ordsn;
  set ordata.&ordsn;
  array _numvars _numeric_;
  do over _numvars;
    _numvars=round(_numvars,0.00000000001);
  end;
run;
/* Apply the same rounding to the comparison table */
data work.&snowdsn;
  set snowdata.&snowdsn;
  array _numvars _numeric_;
  do over _numvars;
    _numvars=round(_numvars,0.00000000001);
  end;
run;

title "ordata.&ordsn versus &snowdsn";
proc compare base=work.&ordsn
  compare=work.&snowdsn
  Criterion=0.000000001 fuzz=.001
  out=result listobs outbase outcomp outdif outnoequal maxprint=50;
  id member medicaid_no claim_number line_number PAY_DT F_DOS L_DOS DOS AMT_REQ QTY
  ;
  attrib _all_ label='';
  format _all_;
  informat _all_;
run;
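
If the goal is to find out which of the ID variables is causing the mismatches, a different approach is to move the suspect variables out of the ID statement and compare them exactly. The sketch below assumes that member, medicaid_no, claim_number and line_number together form a unique key (as guessed earlier in the thread) and that &ordsn and &snowdsn are set as in the original post. The names work.base_sorted and work.comp_sorted are made up for the example.

```sas
/* PROC COMPARE with an ID statement expects both inputs in ID order */
proc sort data=ordata.&ordsn out=work.base_sorted;
  by member medicaid_no claim_number line_number;
run;

proc sort data=snowdata.&snowdsn out=work.comp_sorted;
  by member medicaid_no claim_number line_number;
run;

title "Former ID variables compared exactly";
proc compare base=work.base_sorted compare=work.comp_sorted
  method=exact listall maxprint=(50,500);
  /* Reduced key: only the variables assumed to identify a row */
  id member medicaid_no claim_number line_number;
  /* Variables that were in the original ID statement, now compared as values */
  var PAY_DT F_DOS L_DOS DOS AMT_REQ QTY;
run;
title;
```

With METHOD=EXACT, even the smallest difference is reported, so the Values Comparison Summary should name the variables that keep the 2103 rows from matching. Once they are known, rounding could be limited to those variables rather than applied to every numeric column.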
