DATA Step, Macro, Functions and more

PROC COMPARE

Reply
New Contributor
Posts: 4

PROC COMPARE

Hi All,

 

I have one .sas7bdat having some variables, one of the variable(LBSTNRLO) format is 13.7, and we have loaded this file to oracle database and extracted it, now same variable(LBSTNRLO) format is now 8. I have compared these two datasets and got mismatch between these files which is expected.

 

I have removed formats of all the variables and compared it, i can see same value in both files, but having mismatch in compare output.

 

Even i have given format 13.7 for LBSTNRLO in extracted file and compared it but still it has mismatch. Can you please suggest how can we get a match in compare output.

 

eg: LBSTNRLO  - 12.100000  - in original file having 13.7 format.

     LBSTNRLO  - 12    in extracted file having 8. format.

  

After i remove the formats in both files i have 12.100000. but still have mismatch.

    

 

Thank you.

Respected Advisor
Posts: 3,777

Re: PROC COMPARE

You might want to consult the documentation.

 

METHOD=ABSOLUTE | EXACT | PERCENT | RELATIVE<(δ)>

specifies the method for judging the equality of numeric values. The constant δ (delta) is a number between 0 and 1 that specifies a value to add to the denominator when calculating the equality measure. By default, δ is 0.

Unless you use the CRITERION= option, the default method is EXACT. If you use the CRITERION= option, then the default method is RELATIVE(φ), where φ (phi) is a small number that depends on the numerical precision of the computer on which SAS is running and on the value of CRITERION=.

Trusted Advisor
Posts: 1,115

Re: PROC COMPARE

Hi @Sandyz,

 

Using the METHOD= (and CRITERION=) options of the PROC COMPARE statement with suitable settings (depending on your data), as suggested by data_null__, is indeed the standard way to obtain a "clean" PROC COMPARE result in the presence of irrelevant differences like 1.776E-15.


However, those differences (due to numeric representation issues) remain in your data and I would be concerned if I was aware of such inaccuracies in my data. Having numeric values which are different from what they seem to be (as you describe, "i can see same value in both files, but having mismatch in compare output") is a risk. Please see the example below and what it writes to the log (at least on Windows systems):

data demo;
x=12.3-0.2;
y=12.1;
if x>12.1 then put '###### Surprised? ######';
d=y-x;
run;

data _null_;
set demo;
put x= 32.29   /  /* naive attempt to see what's in X */
    x= best32. /  /* another naive attempt */
    x= hex16.  /  /* correct way to see what's in X */
    y= hex16.  /  /* correct way to see what's in Y */
    '                 ^' /
    'Please note the difference in the last bit.' /
    d=;
run;

 

Formats can be used to display the content of a numeric variable, but

  1. with few exceptions (like HEX16. or BINARY64.) they don't let you see the "true" value
  2. they don't change the value stored in the variable and only this value is relevant when it comes to comparisons or calculations.

So, if you have values with a small or moderate number of decimals (like 12.1 or 0.2345) and due to the Oracle transfer they are at risk of being (slightly) different from what they look like, I would recommend to clean your data. This means to apply the ROUND function with appropriate rounding units (data dependent, 1E-9 might be suitable) to the affected numeric variables.

Ask a Question
Discussion stats
  • 2 replies
  • 226 views
  • 0 likes
  • 3 in conversation