Help using Base SAS procedures

Need help with proc compare

Accepted Solution Solved
Reply
Contributor
Posts: 68
Accepted Solution

Need help with proc compare

Hi all

I want to compare two large dasets (over 3 millions of rows) and I am introduced myselft to proc compare.

I tell you what are I Doing to run the code.

First I run a proc sort on both datasets to order the variables. After this I change the name and format of the compare dataset for having the same structure.

and the last step

proc compare base = a compare=b /* MAXPRINT=total*/

out= result outnoequal outbase outcomp outdif outpercent transpose;

var var1................varn;

run;

This is a sample of the result I have

 

_OBS_1=76637 _OBS_2=76637:

Variable Valor base Compare Diff. % Diff

var1 3039.07 3039.07 -4.54747E-13 -1.49634E-14

var2 3039.07 3039.07 -4.54747E-13 -1.49634E-14

and as this sample over 50,000 more

what do I doing wrong?, how can I fix this problem?

Thanks


Accepted Solutions
Solution
‎06-17-2014 10:41 AM
Super User
Posts: 11,343

Re: Need help with proc compare

Posted in reply to SergioSanchez

If the fact that your variables for some reason differ by a minute amount but will not affect your other uses you can use the FUZZ option on the Proc Compare statement to only report inequalities greater than the value you set.
For example:

proc compare base = a compare=b fuzz= 0.00001

out= result outnoequal outbase outcomp outdif outpercent transpose;

<remainder of compare code here>

will only report an inequaly greater than 0.00001 (or 1 E-5). Given your example a fuzz of 0.00000001 would probably remove most of the inequality messages.

View solution in original post


All Replies
Super User
Super User
Posts: 7,981

Re: Need help with proc compare

Posted in reply to SergioSanchez

Sorry, what is the question?  The output from proc compare is not always easy to read and so may require some post-processing.  Also, if you are comparing data of that size you may want to consider a bit more what you want to compare, for instance can you group the data, maybe summarise, of do by group joins to identify certain fall outs.  There are other methodologies also, some of the stats procedures will give you counts, outliers.  Distinct lists for codes, defined lists.  Maybe if you have a visit schedule, you could pull that out as a tabulate etc.

Contributor
Posts: 68

Re: Need help with proc compare

I know both variables are the same so I'd like to ajust the proc to obtain equalities in variable like the sample. also I have tested with proc means but I have the same problem plus the outpercent option don't work for me because I don't how to change de format.

What are the other methods I can use to compare two datasets?

base                    25451159.00

compare              25451275.42

diff                       116.42

percent                 0.00

thanks in advanced

Super User
Super User
Posts: 7,981

Re: Need help with proc compare

Posted in reply to SergioSanchez

It looks like you are running into the display versus storage issue.  You may see in your datasets:

3039.07     and     3039.07

Which look the same.  However the data is really (an example):

3039.0700000000000000001     and     3039.07

Hence a fractional difference.  I have come up against this several times and it can happen in several different ways, most notable if the base programmer uses SAS functions to create a calculated variable, and I use SQL.  Its quite annoying.  What I would suggest is that you identify what the precision needs to be for your compare values and then explicitly round to that precision.  I.e. round(variable_from_base,8.2) = round(variable_from_comp,8.2).

So round them in both datasets then do a compare again.  You could also put both variables into a character field.

Respected Advisor
Posts: 3,156

Re: Need help with proc compare

In regarding to precision issue, it could get messier than just that.

data _NULL_;

     A=5.1;

     B=3.5;

     C1=1.6;

     C2=A-B;

     IF C1=C2 THEN

           PUT C1 30.28 " EQUAL " C2 30.28;

     ELSE PUT C1 30.28 " NOT EQUAL " C2 30.28;

PUT C1 HEX16./C2 HEX16.     ;

RUN;

SAS (like other computer software) is using Hex. format to store the value, Decimal has always been a approximate approach. In this case, only Hex format can tell whether they are equal or not.

Regards,

Haikuo

Edit:

of course, Binary will also do:

PUT C1 BINARY64. /C2 BINARY64.;

Solution
‎06-17-2014 10:41 AM
Super User
Posts: 11,343

Re: Need help with proc compare

Posted in reply to SergioSanchez

If the fact that your variables for some reason differ by a minute amount but will not affect your other uses you can use the FUZZ option on the Proc Compare statement to only report inequalities greater than the value you set.
For example:

proc compare base = a compare=b fuzz= 0.00001

out= result outnoequal outbase outcomp outdif outpercent transpose;

<remainder of compare code here>

will only report an inequaly greater than 0.00001 (or 1 E-5). Given your example a fuzz of 0.00000001 would probably remove most of the inequality messages.

SAS Employee
Posts: 15

Re: Need help with proc compare

Posted in reply to SergioSanchez

Hello SergioSanchez,

You can specify the criterion for judging the equality of numeric values.  See:

PROC COMPARE :: Base SAS(R) 9.4 Procedures Guide, Second Edition

and:

Concepts: COMPARE Procedure :: Base SAS(R) 9.4 Procedures Guide, Second Edition

You might try:

  method=absolute

  criterion=0.0000001

to suppress differences smaller than 10e-7, for example, unless you're not sure if they're significant.

Valued Guide
Posts: 2,177

Re: Need help with proc compare

Posted in reply to SergioSanchez

Think you need BY  and or ID statements to make key-based compare

good luck

peterC

Contributor
Posts: 68

Re: Need help with proc compare

Posted in reply to SergioSanchez

I finally resolved with the options;

fuzz= 0.00001 criterion=-1000

At the moment it´s working fine

Thanks all for your time and posts

Regards

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 284 views
  • 9 likes
  • 6 in conversation