BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SergioSanchez
Calcite | Level 5

Hi all

I want to compare two large dasets (over 3 millions of rows) and I am introduced myselft to proc compare.

I tell you what are I Doing to run the code.

First I run a proc sort on both datasets to order the variables. After this I change the name and format of the compare dataset for having the same structure.

and the last step

proc compare base = a compare=b /* MAXPRINT=total*/

out= result outnoequal outbase outcomp outdif outpercent transpose;

var var1................varn;

run;

This is a sample of the result I have

 

_OBS_1=76637 _OBS_2=76637:

Variable Valor base Compare Diff. % Diff

var1 3039.07 3039.07 -4.54747E-13 -1.49634E-14

var2 3039.07 3039.07 -4.54747E-13 -1.49634E-14

and as this sample over 50,000 more

what do I doing wrong?, how can I fix this problem?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

If the fact that your variables for some reason differ by a minute amount but will not affect your other uses you can use the FUZZ option on the Proc Compare statement to only report inequalities greater than the value you set.
For example:

proc compare base = a compare=b fuzz= 0.00001

out= result outnoequal outbase outcomp outdif outpercent transpose;

<remainder of compare code here>

will only report an inequaly greater than 0.00001 (or 1 E-5). Given your example a fuzz of 0.00000001 would probably remove most of the inequality messages.

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, what is the question?  The output from proc compare is not always easy to read and so may require some post-processing.  Also, if you are comparing data of that size you may want to consider a bit more what you want to compare, for instance can you group the data, maybe summarise, of do by group joins to identify certain fall outs.  There are other methodologies also, some of the stats procedures will give you counts, outliers.  Distinct lists for codes, defined lists.  Maybe if you have a visit schedule, you could pull that out as a tabulate etc.

SergioSanchez
Calcite | Level 5

I know both variables are the same so I'd like to ajust the proc to obtain equalities in variable like the sample. also I have tested with proc means but I have the same problem plus the outpercent option don't work for me because I don't how to change de format.

What are the other methods I can use to compare two datasets?

base                    25451159.00

compare              25451275.42

diff                       116.42

percent                 0.00

thanks in advanced

RW9
Diamond | Level 26 RW9
Diamond | Level 26

It looks like you are running into the display versus storage issue.  You may see in your datasets:

3039.07     and     3039.07

Which look the same.  However the data is really (an example):

3039.0700000000000000001     and     3039.07

Hence a fractional difference.  I have come up against this several times and it can happen in several different ways, most notable if the base programmer uses SAS functions to create a calculated variable, and I use SQL.  Its quite annoying.  What I would suggest is that you identify what the precision needs to be for your compare values and then explicitly round to that precision.  I.e. round(variable_from_base,8.2) = round(variable_from_comp,8.2).

So round them in both datasets then do a compare again.  You could also put both variables into a character field.

Haikuo
Onyx | Level 15

In regarding to precision issue, it could get messier than just that.

data _NULL_;

     A=5.1;

     B=3.5;

     C1=1.6;

     C2=A-B;

     IF C1=C2 THEN

           PUT C1 30.28 " EQUAL " C2 30.28;

     ELSE PUT C1 30.28 " NOT EQUAL " C2 30.28;

PUT C1 HEX16./C2 HEX16.     ;

RUN;

SAS (like other computer software) is using Hex. format to store the value, Decimal has always been a approximate approach. In this case, only Hex format can tell whether they are equal or not.

Regards,

Haikuo

Edit:

of course, Binary will also do:

PUT C1 BINARY64. /C2 BINARY64.;

ballardw
Super User

If the fact that your variables for some reason differ by a minute amount but will not affect your other uses you can use the FUZZ option on the Proc Compare statement to only report inequalities greater than the value you set.
For example:

proc compare base = a compare=b fuzz= 0.00001

out= result outnoequal outbase outcomp outdif outpercent transpose;

<remainder of compare code here>

will only report an inequaly greater than 0.00001 (or 1 E-5). Given your example a fuzz of 0.00000001 would probably remove most of the inequality messages.

JasonAllen
SAS Employee

Hello SergioSanchez,

You can specify the criterion for judging the equality of numeric values.  See:

PROC COMPARE :: Base SAS(R) 9.4 Procedures Guide, Second Edition

and:

Concepts: COMPARE Procedure :: Base SAS(R) 9.4 Procedures Guide, Second Edition

You might try:

  method=absolute

  criterion=0.0000001

to suppress differences smaller than 10e-7, for example, unless you're not sure if they're significant.

Peter_C
Rhodochrosite | Level 12

Think you need BY  and or ID statements to make key-based compare

good luck

peterC

SergioSanchez
Calcite | Level 5

I finally resolved with the options;

fuzz= 0.00001 criterion=-1000

At the moment it´s working fine

Thanks all for your time and posts

Regards

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1615 views
  • 9 likes
  • 6 in conversation