comparing values in two different format

Reply
Contributor LRN
Contributor
Posts: 57

comparing values in two different format

hi i have two columns

col1     col2

----     -------

29.75     0.30

14.85     0.14

76.15     0.76
43.65     0.43

22.55     0.23

i have millions of rows like this. when i tried to create 2 diff datasets one that matches and the other one non matching values, i am not getting the correct datas in non-matching. non matching has both matching as well as non-matching. How can i get the corrct datas?

Thank you

Lakshmi

Super User
Posts: 10,497

Re: comparing values in two different format

What code are using to determine if they match? And provide some examples of the match/nomatch returning incorrect results.

From you example data I'm guessing that you have rounded and not rounded values and possible comparing percents to decimal near-equivalents. If that is the case you'll probable need some more comprehensive rules other than equal/ not equal to meet your matching criteria.

Super Contributor
Posts: 644

Re: comparing values in two different format

At a guess it looks like your col1 represents a percentage (not just amounts with a percentage format).  Hence to find the matches you need to multiply col2 by 100 (or divide col1 by 100 - but for rounding reasons multiply col1).

You may find that there are still rounding issues in your data so you may also need to round before comparing or accept a difference of <.01 as denoting equal.

In its simplest form

     data equal (where  (col1 = 100 * col2))

          unequal (where (col1 <> 100 * col2))

          ;

          Set have ;

     run ;

I would suggest a more robust version to test the differences

     data equal (where  (test = 0))

          unequal (where (test > 0))

          ;

          Set have ;

          test = ABS (col1 - 100 * col2) ;

     run ;

Then test the unequal differences ;

     Proc univariate

          data = unequal ;

          var test ;

     run ;

This should give you enough information to set a minimum value for test for which you will accept inequality.

(ie change the condition test = 0 to test > .05 etc)

Richard

Ask a Question
Discussion stats
  • 2 replies
  • 166 views
  • 2 likes
  • 3 in conversation