08-05-2013 04:25 PM
Hi, I have two columns.
I have millions of rows like this. When I tried to create two different datasets, one with the matching values and the other with the non-matching values, I did not get the correct data in the non-matching one: it contains both matching and non-matching rows. How can I get the correct data?
08-05-2013 04:42 PM
What code are you using to determine whether they match? Please also provide some examples where the match/no-match test returns incorrect results.
From your example data, I'm guessing that you have both rounded and unrounded values, and that you are possibly comparing percentages to their near-equivalent decimals. If that is the case, you'll probably need more comprehensive rules than simple equal/not equal to meet your matching criteria.
08-06-2013 02:05 AM
At a guess, it looks like your col1 represents a percentage (not just an amount with a percentage format). Hence, to find the matches you need to multiply col2 by 100 (or divide col1 by 100, but for rounding reasons it is better to multiply col2).
You may find that there are still rounding issues in your data, so you may also need to round before comparing, or accept a difference of less than .01 as denoting equality.
In its simplest form:
data equal (where = (col1 = 100 * col2))
     unequal (where = (col1 ne 100 * col2)) ;
  set have ;
run ;
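If rounding is what breaks the simple comparison, the other option mentioned above is to round both sides before comparing. This is a minimal sketch, assuming two decimal places is the precision that matters; the rows in HAVE are hypothetical sample values, not the poster's data:

```sas
/* Hypothetical sample data: col1 holds a percentage, col2 the same value as a decimal */
data have;
  input col1 col2;
  datalines;
12.5 0.125
33.33 0.333333
50 0.4
;
run;

/* Round both sides to 0.01 (an assumed precision) before comparing */
data equal unequal;
  set have;
  if round(col1, 0.01) = round(100 * col2, 0.01) then output equal;
  else output unequal;
run;
```

Using IF/ELSE with explicit OUTPUT statements guarantees that every row is written to exactly one of the two data sets.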
I would suggest a more robust version that measures the size of the difference:
data equal (where = (test = 0))
     unequal (where = (test > 0)) ;
  set have ;
  test = abs(col1 - 100 * col2) ;
run ;
Then examine the differences in the unequal data set:
proc univariate data = unequal ;
  var test ;
run ;
This should give you enough information to choose a cutoff for test above which you will treat the two values as genuinely unequal (i.e. change the condition test = 0 to test <= .05, and test > 0 to test > .05, etc.).
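Putting that last suggestion together, here is a minimal sketch that keeps the cutoff in a macro variable so it can be adjusted after inspecting the distribution of test. The macro variable name tol, its value of 0.05, and the sample rows are assumptions for illustration only:

```sas
/* Hypothetical cutoff: differences at or below this count as equal */
%let tol = 0.05;

/* Hypothetical sample data: col1 is a percentage, col2 a decimal fraction */
data have;
  input col1 col2;
  datalines;
12.5 0.125
33.33 0.3333
50 0.4
;
run;

/* WHERE= on an output data set is applied as each row is written, */
/* so it can reference test, which is computed inside this step    */
data equal   (where = (test <= &tol))
     unequal (where = (test >  &tol));
  set have;
  test = abs(col1 - 100 * col2);
run;
```

Note that a missing col1 or col2 gives a missing test, which SAS treats as smaller than any number, so such rows land in EQUAL; filter them out first if that matters.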