hi i have two columns
col1 col2
---- -------
29.75 0.30
14.85 0.14
76.15 0.76
43.65 0.43
22.55 0.23
i have millions of rows like this. when i tried to create 2 diff datasets one that matches and the other one non matching values, i am not getting the correct datas in non-matching. non matching has both matching as well as non-matching. How can i get the corrct datas?
Thank you
Lakshmi
What code are using to determine if they match? And provide some examples of the match/nomatch returning incorrect results.
From you example data I'm guessing that you have rounded and not rounded values and possible comparing percents to decimal near-equivalents. If that is the case you'll probable need some more comprehensive rules other than equal/ not equal to meet your matching criteria.
At a guess it looks like your col1 represents a percentage (not just amounts with a percentage format). Hence to find the matches you need to multiply col2 by 100 (or divide col1 by 100 - but for rounding reasons multiply col1).
You may find that there are still rounding issues in your data so you may also need to round before comparing or accept a difference of <.01 as denoting equal.
In its simplest form
data equal (where (col1 = 100 * col2))
unequal (where (col1 <> 100 * col2))
;
Set have ;
run ;
I would suggest a more robust version to test the differences
data equal (where (test = 0))
unequal (where (test > 0))
;
Set have ;
test = ABS (col1 - 100 * col2) ;
run ;
Then test the unequal differences ;
Proc univariate
data = unequal ;
var test ;
run ;
This should give you enough information to set a minimum value for test for which you will accept inequality.
(ie change the condition test = 0 to test > .05 etc)
Richard
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.