BookmarkSubscribeRSS Feed
LRN
Calcite | Level 5 LRN
Calcite | Level 5

hi i have two columns

col1     col2

----     -------

29.75     0.30

14.85     0.14

76.15     0.76
43.65     0.43

22.55     0.23

i have millions of rows like this. when i tried to create 2 diff datasets one that matches and the other one non matching values, i am not getting the correct datas in non-matching. non matching has both matching as well as non-matching. How can i get the corrct datas?

Thank you

Lakshmi

2 REPLIES 2
ballardw
Super User

What code are using to determine if they match? And provide some examples of the match/nomatch returning incorrect results.

From you example data I'm guessing that you have rounded and not rounded values and possible comparing percents to decimal near-equivalents. If that is the case you'll probable need some more comprehensive rules other than equal/ not equal to meet your matching criteria.

RichardinOz
Quartz | Level 8

At a guess it looks like your col1 represents a percentage (not just amounts with a percentage format).  Hence to find the matches you need to multiply col2 by 100 (or divide col1 by 100 - but for rounding reasons multiply col1).

You may find that there are still rounding issues in your data so you may also need to round before comparing or accept a difference of <.01 as denoting equal.

In its simplest form

     data equal (where  (col1 = 100 * col2))

          unequal (where (col1 <> 100 * col2))

          ;

          Set have ;

     run ;

I would suggest a more robust version to test the differences

     data equal (where  (test = 0))

          unequal (where (test > 0))

          ;

          Set have ;

          test = ABS (col1 - 100 * col2) ;

     run ;

Then test the unequal differences ;

     Proc univariate

          data = unequal ;

          var test ;

     run ;

This should give you enough information to set a minimum value for test for which you will accept inequality.

(ie change the condition test = 0 to test > .05 etc)

Richard

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 851 views
  • 2 likes
  • 3 in conversation