I have a data set (roughly 20 million observations) see below of what table looks like.
data have;
infile datalines dlm="09"x;
input ID HK_SCR BK_SCR;
datalines;
123 1.021 1.019
124 0.355 0.355
125 0.781 0.771
126 0.599 0.599
127 2.541 2.513
run;
I'm currently trying to find a way to compare differences at a certain placement of the numeric value in HK_SCR & BK_SCR. When examining the HK_SCR and BK_SCR variables at the maximum view (24.8) the differences are minuscule. The code provided will not replicate what I have stored but it gives you a general idea.
I'd like to try to find differences at the third decimal point. For example, ID=124 id like to categorize these as = to each other so that they are excluded from my join where HK_SCR <> BK_SCR. Or if there is an If then statement I can utilize to accomplish this? Also not sure of the impacts of the round function
"round(var,.001)" and how that will impact my SCR Variables. I've included my two codes that do not successfully accomplish this.
PROC SQL;
CREATE TABLE DIF_SCR AS
SELECT ID,
HK_SCR,
BK_SCR
FROM HAVE
WHERE HK_SCR <> BK_SCR
group by ID;
QUIT;
data DIF_1;
set HAVE;
if HK_SCR <> BK_SCR then FLAG=1; ELSE FLAG=0;
RUN;
PROC SQL;
CREATE TABLE DIF_SCR AS
SELECT *
FROM HAVE
WHERE HK_SCR - BK_SCR not between -0.001 and 0.001;
QUIT;
You can try using the ROUND function, which will allow you to detect differences in the first N decimals places.
Your example on ID 124 is different in the second decimal place, so its not clear why you think these are equal according to your statement "I'd like to try to find differences at the third decimal point."
@PaigeMiller I took a random sample where values were the same and tested the Round function. It did not work unfortunately.
@bknitch wrote:
@PaigeMiller I took a random sample where values were the same and tested the Round function. It did not work unfortunately.
It's impossible to respond to this, without knowing what you did and what data you used.
DATA TEST1ROUND;
set COMPARE;
HK_SCR1=round(HK_SCR,0.001);
BK_SCR1=round(BK_SCR,0.001);
run;
data HAVE(Drop=MY_ID);
set test1round;
ID=MY_ID;
ID=1234;
if HK_SCR1 <> BK_SCR1 then FLAG=1; ELSE FLAG=0;
run;
Here is my output, i can't show the ID variables due to privacy so i just renamed them to 1234. These three are all different ID's.
ID | HK_SCR | BK_SCR | HK_SCR1 | BK_SCR1 | FLAG |
1234 | 0.339 | 0.339 | 0.339 | 0.339 | 1 |
1234 | 3.865 | 3.859 | 3.865 | 3.859 | 1 |
1234 | 0.565 | 0.565 | 0.565 | 0.565 | 1 |
Hi @bknitch,
In SAS the "<>" operator has different meanings: In WHERE conditions and PROC SQL it may be interpreted as "not equal to", but in most other places (e.g. IF conditions as in your example) it is interpreted as the maximum operator*. So, you need to use an (unambiguous) "not equal to" operator such as ne or ~=.
* see log message
NOTE: The "<>" operator is interpreted as "MAX".
PROC SQL;
CREATE TABLE DIF_SCR AS
SELECT *
FROM HAVE
WHERE HK_SCR - BK_SCR not between -0.001 and 0.001;
QUIT;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.