data want;
set have;
Category=.;
if (var1var2_ratio >= 5) OR (var1 > 0 AND (var2 = . OR var2 = 0)) then Category = 1; *var1 was dominant;
if (var1var2_ratio > 0.2 AND var1var2_ratio <= 2) then Category = 2;*var1 and var2 were balanced;
if (var1var2_ratio <= 0.2) OR (var2 > 0 AND (var2 = . OR var2 = 0)) then Category = 3;*var2 was dominant;
if (var1var2_ratio > 2 AND var1var2_ratio < 5) then Category = 4;*var1 is higher but not dominant;
run;
I have two variables var1 and var2 and I created a var1var2_ratio (var1/var2). Now I want to create a category variable using var1, var2, and var1var2_ratio.
The way I am using OR and AND does not create the outcome I want. I appreciate it if you have any suggestions to fix my code.
Thanks!
What is the outcome you want? Explain the rules in words.
Please show us a small portion of the data and the desired outcomes for a number of cases.
First thing to change is to not have separate IF statements. That can cause trouble when the conditions overlap.
Either link them using ELSE clauses or switch to using SELECT statement instead.
if (...) then category=1;
else if (...) then category=2;
else if (...) then category=3;
else if (...) then category=4;
else put 'Category not defined.';
Read the documentation and its examples.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/p09213s9jc2t99n1vx0omk2rh9ps.htm
@Emma_at_SAS wrote:
I did not know I can use SELECT to derive such variables. Do you recommend any document I can use to learn more about how to derive variables? Thanks!
Select works best when you have a single value or expression to evaluate such as
Select (state); when ('AK') rate= 0.3; when ('AL') rate= 0.01; when ('AR') rate= 0.08; when ('AZ') rate= 1.1; ... otherwise rate= 0.5; end;
which is a bit easier then 50 (or more) IF/ Then/Else. Where there is Rate= there could be an entire block of code defined by a do; statement1; statement2; end; or similar. So it might work for the outer comparisons.
The basic SAS documentation on the data step select statement should be sufficient. NOTE: this not the Proc SQL select which is an entirely different beast.
Start by changing to ELSE IF not just IF.
Then explain your rules and show some examples of where you are not getting expected values.
data want;
set have;
Category=.;
if (var1var2_ratio >= 5) OR (var1 > 0 AND (var2 = . OR var2 = 0)) then Category = 1; *var1 was dominant;
else if (var1var2_ratio > 0.2 AND var1var2_ratio <= 2) then Category = 2;*var1 and var2 were balanced;
else if (var1var2_ratio <= 0.2) OR (var2 > 0 AND (var2 = . OR var2 = 0)) then Category = 3;*var2 was dominant;
else if (var1var2_ratio > 2 AND var1var2_ratio < 5) then Category = 4;*var1 is higher but not dominant;
run;
@Emma_at_SAS wrote:
data want; set have; Category=.; if (var1var2_ratio >= 5) OR (var1 > 0 AND (var2 = . OR var2 = 0)) then Category = 1; *var1 was dominant; if (var1var2_ratio > 0.2 AND var1var2_ratio <= 2) then Category = 2;*var1 and var2 were balanced; if (var1var2_ratio <= 0.2) OR (var2 > 0 AND (var2 = . OR var2 = 0)) then Category = 3;*var2 was dominant; if (var1var2_ratio > 2 AND var1var2_ratio < 5) then Category = 4;*var1 is higher but not dominant; run;
I have two variables var1 and var2 and I created a var1var2_ratio (var1/var2). Now I want to create a category variable using var1, var2, and var1var2_ratio.
The way I am using OR and AND does not create the outcome I want. I appreciate it if you have any suggestions to fix my code.
Thanks!
Thank you very much @PaigeMiller @Reeza @Tom
Yes, I had overlapping conditions when I missed adding (var1var2_ratio > 0) in the var2 dominant category. You solved my problem and I learned I should not use multiple IF's to create derived variables.
Thanks!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.