Hello, I have a data set like this (the tag variable can have up to hundreds of different values):
OBS tag
1 A00
2 B90
3 A23
4 A46
5 B19
6 A66
7 B21
8 B32
9 C11
10 C26
...
I'd like to create new variables like this: when tag is between A00 and B90, if tag is equal to A23 or B19 or B21 then my new_var_1 = 1, if tag is equal to the rest then new_var_2 = 2.
something like this:
if A00<=tag<= B90 then do;
if tag in (A23, B19, B21) then do new_var_1 = 1;
else then do new_var_2 =2
Thank you.
Your values of Tag would have to be character. That means that any reference to an explicit value must be in quotes: 'A00', 'B90' etc.
Extreme caution is needed when comparing character values with < or >, because 'A1' is "greater than" 'A09' or 'A0100'. Character values are compared left to right, character by character. As soon as one character differs, the comparison ends with an unequal result, and the < or > may not give the result you expect. IF, and this might be a big if, every single Tag value consists of a letter and exactly two digits, and the letters are all upper case, then the < or > comparison will likely work as expected.
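To make the comparison rule concrete, here is a minimal sketch. The first step writes a note to the log for each comparison that holds; the second lists any tag that does not fit the "one upper case letter plus exactly two digits" pattern. The data set name HAVE and the output name BAD_TAGS are assumptions for illustration, not names from the original post.

```sas
/* Sketch only: HAVE and BAD_TAGS are hypothetical data set names. */
data _null_;
   /* Compared left to right: at the second character '1' > '0' decides the result. */
   if 'A1' > 'A09'   then put "'A1' sorts after 'A09'";
   if 'A1' > 'A0100' then put "'A1' sorts after 'A0100'";
   /* On ASCII systems lower case letters sort after upper case letters. */
   if 'a00' > 'B90'  then put "'a00' sorts after 'B90'";
run;

/* Keep only the rows whose tag is NOT one upper case letter followed by  */
/* exactly two digits (trailing blanks from padding are allowed).         */
data bad_tags;
   set have;
   if not prxmatch('/^[A-Z]\d\d\s*$/', tag);
run;
```

If BAD_TAGS comes back empty, the range test on tag should be safe to use; if not, those rows are the ones to inspect first.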
The code would look more like:
if 'A00' <= tag <= 'B90' then do;
   if tag in ('A23', 'B19', 'B21') then new_var_1 = 1;
   else new_var_2 = 2;
end;
Note that the DO requires an END; somewhere. If there is only a single assignment in a branch then the THEN is all that is needed. The above code will have missing values for new_var_1 when Tag is anything other than the listed values, and missing values for new_var_2 when Tag is one of the listed values. If you wanted to set values for both variables in each branch then the DO would be needed and would look like:
if 'A00' <= tag <= 'B90' then do;
   if tag in ('A23', 'B19', 'B21') then do;
      new_var_1 = 1;
      new_var_2 = 0;
   end;
   else do;
      new_var_1 = 0;
      new_var_2 = 2;
   end;
end;
Note how the indentation makes it easy to align the END with the associated DO.
I will say that having the two variables as shown looks odd to me, but I don't know how you intend to use them later.
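For anyone who wants to try this end to end, here is a small self-contained sketch that builds the ten sample rows from the question and applies the two-variable logic. The data set names HAVE and WANT are assumptions; the tag values are the ones posted above.

```sas
/* Sketch only: HAVE and WANT are hypothetical data set names. */
data have;
   input tag $;          /* read tag as a character variable */
   datalines;
A00
B90
A23
A46
B19
A66
B21
B32
C11
C26
;
run;

data want;
   set have;
   /* Only tags inside the A00 to B90 range (inclusive) get a value. */
   if 'A00' <= tag <= 'B90' then do;
      if tag in ('A23', 'B19', 'B21') then new_var_1 = 1;
      else new_var_2 = 2;
   end;
run;

proc print data=want;
run;
```

With this data, C11 and C26 fall outside the range, so both new variables stay missing for those two rows.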
You have to code the DATA step such that tag is a character variable, and so the values of tag must be in quotes. If you don't you will get errors. So, A00 should be 'A00' in your code, and so on.
data want;
   set a;
   if 'A00'<=tag<='B90' then do;
      if tag in ('A23','B19','B21') then new_var=1;
      else new_var=0;
   end;
run;
Thank you for your response @PaigeMiller. In my case, I'd like to create two new variables like this:
when tag is between A00 and B90, if tag equals A23, B19, or B21, then new_var=1; for the rest, new_var_2=2.
Many thanks again.
This is a relatively simple change to the code I provided, and I leave it to you as a homework assignment to make the change.
Since these two conditions are mutually exclusive, I recommend putting both conditions into one single variable, as I did, rather than two variables, unless you have a very good reason why it needs to be in separate variables. If there are two and only two values that a variable can take on, 0 and 1 is recommended and preferred over 1 and 2.
Thank you so much.