altadata1
Obsidian | Level 7

Hello, I have a data set like this (the tag variable can have hundreds of different values):

 

OBS    tag
1      A00
2      B90
3      A23
4      A46
5      B19
6      A66
7      B21
8      B32
9      C11
10     C26
...

I'd like to create new variables like this: when tag is between A00 and B90, if tag is equal to A23, B19, or B21, then new_var_1 = 1; for every other tag in that range, new_var_2 = 2.

 

something like this:

if A00<=tag<= B90 then do; 

if tag in (A23, B19, B21) then do new_var_1 = 1; 

else then do new_var_2 =2

 

Thank you. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Your values of Tag would have to be character. That means that any reference to an explicit value must be in quotes: 'A00', 'B90' etc.

Extreme caution is needed when using < or > to compare character values, because 'A1' is "greater than" 'A09' or 'A0100'. Character values are compared left to right, one character at a time. As soon as one character differs, the comparison ends with an unequal result, and the < or > may not give the answer you expect. IF, and this might be a big if, every single Tag value consists of one letter followed by exactly two digits, and the letters are all upper case, then the < or > comparison will likely work as expected.
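To see the left-to-right comparison in action, here is a minimal sketch. The data set names CHECK and BAD_TAGS and the sample values in it are made up for illustration; the PRXMATCH pattern is one possible way to confirm the "one upper-case letter plus two digits" assumption before relying on a range test.

```sas
/* Character comparisons run left to right, one character at a time */
data _null_;
   a1_gt_a09   = ('A1' > 'A09');    /* '1' sorts after '0' in the 2nd position */
   a1_gt_a0100 = ('A1' > 'A0100');  /* same reason, the length does not matter */
   put a1_gt_a09= a1_gt_a0100=;     /* writes a1_gt_a09=1 a1_gt_a0100=1 to the log */
run;

/* Hypothetical sample values, two of which break the expected pattern */
data check;
   input tag $;
   datalines;
A00
A1
a23
B90
;
run;

/* Keep only the tags that are NOT one upper-case letter plus two digits */
data bad_tags;
   set check;
   if not prxmatch('/^[A-Z]\d\d$/', strip(tag));
run;

proc print data=bad_tags;
run;
```

If BAD_TAGS comes back empty for the real data, the range comparison should behave as described; with the made-up values above it would list A1 and a23.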

The code would look more like:

if 'A00'<=tag<= 'B90' then do; 
   if tag in ('A23', 'B19', 'B21') then new_var_1 = 1; 
   else new_var_2 = 2;
end;

Note that the DO will require an END; somewhere. If there is only a single assignment in a branch, then the THEN is all that is needed. The above code will leave new_var_1 missing when Tag is anything other than the listed values, and will leave new_var_2 missing when Tag is one of the listed values. Both variables will be missing when Tag falls outside the A00 to B90 range. If you wanted to set values for both variables in each branch, then the DO would be needed and would look like:

if 'A00'<=tag<= 'B90' then do; 
   if tag in ('A23', 'B19', 'B21') then do;
      new_var_1 = 1; 
      new_var_2=0;
   end;
   else do;
      new_var_1=0;
      new_var_2 =2;
   end;
end;

Note how the indentation makes it easy to align the END with the associated DO.
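For completeness, here is a sketch of the same logic as a full DATA step that can be run as-is. The data set names HAVE and WANT are assumptions (the thread does not name them), and the DATALINES values are the ten sample tags from the question.

```sas
/* Sample data from the question; HAVE is an assumed name */
data have;
   input tag $;
   datalines;
A00
B90
A23
A46
B19
A66
B21
B32
C11
C26
;
run;

/* WANT is an assumed name for the output data set */
data want;
   set have;
   if 'A00' <= tag <= 'B90' then do;
      if tag in ('A23', 'B19', 'B21') then do;
         new_var_1 = 1;
         new_var_2 = 0;
      end;
      else do;
         new_var_1 = 0;
         new_var_2 = 2;
      end;
   end;
   /* Tags outside the range (C11, C26 here) keep both variables missing */
run;

proc print data=want;
run;
```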

 

I will say that having the two variables as shown looks odd to me, but I don't know how you intend to use them later.

PaigeMiller
Diamond | Level 26

You have to code the DATA step such that tag is a character variable, and so the values of tag must be in quotes. If you don't you will get errors. So, A00 should be 'A00' in your code, and so on.

 

data want;
    set a;
    if 'A00'<=tag<='B90' then do;
        if tag in ('A23','B19','B21') then new_var=1;
        else new_var=0;
    end;
run;
--
Paige Miller
altadata1
Obsidian | Level 7

Thank you for your response @PaigeMiller. In my case, I'd like to create two new variables like this: 

when tag is between A00 and B90, if tag equals to A23, B19, B21, then new_var=1, the rest new_var_2=2. 

Many thanks again. 

PaigeMiller
Diamond | Level 26

This is a relatively simple change to the code I provided, and I leave it to you as a homework assignment to make the change.

 

Since these two conditions are mutually exclusive, I recommend putting both conditions into one single variable, as I did, rather than two variables, unless you have a very good reason why it needs to be in separate variables. If there are two and only two values that a variable can take on, 0 and 1 is recommended and preferred over 1 and 2.
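One practical reason for preferring 0 and 1: the mean of a 0/1 variable is the proportion of rows where the condition holds. A minimal sketch, assuming made-up data set names HAVE and WANT and a hypothetical flag name IN_LIST; it relies on the fact that a SAS logical expression evaluates to 1 when true and 0 when false.

```sas
/* Sample tags from the question; HAVE is an assumed name */
data have;
   input tag $;
   datalines;
A00
B90
A23
A46
B19
A66
B21
B32
C11
C26
;
run;

data want;
   set have;
   /* The IN expression itself returns 1 or 0, so no IF/ELSE pair is needed. */
   /* IN_LIST stays missing for tags outside the A00 to B90 range.           */
   if 'A00' <= tag <= 'B90' then in_list = (tag in ('A23', 'B19', 'B21'));
run;

/* MEAN of a 0/1 flag = share of in-range tags that are on the list; */
/* NMISS counts the out-of-range tags                                */
proc means data=want n nmiss mean;
   var in_list;
run;
```

With the ten sample tags, eight fall in the range and three of those are on the list, so the mean should come out as 0.375 with two missing values.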

--
Paige Miller