I need to categorise a variable based on certain values. I want, however, that it is done sequentially and that the last value trumps the previous "if" statements.
data hhdata;
set hhdata;
if '5.1b'n = 0 or '5.1b'n = 1 then FoodSecurity = "Food Secure";
If (('5.1b'n in (2 3)) or ('5.2b'n in (1 2 3)) or ('5.3b'n = 1
or '5.4b'n = 1)) then FoodSecurity = "Mildly Food Insecure Access";
If '5.3b'n = 2 or '5.3b'n = 3 or '5.4b'n = 2 or '5.4b'n = 3
or '5.5b'n = 1 or '5.5b'n = 2 or '5.6b'n = 1 or '5.6b'n = 2 then FoodSecurity = "Moderate Food Insecure Access";
If (cat(of '5.7b'n--'5.9b'n) in (1 2 3)) then FoodSecurity = "Severe Food Insecure Access";
else if '5.5b'n = 3 or '5.6b'n = 3 then FoodSecurity = "Severe Food Insecure Access";
if '5.1b'n in ('-' ' ') or cat(of '5.2b'n--'5.9b'n) in ('-' ' ')
then FoodSecurity = "Missing";
run;
Lines 3, 10, and 12 should be "Severe"
Lines 3, 10, and 12 should be "Severe"
Why? What is the logic that makes you say Line 3 should be severe? What is the logic that makes you say Line 10 should be severe?
"Sequential" in your context is going to mean set your conditions in order to you of processing and then use IF/THEN/Else.
The Else gets processed when the first If is not true. Often it helps to provide a default before the tests to get the result for when none of your conditions are true.
Maybe:
data hhdatanew; set hhdata; length FoodSecurity $ 30; if '5.1b'n = 0 or '5.1b'n = 1 then FoodSecurity = "Food Secure"; ELSE If (('5.1b'n in (2 3)) or ('5.2b'n in (1 2 3)) or ('5.3b'n = 1 or '5.4b'n = 1)) then FoodSecurity = "Mildly Food Insecure Access"; ELSE If '5.3b'n = 2 or '5.3b'n = 3 or '5.4b'n = 2 or '5.4b'n = 3 or '5.5b'n = 1 or '5.5b'n = 2 or '5.6b'n = 1 or '5.6b'n = 2 then FoodSecurity = "Moderate Food Insecure Access"; ELSE If (cat(of '5.7b'n--'5.9b'n) in (1 2 3)) then FoodSecurity = "Severe Food Insecure Access"; else if '5.5b'n = 3 or '5.6b'n = 3 then FoodSecurity = "Severe Food Insecure Access"; if '5.1b'n in ('-' ' ') or cat(of '5.2b'n--'5.9b'n) in ('-' ' ') then FoodSecurity = "Missing"; run;
Note: your current code was going to limit the length of the FoodSecurity variable to 11 characters because the first use assigning the value as "Food Secure" set that as the longest length it would hold. Set the length of the variable prior to use so that you can hold the longer values such as "Mildly Food Insecure Access"
Not even going to attempt to code against a picture of data.
Caution: Use of the coding construct of the same data set as source (SET statement) and result (DATA statement) while syntactically correct is extremely dangerous when it comes to recoding variables. One minor logic or spelling error and you can remove or change variables that you did not intend to and you have permanently corrupted the dataset and will have to go back to previous steps to recover the data.
I would also suggest not using non-standard variable names unless you really love constantly using the ' 'n.
You should post your sample data as a data step, e.g. like this:
data hhdata;
infile cards missover;
length '5.1b'n '5.2b'n '5.3b'n '5.4b'n '5.5b'n '5.6'n '5.7b'n '5.8b'n '5.9b'n $1;
input '5.1b'n '5.2b'n '5.3b'n '5.4b'n '5.5b'n '5.6'n '5.7b'n '5.8b'n '5.9b'n;
cards;
0 0 0 1 1 0 0 0 0
3 1 3 0 0 0 1 0 0
2 3 3 1 2 3 1 1 0
2 3 3 1 2 3 1 1 0
3 3 3 0 3 3 2 0 0
2 0 1 1 3 3 2 0 0
0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 1 1 0
3 3 3 0 3 3 2 0 0
1 1 0 1 1 1 1 0 0
;run;
The reason you are not getting the right answers in your code is the use of the CAT function, e.g. for obs. 3:
If (cat(of '5.7b'n--'5.9b'n) in (1 2 3)) then FoodSecurity = "Severe Food Insecure Access";
The value returned from the CAT function is the string '100' which is none of the values '1', '2' or '3' (SAS automatically converts numbers to strings).
Instead, you can use the FINDC function to see if any of the values are present in the concatenation:
if findc(cat(of '5.7b'n--'5.9b'n),'123') then FoodSecurity = "Severe Food Insecure Access";
I would also rewrite the program to put the cases that take precedence first, using IF...ELSE (more efficient, CPU-wise, and easier to understand when reading the program). So my suggestion for what your program should look like is something like this:
data want;
set hhdata;
length FoodSecurity $50;
if findc(cat(of '5.1b'n--'5.9b'n),'- ') then FoodSecurity = "Missing";
else if findc(cat(of '5.7b'n--'5.9b'n),'123')
or '5.5b'n = '3' or '5.6b'n = '3' then FoodSecurity = "Severe Food Insecure Access";
else if findc(cat(of '5.3b'n--'5.4b'n),'23')
or findc(cat(of '5.5b'n--'5.6b'n),'12') then FoodSecurity = "Moderate Food Insecure Access";
else if '5.1b'n in ('2' '3') or '5.2b'n in ('1' '2' '3') or '5.3b'n = 1 or '5.4b'n = '1' then FoodSecurity = "Mildly Food Insecure Access";
else if '5.1b'n in('1','0') then FoodSecurity = "Food Secure";
else FoodSecurity='Out of bounds';
run;
I also put a LENGTH statement on the FoodSecurity variable, as it otherwise gets truncated to the length of the first value encountered in the program.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.