Hello,
I am trying to create a new variable by combining the response options from 9 other variables. The response options range from 1 to 16 on each of the 9 other variables. I would like all the response options currently coded as 1 to remain 1 and the other response options (2-16) on all of the 9 separate variables to all be coded as 0. Essentially, I am trying to code respondents who said they have ate broccoli during the last 30 days compared to the other 15 response options, so the resulting variable would be broccoli consumption vs. any other vegetable. While the original question on the questionnaire is one question asking respondents to identify which types of vegetables they have consumed (so respondents could enter multiple responses if they ate more than one vegetable type), the dataset contains 9 separate variables that correspond to this question, each with response options that contain the different vegetable types (1-16). Below is a picture of the raw data of the first 2 variables, so you can see how each of the 9 variables are structured.
All I want to do is combine all the 1 responses across the 9 separate variables and then combine the responses 2-16 across the 9 separate variables. With the code below, the 1 responses are all correctly combined, but the problem is that the code is not capturing all the 2-16 response options across the 9 separate variables, it is only capturing the 2-16 response options for variable 8_7a in the first part of the code [else if 8_7a in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) and ignoring the rest of the following or statements, so I am not capturing the responses 2-16 on the rest of the other 8 variables. Why would this be occurring?
Here is my code that I have tried:
broccoli=.;
if 8_7A=1 OR 8_7B=1 OR 8_7C=1 OR 8_7d=1 or 8_7e=1 or 8_7f=1 or 8_7g=1 or 8_7h=1 or 8_7i=1 then broccoli=1;
else if 8_7a IN (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7B in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7c in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7d in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7e in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7f in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7g in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7h in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7i in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) then broccoli=0;
else broccoli=.;
Any help is appreciated!
@jag07g wrote:
I am trying to create a new variable by combining the response options from 9 other variables. The response options range from 1 to 16 on each of the 9 other variables. I would like all the response options currently coded as 1 to remain 1 and the other response options (2-16) on all of the 9 separate variables to all be coded as 0.
Use formats. Most SAS PROCs will use the formats in the analysis.
proc format;
value zero_one 1='1' 2-16='0';
run;
or better yet
proc format;
value yes_no 1='Yes' 2='No';
run;
and then (for example)
proc means data=whatever;
var var1-var9; /* your nine variable names go here */
format var1-var9 yes_no.;
run;
By the way, your example has variable names that begin with a digit, obviously that's not valid SAS code, so it's not clear why you are showing us that. Better to show us actual working code...
@jag07g wrote:
Hello,
I am trying to create a new variable by combining the response options from 9 other variables. The response options range from 1 to 16 on each of the 9 other variables. I would like all the response options currently coded as 1 to remain 1 and the other response options (2-16) on all of the 9 separate variables to all be coded as 0. Essentially, I am trying to code respondents who said they have ate broccoli during the last 30 days compared to the other 15 response options, so the resulting variable would be broccoli consumption vs. any other vegetable. While the original question on the questionnaire is one question asking respondents to identify which types of vegetables they have consumed (so respondents could enter multiple responses if they ate more than one vegetable type), the dataset contains 9 separate variables that correspond to this question, each with response options that contain the different vegetable types (1-16). Below is a picture of the raw data of the first 2 variables, so you can see how each of the 9 variables are structured.
All I want to do is combine all the 1 responses across the 9 separate variables and then combine the responses 2-16 across the 9 separate variables. With the code below, the 1 responses are all correctly combined, but the problem is that the code is not capturing all the 2-16 response options across the 9 separate variables, it is only capturing the 2-16 response options for variable 8_7a in the first part of the code [else if 8_7a in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) and ignoring the rest of the following or statements, so I am not capturing the responses 2-16 on the rest of the other 8 variables. Why would this be occurring?
Here is my code that I have tried:
broccoli=.; if 8_7A=1 OR 8_7B=1 OR 8_7C=1 OR 8_7d=1 or 8_7e=1 or 8_7f=1 or 8_7g=1 or 8_7h=1 or 8_7i=1 then broccoli=1; else if 8_7a IN (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7B in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7c in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7d in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7e in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7f in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7g in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7h in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or 8_7i in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) then broccoli=0; else broccoli=.;
Any help is appreciated!
Show us log from running that code. I expect to se a lot of errors because I don't believe your are referencing your variables correctly or those are not the names of the variables. You may have a label of 8_7A but a variable with that name would have to be referenced as '8_7A'n.
Another approach would be the Whichn function.
data example; input x1 - x5; array x x1-x5; found1 = whichn(1, of x(*)); datalines; 1 2 3 4 5 2 2 2 2 2 2 3 4 5 1 1 1 1 1 1 ;
The Whichn function will return the position the specific value searched for, the first parameter, is first found in a list of values. Using an array of variable values that would the variable index in the array or 0 if not found in any of the values
To get a true false you could use something like
broccoli = (which(1,of x(*) ) > 0);
Which will leave a 0 instead of missing for broccoli. You may want to consider that instead of the missing as you can take a mean of broccoli to get the percent of responses that had the code.
Sorry, the F was cut off in front of my variable name. The code below runs with no errors and creates a broccoli variable that has the correct number of broccoli responses to be coded as 1, but I am concerned because it is not capturing all the respondents that should be captured in the else if command. I don't know why this is happening because I have used the or statement to create other variables with no problems.
broccoli=.;
if F8_7A=1 OR F8_7B=1 OR F8_7C=1 OR F8_7d=1 or f8_7e=1 or f8_7f=1 or f8_7g=1 or f8_7h=1 or f8_7i=1 then broccoli=1;
else if f8_7a IN (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or F8_7B in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7c in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7d in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7e in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7f in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7g in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7h in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7i in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) then broccoli=0;
else broccoli=.;
Here is the resulting broccoli variable I get with the above code. It correctly assigns 338 respondents to broccoli, but it seems there should be more than 728 in the 0 category, given I have over 1300 responses of 2-16 in the 9 variables F8_7A-F8_7I. I am unfamiliar with arrays, but wondering if that would work to solve the problem. I just do not know why the other responses are not being captured.
@jag07g use formats as I described above.
@jag07g wrote:
Sorry, the F was cut off in front of my variable name. The code below runs with no errors and creates a broccoli variable that has the correct number of broccoli responses to be coded as 1, but I am concerned because it is not capturing all the respondents that should be captured in the else if command. I don't know why this is happening because I have used the or statement to create other variables with no problems.
broccoli=.; if F8_7A=1 OR F8_7B=1 OR F8_7C=1 OR F8_7d=1 or f8_7e=1 or f8_7f=1 or f8_7g=1 or f8_7h=1 or f8_7i=1 then broccoli=1; else if f8_7a IN (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or F8_7B in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7c in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7d in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7e in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7f in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7g in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7h in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) or f8_7i in (2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) then broccoli=0; else broccoli=.;
Here is the resulting broccoli variable I get with the above code. It correctly assigns 338 respondents to broccoli, but it seems there should be more than 728 in the 0 category, given I have over 1300 responses of 2-16 in the 9 variables F8_7A-F8_7I. I am unfamiliar with arrays, but wondering if that would work to solve the problem. I just do not know why the other responses are not being captured.
There is no way that we can diagnosis what is apparently a data issue without data.
Do you have any MISSING values for the variables? Or non-integer values?
If all of F8_7A - F8_7i are missing or non-integer then all of the or comparison are going to return false and not assign any value of 0 because the IN operator would need the missing value to consider..
See if:
broccoli = (whichn(1, of f8_7:) >0) ;
matches your expectations.
BTW you can indicate sequential values of integers for the IN operator by using a colon to separate the low and high values:
f8_7a IN (2:16)
which would at least make your code easier to read.
For future coding you might consider naming variables as F8_7_1 or F8_7_01 to F8_7_16. Then you easier use list short hand such as F8_7_3 - F8_7_10 if want to assign that subset of the variables to an array or other functions that accept variable lists. The suffix of letters makes that a bit harder with SAS code.
It is strange. The whichn function did not seem to solve the issue. There are no missing or non-integer values on any of the 9 variables. I am wondering if it has something to do with the fact that the original one question allowed for multiple choice options and that is the reason for the elimination of some responses in the code. I am only guessing because I do not have any documentation or codebook for the structuring of the original 9 variables.
Why do you think some values are missed?
Do you have more than 728+338 observations? Are you sure they weren't eliminated by something else in whatever code you used to generate the output pictured?
Or do you want to count some observations as both 0 and 1? So that total of the number of 1's and number of 0's is more than the total number of observations?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.