Combining Binary Variables to One Categorical Variable

epi_user · Posted 05-29-2019 11:43 AM

A research assistant has a data set with 200+ survey question variables that is set up like this, Where "A1" is the question number.

subject ModA1_na ModA1_no ModA1_yes ModA2_na ModA2_no ModA2_yes

1 1 0 0 0 0 1

2 0 1 0 1 0 0

3 0 0 1 1 0 0

4 0 1 0 0 0 1

She wants to combine the questions so that there is one variable for each question with 0,1,2 as options for NA, NO , YES.

So for example:

Take “ModA1_na”, “ModA1_no”, “ModA1_yes” and combine to form one variable called "ModA1" with three categories. Do you know of a way to do this without writing a step for each new variable? She has over 60 questions with 3 variables each created for each question.

Reeza · Posted 05-29-2019 12:18 PM

1. Transpose the data to a long format using PROC TRANSPOSE.

2. Use the SCAN functions to separate the question (MODA1) from the response (NA/NO/YES).

3. Transpose to a different wide structure, keeping only records that have a 1, not a zero, with the values being the response and the ID being the Question from part 2.

If you need further help please post the code you tried.

@epi_user wrote:

A research assistant has a data set with 200+ survey question variables that is set up like this, Where "A1" is the question number.

subject ModA1_na ModA1_no ModA1_yes ModA2_na ModA2_no ModA2_yes

1     1 0 0    0        0        1

2 0 1 0 1 0        0

3 0 0 1 1 0        0

4 0 1 0 0 0        1

She wants to combine the questions so that there is one variable for each question with 0,1,2 as options for NA, NO , YES.

So for example:

Take “ModA1_na”, “ModA1_no”, “ModA1_yes” and combine to form one variable called "ModA1" with three categories. Do you know of a way to do this without writing a step for each new variable? She has over 60 questions with 3 variables each created for each question.

KachiM · Posted 05-29-2019 01:04 PM

Possibly Arrays may simplify. If you have triplets like ModA1_na, ModA1_no, and ModA1_yes for all your Questions then we may use Arrays.

data have;
input subject ModA1_na ModA1_no ModA1_yes ModA2_na ModA2_no ModA2_yes;
datalines;
1  1  0  0  0  0  1
2  0  1  0  1  0  0
3  0  0  1  1  0  0
4  0  1  0  0  0  1
;
run;
data want;
   set have;
   array k ModA1_na ModA1_no ModA1_yes ModA2_na ModA2_no ModA2_yes;
   array m ModA1 ModA2;
   do i = 1 to dim(k);
      j = 1 + int((i-1)/3);  /* Given i, j is forced to take 1 or 2 */
      jj = mod(i-1,3);       /* Given i, jj is forced to take 0,1 or 2 */
      if k[i] then m[j] = jj;
   end;
keep subject ModA1 ModA2;
run;
proc print data = want;
run;

The OUTPUT:

Obs 	subject 	ModA1 	ModA2
1 	1 	0 	2
2 	2 	1 	0
3 	3 	2 	0
4 	4 	1 	2

The comments give you how each element of array K[ ] gives the INDEX J to Group Number and INDEX JJ to Value.

Hope it helps. Any query on the use array can be answered.

DataSP

Astounding · Posted 05-29-2019 01:43 PM

I would think it's fairly easy to code all 60 sets of variables. For example, you might begin with:

ModA1 = ModA1_no + 2 * ModA1_yes;
ModA2 = ModA2_no + 2 * ModA2_yes;

Just replicate these lines, and change the question number ... one line per set of questions.

There are other ways ... more complex yet more flexible. But I don't think you need to start using a complex approach when a simple one will do.

ballardw · Posted 05-29-2019 02:28 PM

Another option, if available, is to go back to the software that collected the data and examine output options. I have worked with software used for survey data collection that defaulted to that sort of output for categorical/ select one answer type of questions. BUT it had an option to report the only selected choice as well. You just had to know enough to look.

If you have multiple response questions you likely seriously want to examine options as some of the defaults I have seen for those questions can be seriously problematic for some sorts of analysis.

Ksharp · Posted 05-30-2019 09:01 AM

Double PROC TRANSPOSE.

data have;
input subject ModA1_na ModA1_no ModA1_yes ModA2_na ModA2_no ModA2_yes;
datalines;
1  1  0  0  0  0  1
2  0  1  0  1  0  0
3  0  0  1  1  0  0
4  0  1  0  0  0  1
;
run;
proc transpose data=have out=temp(where=(col1=1));
by subject;
run;
data temp1;
 set temp;
 name=scan(_name_,1,'_');
 value=scan(_name_,-1,'_');
run;
proc transpose data=temp1 out=want;
by subject;
id name;
var value;
run;

Combining Binary Variables to One Categorical Variable

Re: Combining Binary Variables to One Categorical Variable

Re: Combining Binary Variables to One Categorical Variable

Re: Combining Binary Variables to One Categorical Variable

Re: Combining Binary Variables to One Categorical Variable

Re: Combining Binary Variables to One Categorical Variable

Combining Binary Variables to One Categorical Variable

Re: Combining Binary Variables to One Categorical Variable

Re: Combining Binary Variables to One Categorical Variable

Re: Combining Binary Variables to One Categorical Variable

Re: Combining Binary Variables to One Categorical Variable

Re: Combining Binary Variables to One Categorical Variable

SAS Innovate 2025: Register Now