BookmarkSubscribeRSS Feed
epi_user
Calcite | Level 5

A research assistant has a data set with 200+ survey question variables that is set up like this, Where "A1" is the question number. 

 

subject   ModA1_na      ModA1_no   ModA1_yes   ModA2_na    ModA2_no      ModA2_yes

1                  1                   0                0                   0                   0                           1

2                  0                   1                0                   1                   0                           0

3                  0                    0               1                   1                   0                           0

4                  0                   1                0                   0                   0                           1

 

She wants to combine the questions so that there is one variable for each question with 0,1,2 as options for NA, NO , YES.

So for example:

Take “ModA1_na”, “ModA1_no”, “ModA1_yes” and combine to form one variable called "ModA1" with three categories. Do you know of a way to do this without writing a step for each new variable? She has over 60 questions with 3 variables each created for each question.  

5 REPLIES 5
Reeza
Super User

1. Transpose the data to a long format using PROC TRANSPOSE. 

2. Use the SCAN functions to separate the question (MODA1) from the response (NA/NO/YES).

3. Transpose to a different wide structure, keeping only records that have a 1, not a zero, with the values being the response and the ID being the Question from part 2.

 

If you need further help please post the code you tried.

 


@epi_user wrote:

A research assistant has a data set with 200+ survey question variables that is set up like this, Where "A1" is the question number. 

 

subject   ModA1_na      ModA1_no   ModA1_yes   ModA2_na    ModA2_no      ModA2_yes

1                  1                   0                0                   0                   0                           1

2                  0                   1                0                   1                   0                           0

3                  0                    0               1                   1                   0                           0

4                  0                   1                0                   0                   0                           1

 

She wants to combine the questions so that there is one variable for each question with 0,1,2 as options for NA, NO , YES.

So for example:

Take “ModA1_na”, “ModA1_no”, “ModA1_yes” and combine to form one variable called "ModA1" with three categories. Do you know of a way to do this without writing a step for each new variable? She has over 60 questions with 3 variables each created for each question.  


 

KachiM
Rhodochrosite | Level 12

Possibly Arrays may simplify. If you have triplets like ModA1_na, ModA1_no, and  ModA1_yes for all your Questions then we may use Arrays. 

data have;
input subject ModA1_na ModA1_no ModA1_yes ModA2_na ModA2_no ModA2_yes;
datalines;
1  1  0  0  0  0  1
2  0  1  0  1  0  0
3  0  0  1  1  0  0
4  0  1  0  0  0  1
;
run;
data want;
   set have;
   array k ModA1_na ModA1_no ModA1_yes ModA2_na ModA2_no ModA2_yes;
   array m ModA1 ModA2;
   do i = 1 to dim(k);
      j = 1 + int((i-1)/3);  /* Given i, j is forced to take 1 or 2 */
      jj = mod(i-1,3);       /* Given i, jj is forced to take 0,1 or 2 */
      if k[i] then m[j] = jj;
   end;
keep subject ModA1 ModA2;
run;
proc print data = want;
run;

The OUTPUT:

Obs 	subject 	ModA1 	ModA2
1 	1 	0 	2
2 	2 	1 	0
3 	3 	2 	0
4 	4 	1 	2

The comments give you how each element of array K[ ] gives the INDEX J to Group Number and INDEX JJ to Value.

 

Hope it helps. Any query on the use array can be answered.

 

DataSP

Astounding
PROC Star

I would think it's fairly easy to code all 60 sets of variables.  For example, you might begin with:

 

ModA1 = ModA1_no + 2 * ModA1_yes;
ModA2 = ModA2_no + 2 * ModA2_yes;

Just replicate these lines, and change the question number ... one line per set of questions.

 

There are other ways ... more complex yet more flexible.  But I don't think you need to start using a complex approach when a simple one will do.

 

 

ballardw
Super User

Another option, if available, is to go back to the software that collected the data and examine output options. I have worked with software used for survey data collection that defaulted to that sort of output for categorical/ select one answer type of questions. BUT it had an option to report the only selected choice as well. You just had to know enough to look.

 

If you have multiple response questions you likely seriously want to examine options as some of the defaults I have seen for those questions can be seriously problematic for some sorts of analysis.

Ksharp
Super User

Double PROC TRANSPOSE.

 

data have;
input subject ModA1_na ModA1_no ModA1_yes ModA2_na ModA2_no ModA2_yes;
datalines;
1  1  0  0  0  0  1
2  0  1  0  1  0  0
3  0  0  1  1  0  0
4  0  1  0  0  0  1
;
run;
proc transpose data=have out=temp(where=(col1=1));
by subject;
run;
data temp1;
 set temp;
 name=scan(_name_,1,'_');
 value=scan(_name_,-1,'_');
run;
proc transpose data=temp1 out=want;
by subject;
id name;
var value;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 3128 views
  • 5 likes
  • 6 in conversation