DATA Step, Macro, Functions and more

Multiple response variables and dummy coding

Accepted Solution Solved
Reply
Contributor
Posts: 33
Accepted Solution

Multiple response variables and dummy coding

Does SAS have any PROC or other features that can handle variables that store multiple responses by reformatting that variable via dummy coding?  

 

Let me provide an example. The user can input multiple responses for the following variable to indicate they have had more than one type of plastic surgery:

 

1. Type of plastic surgery:

     a. option 

     b. option 

     c. option

     d. option

     e. option

 

Of course, multiple values in one cell is not something you can do an analysis on, so I was wondering if SAS has a feature that separates them whether that be doing something like 0s or 1s for whether or not someone selected that type or surgery:

 

type_plastic_surg_1       type_plastic_surg_2     type_plastic_surg_3      type_plastic_surg_4

             0                                         1                               1                                      0

 

OR (even better than creating multiple variables in my opinion), something like dummy coding so that all responses have a unique binary code, such as:

 

1. Type of plastic surgery:                                                               1. type_plastic_surg

     a. option                                                                                          00000

     b. option                                                                                          00001

     c. option                                                                                          00010

     d. option                                                                                          etc....

     e. option

     f. if option a and option b are selected                                           10001

     g. if option a, b, c are selected                                                       10101                  and so on and so forth...

 

A google search returns PROC TABULATE as a possible solution to my problem, but all literature I have read thus far on this procedure doesn't seem to solve my problem. So any advice is appreciated.

Thank you!


Accepted Solutions
Solution
4 weeks ago
Contributor
Posts: 33

Re: Multiple response variables and dummy coding

[ Edited ]

Under the scenario that I have a multiple selection variable that will end up becoming multiple variables when imported into SAS, I have come up with a solution to my problem. 

The code below first starts with what the data would look like if imported into SAS, where the variable "v" originally was just one "v" variable but offered the user the option to select more than one choice. After the users have selected more than one option in variable "v", the data is exported from REDcap and imported into SAS, which REDcap then spreads "v" out into "v1", "v2", "v3", etc, and codes everything as 0s or 1s based on whether or not an option was selected. This makes for a very wide dataset and you I can't do the PROC FREQ I need to do on 30 different variables. I need ONE table with ONE row variable and ONE column variable.

**v1-v5 is a variable where users can input all the different kinds; 
**of plastic surgery they have had. 0s for unselected, 1s for selected.; DATA have; INPUT v1 v2 v3 v4 v5 InsuranceStatus; DATALINES; 0 1 0 1 1 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 1 0 1 1 0 0 1 1 0 1 ; RUN; **Using CAT to concatenate the spread out "v" variable into one;
**variable called "result". However, CAT only returns a character;
**variable and I need it to be numeric. The code below converts;
**"result" to Z5 numeric format in a new variable called "result2"; DATA want; set have; format result2 Z5.; result = cat(of v1-v5); put result $char.; result2 = result; RUN; **A bunch of 0s and 1s is not very readable for humans. Labeling ;
**them below ensures I understand what they mean; PROC FORMAT library = work; value ResultLabel 01011 = "Diep procedure" 00000 = "none" 11011 = "DIEP + tissue" 11111 = "All selected" 10001 = "tissue + flap" 01100 = "nipple + flap" 11101 = "something else" 00110 = "lots of surgeries"; run; /*The PROC FREQ below gives me the one table that I want */ PROC FREQ DATA = want; TITLE "Types of plastic surgeries"; FORMAT result2 ResultLabel.; TABLES result2 * InsuranceStatus / nopercent; RUN;

 

View solution in original post


All Replies
Super User
Posts: 6,754

Re: Multiple response variables and dummy coding

It's not difficult to produce your "ideal" scenario.  However, you have shown us what the questions look like.  You need to show what the data would look like, before it can be transformed into the data you would like.

Super User
Super User
Posts: 9,599

Re: Multiple response variables and dummy coding

 

Post  test data in the form of a datastep!!

 

As such I can only guess at what you have/want so:

data want;
  length new_surg $200;
  surg="a. Option1,b. Option2,c. Option3";
  do i=1 to countw(surg,",");
     new_surg=scan(surg,i,",");
     output;
  end;
run;

You will see the single string is split out into the number of elements, each on a new row - rows are the best way of storing this data (fixed structure, expandable collapse rows etc.).  You can code this data, and you can transpose it if needed - though apart from a report, I would program with the long structure.

Super User
Posts: 13,508

Re: Multiple response variables and dummy coding

Show what your current data actually looks like. Having worked with a number of data systems that collected multiple response data and provided output in a, sometimes quite odd, number of output formats this is really important.

 

In fact you may want to investigate options in your collection software as it may have an option to export data in such a format.

One piece of software I have used had options for multiple response data that would output

01101 as indicators that choices 2,3 and 5 were selected

or 5 variables which could be

First response selected, Second response selected, Third response selected (in order of selection)

or 5 variables each a binary selected/not selected

or 5 variables with text of the selected values with the non-selected items having a blank or missing value.

 

So you selected which approach you wanted for your export with options to export similar format questions separately.

Contributor
Posts: 33

Re: Multiple response variables and dummy coding

Ballard, 

 

The reason I did not post how the data was stored is because I can store it however I want. 

 

However, I just ran into a conundrum making my problem even more difficult to answer, so now I am going to have to solve the storage problem before I can solve the problem of reformatting the data. =(

 

Anyway that I can delete this post? Haha!  Or maybe just come back to it later...

Super User
Posts: 13,508

Re: Multiple response variables and dummy coding


@lady8506 wrote:

Ballard, 

 

The reason I did not post how the data was stored is because I can store it however I want. 

 

However, I just ran into a conundrum making my problem even more difficult to answer, so now I am going to have to solve the storage problem before I can solve the problem of reformatting the data. =(

 

Anyway that I can delete this post? Haha!  Or maybe just come back to it later...


Please leave the thread in place, others may find the discussion helpful.

 

A separate thread later if the basic question changes significantly would be better.

I would guess that you might have issues reading the data into a SAS data set which is a not uncommon issue dealing with some data sources. Best would be to provide some example text file and what structure you need the SAS data set to look like. Dummy data is fine as long as it provides a realistic example of how your source data appears.

Contributor
Posts: 33

Re: Multiple response variables and dummy coding

At this point, it may be best for me to store the data in REDCap which allows for multiple response variables, but when imported into SAS, REDcap will spread those variables out into "selected" and "unselected" choices (e.g. type_plastic_surg becomes type_plastic_surg_1, type_plastic_surg_2, type_plastic_surg_3, etc). This creates a very wide dataset, which I didn't want, but it may be what I have to work with at the moment.

 

I can probably then figure out a way to concatenate all those variables into one. Maybe something like this?

 

DATA Temp1; SET Redcap;
length binary 3;
IF missing(type_of_plastic_surg__0-type_of_plastic_surg__2) then binary = .;
ELSE binary = strip(catt(of type_of_plastic_surg__0-type_of_plastic_surg__2));
run;

Currently, this code is a smaller version of the 20+ variables it needs to be run on and it doesn't work. Do you see any errors in it? 

Super User
Posts: 13,508

Re: Multiple response variables and dummy coding


@lady8506 wrote:

At this point, it may be best for me to store the data in REDCap which allows for multiple response variables, but when imported into SAS, REDcap will spread those variables out into "selected" and "unselected" choices (e.g. type_plastic_surg becomes type_plastic_surg_1, type_plastic_surg_2, type_plastic_surg_3, etc). This creates a very wide dataset, which I didn't want, but it may be what I have to work with at the moment.

 

I can probably then figure out a way to concatenate all those variables into one. Maybe something like this?

 

DATA Temp1; SET Redcap;
length binary 3;
IF missing(type_of_plastic_surg__0-type_of_plastic_surg__2) then binary = .;
ELSE binary = strip(catt(of type_of_plastic_surg__0-type_of_plastic_surg__2));
run;

Currently, this code is a smaller version of the 20+ variables it needs to be run on and it doesn't work. Do you see any errors in it? 


Examples of what you start with and the desired output really help to discuss if example code might do what you want.

Also the content and variables in your set should reflect the intended analysis. if you need to consider all of the options as a single "class" for example then a concatenation may be appropriate. But you might also have questions related to specific options for a question such as response differences to one or more other questions between respondents/patients/clients that had or did not have option 2 to question 5.

 

Market analysis does this all of the time where you have a question like "which brand of corn flakes have you purchased". And then look for characteristics of respondents to see if purchases of Brand X have different characteristics than purchasers of Brand Y.

 

I can even see potential analysis based on the number of options selected for a specific question.

So the variables you have in the data should reflect the planned analysis. And be prepared to add or combine variables at different steps. You might find that every response with option 1 to a question may also have the same option 5 value. At which point having 5 levels may not be needed.

Contributor
Posts: 33

Re: Multiple response variables and dummy coding

[ Edited ]

I understand what you are saying and I do try to plan the storage according to what's needed in the analysis, but for the purposes of this variable, "Type of Plastic Surgery", the type of analysis done on it is not the issue. The issue is creating a database that is not only user friendly and doesn't have a ton of options someone has to scroll through because they can only make one selection:

 

Type of Plastic Surgery:

1. option A

2. option B

3. option C

4. option D

5. option A and B

6. option A and C

7. option A and D

8. option A, B, and C

etc, etc....

 

And at the same time, creating something that I can use for analysis. That's why I liked the idea of creating a variable that allows for multiple selections, but stores those selections as a binary code of 0s and 1s, which would make it ideal for analysis if the need were ever to arise. I just wondered if SAS had a feature like that, but seems that things got more complicated than answering that one simple question. 

Here is some sample data:

data have;
   input type_ps_1 type_ps_2 type_ps_3;
   datalines;
0 1 1
1 0 1
0 0 0
0 0 1
. . .
1 1 1 ;

  Would like it to look like:

type_ps_combined
011
101
000
001
.
111;
Solution
4 weeks ago
Contributor
Posts: 33

Re: Multiple response variables and dummy coding

[ Edited ]

Under the scenario that I have a multiple selection variable that will end up becoming multiple variables when imported into SAS, I have come up with a solution to my problem. 

The code below first starts with what the data would look like if imported into SAS, where the variable "v" originally was just one "v" variable but offered the user the option to select more than one choice. After the users have selected more than one option in variable "v", the data is exported from REDcap and imported into SAS, which REDcap then spreads "v" out into "v1", "v2", "v3", etc, and codes everything as 0s or 1s based on whether or not an option was selected. This makes for a very wide dataset and you I can't do the PROC FREQ I need to do on 30 different variables. I need ONE table with ONE row variable and ONE column variable.

**v1-v5 is a variable where users can input all the different kinds; 
**of plastic surgery they have had. 0s for unselected, 1s for selected.; DATA have; INPUT v1 v2 v3 v4 v5 InsuranceStatus; DATALINES; 0 1 0 1 1 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 1 0 1 1 0 0 1 1 0 1 ; RUN; **Using CAT to concatenate the spread out "v" variable into one;
**variable called "result". However, CAT only returns a character;
**variable and I need it to be numeric. The code below converts;
**"result" to Z5 numeric format in a new variable called "result2"; DATA want; set have; format result2 Z5.; result = cat(of v1-v5); put result $char.; result2 = result; RUN; **A bunch of 0s and 1s is not very readable for humans. Labeling ;
**them below ensures I understand what they mean; PROC FORMAT library = work; value ResultLabel 01011 = "Diep procedure" 00000 = "none" 11011 = "DIEP + tissue" 11111 = "All selected" 10001 = "tissue + flap" 01100 = "nipple + flap" 11101 = "something else" 00110 = "lots of surgeries"; run; /*The PROC FREQ below gives me the one table that I want */ PROC FREQ DATA = want; TITLE "Types of plastic surgeries"; FORMAT result2 ResultLabel.; TABLES result2 * InsuranceStatus / nopercent; RUN;

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 159 views
  • 2 likes
  • 4 in conversation