BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lady8506
Quartz | Level 8

Does SAS have any PROC or other features that can handle variables that store multiple responses by reformatting that variable via dummy coding?  

 

Let me provide an example. The user can input multiple responses for the following variable to indicate they have had more than one type of plastic surgery:

 

1. Type of plastic surgery:

     a. option 

     b. option 

     c. option

     d. option

     e. option

 

Of course, multiple values in one cell is not something you can do an analysis on, so I was wondering if SAS has a feature that separates them whether that be doing something like 0s or 1s for whether or not someone selected that type or surgery:

 

type_plastic_surg_1       type_plastic_surg_2     type_plastic_surg_3      type_plastic_surg_4

             0                                         1                               1                                      0

 

OR (even better than creating multiple variables in my opinion), something like dummy coding so that all responses have a unique binary code, such as:

 

1. Type of plastic surgery:                                                               1. type_plastic_surg

     a. option                                                                                          00000

     b. option                                                                                          00001

     c. option                                                                                          00010

     d. option                                                                                          etc....

     e. option

     f. if option a and option b are selected                                           10001

     g. if option a, b, c are selected                                                       10101                  and so on and so forth...

 

A google search returns PROC TABULATE as a possible solution to my problem, but all literature I have read thus far on this procedure doesn't seem to solve my problem. So any advice is appreciated.

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
lady8506
Quartz | Level 8

Under the scenario that I have a multiple selection variable that will end up becoming multiple variables when imported into SAS, I have come up with a solution to my problem. 

The code below first starts with what the data would look like if imported into SAS, where the variable "v" originally was just one "v" variable but offered the user the option to select more than one choice. After the users have selected more than one option in variable "v", the data is exported from REDcap and imported into SAS, which REDcap then spreads "v" out into "v1", "v2", "v3", etc, and codes everything as 0s or 1s based on whether or not an option was selected. This makes for a very wide dataset and you I can't do the PROC FREQ I need to do on 30 different variables. I need ONE table with ONE row variable and ONE column variable.

**v1-v5 is a variable where users can input all the different kinds; 
**of plastic surgery they have had. 0s for unselected, 1s for selected.; DATA have; INPUT v1 v2 v3 v4 v5 InsuranceStatus; DATALINES; 0 1 0 1 1 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 1 0 1 1 0 0 1 1 0 1 ; RUN; **Using CAT to concatenate the spread out "v" variable into one;
**variable called "result". However, CAT only returns a character;
**variable and I need it to be numeric. The code below converts;
**"result" to Z5 numeric format in a new variable called "result2"; DATA want; set have; format result2 Z5.; result = cat(of v1-v5); put result $char.; result2 = result; RUN; **A bunch of 0s and 1s is not very readable for humans. Labeling ;
**them below ensures I understand what they mean; PROC FORMAT library = work; value ResultLabel 01011 = "Diep procedure" 00000 = "none" 11011 = "DIEP + tissue" 11111 = "All selected" 10001 = "tissue + flap" 01100 = "nipple + flap" 11101 = "something else" 00110 = "lots of surgeries"; run; /*The PROC FREQ below gives me the one table that I want */ PROC FREQ DATA = want; TITLE "Types of plastic surgeries"; FORMAT result2 ResultLabel.; TABLES result2 * InsuranceStatus / nopercent; RUN;

 

View solution in original post

9 REPLIES 9
Astounding
PROC Star

It's not difficult to produce your "ideal" scenario.  However, you have shown us what the questions look like.  You need to show what the data would look like, before it can be transformed into the data you would like.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

 

Post  test data in the form of a datastep!!

 

As such I can only guess at what you have/want so:

data want;
  length new_surg $200;
  surg="a. Option1,b. Option2,c. Option3";
  do i=1 to countw(surg,",");
     new_surg=scan(surg,i,",");
     output;
  end;
run;

You will see the single string is split out into the number of elements, each on a new row - rows are the best way of storing this data (fixed structure, expandable collapse rows etc.).  You can code this data, and you can transpose it if needed - though apart from a report, I would program with the long structure.

ballardw
Super User

Show what your current data actually looks like. Having worked with a number of data systems that collected multiple response data and provided output in a, sometimes quite odd, number of output formats this is really important.

 

In fact you may want to investigate options in your collection software as it may have an option to export data in such a format.

One piece of software I have used had options for multiple response data that would output

01101 as indicators that choices 2,3 and 5 were selected

or 5 variables which could be

First response selected, Second response selected, Third response selected (in order of selection)

or 5 variables each a binary selected/not selected

or 5 variables with text of the selected values with the non-selected items having a blank or missing value.

 

So you selected which approach you wanted for your export with options to export similar format questions separately.

lady8506
Quartz | Level 8

Ballard, 

 

The reason I did not post how the data was stored is because I can store it however I want. 

 

However, I just ran into a conundrum making my problem even more difficult to answer, so now I am going to have to solve the storage problem before I can solve the problem of reformatting the data. =(

 

Anyway that I can delete this post? Haha!  Or maybe just come back to it later...

ballardw
Super User

@lady8506 wrote:

Ballard, 

 

The reason I did not post how the data was stored is because I can store it however I want. 

 

However, I just ran into a conundrum making my problem even more difficult to answer, so now I am going to have to solve the storage problem before I can solve the problem of reformatting the data. =(

 

Anyway that I can delete this post? Haha!  Or maybe just come back to it later...


Please leave the thread in place, others may find the discussion helpful.

 

A separate thread later if the basic question changes significantly would be better.

I would guess that you might have issues reading the data into a SAS data set which is a not uncommon issue dealing with some data sources. Best would be to provide some example text file and what structure you need the SAS data set to look like. Dummy data is fine as long as it provides a realistic example of how your source data appears.

lady8506
Quartz | Level 8

At this point, it may be best for me to store the data in REDCap which allows for multiple response variables, but when imported into SAS, REDcap will spread those variables out into "selected" and "unselected" choices (e.g. type_plastic_surg becomes type_plastic_surg_1, type_plastic_surg_2, type_plastic_surg_3, etc). This creates a very wide dataset, which I didn't want, but it may be what I have to work with at the moment.

 

I can probably then figure out a way to concatenate all those variables into one. Maybe something like this?

 

DATA Temp1; SET Redcap;
length binary 3;
IF missing(type_of_plastic_surg__0-type_of_plastic_surg__2) then binary = .;
ELSE binary = strip(catt(of type_of_plastic_surg__0-type_of_plastic_surg__2));
run;

Currently, this code is a smaller version of the 20+ variables it needs to be run on and it doesn't work. Do you see any errors in it? 

ballardw
Super User

@lady8506 wrote:

At this point, it may be best for me to store the data in REDCap which allows for multiple response variables, but when imported into SAS, REDcap will spread those variables out into "selected" and "unselected" choices (e.g. type_plastic_surg becomes type_plastic_surg_1, type_plastic_surg_2, type_plastic_surg_3, etc). This creates a very wide dataset, which I didn't want, but it may be what I have to work with at the moment.

 

I can probably then figure out a way to concatenate all those variables into one. Maybe something like this?

 

DATA Temp1; SET Redcap;
length binary 3;
IF missing(type_of_plastic_surg__0-type_of_plastic_surg__2) then binary = .;
ELSE binary = strip(catt(of type_of_plastic_surg__0-type_of_plastic_surg__2));
run;

Currently, this code is a smaller version of the 20+ variables it needs to be run on and it doesn't work. Do you see any errors in it? 


Examples of what you start with and the desired output really help to discuss if example code might do what you want.

Also the content and variables in your set should reflect the intended analysis. if you need to consider all of the options as a single "class" for example then a concatenation may be appropriate. But you might also have questions related to specific options for a question such as response differences to one or more other questions between respondents/patients/clients that had or did not have option 2 to question 5.

 

Market analysis does this all of the time where you have a question like "which brand of corn flakes have you purchased". And then look for characteristics of respondents to see if purchases of Brand X have different characteristics than purchasers of Brand Y.

 

I can even see potential analysis based on the number of options selected for a specific question.

So the variables you have in the data should reflect the planned analysis. And be prepared to add or combine variables at different steps. You might find that every response with option 1 to a question may also have the same option 5 value. At which point having 5 levels may not be needed.

lady8506
Quartz | Level 8

I understand what you are saying and I do try to plan the storage according to what's needed in the analysis, but for the purposes of this variable, "Type of Plastic Surgery", the type of analysis done on it is not the issue. The issue is creating a database that is not only user friendly and doesn't have a ton of options someone has to scroll through because they can only make one selection:

 

Type of Plastic Surgery:

1. option A

2. option B

3. option C

4. option D

5. option A and B

6. option A and C

7. option A and D

8. option A, B, and C

etc, etc....

 

And at the same time, creating something that I can use for analysis. That's why I liked the idea of creating a variable that allows for multiple selections, but stores those selections as a binary code of 0s and 1s, which would make it ideal for analysis if the need were ever to arise. I just wondered if SAS had a feature like that, but seems that things got more complicated than answering that one simple question. 

Here is some sample data:

data have;
   input type_ps_1 type_ps_2 type_ps_3;
   datalines;
0 1 1
1 0 1
0 0 0
0 0 1
. . .
1 1 1 ;

  Would like it to look like:

type_ps_combined
011
101
000
001
.
111;
lady8506
Quartz | Level 8

Under the scenario that I have a multiple selection variable that will end up becoming multiple variables when imported into SAS, I have come up with a solution to my problem. 

The code below first starts with what the data would look like if imported into SAS, where the variable "v" originally was just one "v" variable but offered the user the option to select more than one choice. After the users have selected more than one option in variable "v", the data is exported from REDcap and imported into SAS, which REDcap then spreads "v" out into "v1", "v2", "v3", etc, and codes everything as 0s or 1s based on whether or not an option was selected. This makes for a very wide dataset and you I can't do the PROC FREQ I need to do on 30 different variables. I need ONE table with ONE row variable and ONE column variable.

**v1-v5 is a variable where users can input all the different kinds; 
**of plastic surgery they have had. 0s for unselected, 1s for selected.; DATA have; INPUT v1 v2 v3 v4 v5 InsuranceStatus; DATALINES; 0 1 0 1 1 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 1 0 1 1 0 0 1 1 0 1 ; RUN; **Using CAT to concatenate the spread out "v" variable into one;
**variable called "result". However, CAT only returns a character;
**variable and I need it to be numeric. The code below converts;
**"result" to Z5 numeric format in a new variable called "result2"; DATA want; set have; format result2 Z5.; result = cat(of v1-v5); put result $char.; result2 = result; RUN; **A bunch of 0s and 1s is not very readable for humans. Labeling ;
**them below ensures I understand what they mean; PROC FORMAT library = work; value ResultLabel 01011 = "Diep procedure" 00000 = "none" 11011 = "DIEP + tissue" 11111 = "All selected" 10001 = "tissue + flap" 01100 = "nipple + flap" 11101 = "something else" 00110 = "lots of surgeries"; run; /*The PROC FREQ below gives me the one table that I want */ PROC FREQ DATA = want; TITLE "Types of plastic surgeries"; FORMAT result2 ResultLabel.; TABLES result2 * InsuranceStatus / nopercent; RUN;

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 3505 views
  • 2 likes
  • 4 in conversation