confooseddesi89
Quartz | Level 8

I am working with a SAS dataset that has several variables querying time spent using media each day. Each variable queries a different type of media use, with the same answer options. An example question is, "how much time did you spend today talking on your phone?" The answer options are:

 

a) < 30 Min
b) 30 Min - 1 Hr
c) 1 - 2 Hrs
d) 2 - 3 Hrs
e) 3+ Hrs

 

There are 7 questions in total.

 

I would like to create a composite of these 7 ordinal variables. My idea was to take the midpoint of each answer and create an average. For example, if someone answered A, D, E, B, C, A, then D, that would translate to 0.25 hours, 2.5 hours, 3 hours, 0.75 hours, 1.5 hours, 0.25 hours, and 2.5 hours, with the average being 1.54 hours spent on media use per day.
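A minimal SAS sketch of the midpoint-average idea. It assumes the seven answers are stored as character variables named Q1-Q7 that hold the letters A-E. These names are hypothetical, since the actual variable names aren't given. The hour values are the ones from the example above, including 3 for "3+ Hrs". The informat name MEDIAHR is also made up for illustration.

```sas
/* Hypothetical data: one respondent who answered A, D, E, B, C, A, D */
data responses;
   input id q1 $ q2 $ q3 $ q4 $ q5 $ q6 $ q7 $;
   datalines;
1 a d e b c a d
;

/* Hypothetical informat mapping each answer letter to assumed hours */
proc format;
   invalue mediahr
      'A' = 0.25
      'B' = 0.75
      'C' = 1.5
      'D' = 2.5
      'E' = 3
      other = .;
run;

data composite;
   set responses;
   array ans{7} $ q1-q7;        /* letter answers */
   array hrs{7} hours1-hours7;  /* numeric hours  */
   do i = 1 to 7;
      hrs{i} = input(upcase(ans{i}), mediahr.);
   end;
   avg_hours = mean(of hours1-hours7);  /* mean of the non-missing answers */
   drop i;
run;

proc print data=composite;
   var id hours1-hours7 avg_hours;
   format avg_hours 5.2;
run;
```

The MEAN function averages only the non-missing values, so a respondent who skipped a question is averaged over the questions they answered. For the example respondent, AVG_HOURS is 10.75/7, which prints as 1.54.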

 

Is this considered an "appropriate" way to combine the variables? If not, is there a more appropriate way?

 

Thank you.

5 replies
ballardw
Super User

Personally, I would question using the minimum value when the longest interval is chosen. If someone responds "3+ hours", 3 hours is the minimum, not a midpoint. How to treat that open-ended range depends on what further analysis you intend with the converted values and how you plan to use and interpret the results.
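To see how much the treatment of "3+ Hrs" matters, one option is to recompute the composite under a few assumed values for that category and compare. This is a minimal sketch. It assumes the answers are coded 1-5 in hypothetical variables C1-C7. The candidate values 3, 3.5 and 4 are arbitrary illustrations, not recommendations.

```sas
/* Hypothetical data: the example answers coded 1-5 (1 = "< 30 Min", 5 = "3+ Hrs") */
data codes;
   input id c1-c7;
   datalines;
1 1 4 5 2 3 1 4
;

data sensitivity;
   set codes;
   array c{7} c1-c7;                              /* category codes */
   array h{7} h1-h7;                              /* assigned hours */
   array mid{4} _temporary_ (0.25 0.75 1.5 2.5);  /* hours for codes 1-4 */
   do top = 3, 3.5, 4;                  /* assumed values for "3+ Hrs" */
      do i = 1 to 7;
         if c{i} = 5 then h{i} = top;
         else if 1 <= c{i} <= 4 then h{i} = mid{c{i}};
         else h{i} = .;                 /* unexpected or missing code */
      end;
      avg_hours = mean(of h1-h7);
      output;                           /* one row per assumed value */
   end;
   keep id top avg_hours;
run;

proc print data=sensitivity;
run;
```

The example respondent has a single "3+ Hrs" answer. For that respondent, each extra half hour assumed for the category raises the average by 0.5/7, roughly 0.07 hours. Respondents with several "3+" answers shift more.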

 

You have not given an example of what you mean by combining the variables or how you might use the combined value, so I can't say whether this is an appropriate way to combine them.

confooseddesi89
Quartz | Level 8

By combine, I mean creating a composite from the recoded variables, using the sum across the variables to represent the total time spent using media during the day.
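This is a minimal sketch of the summed composite. It assumes the recoded hours are already in numeric variables HOURS1-HOURS7 (hypothetical names) and uses made-up data for two respondents. The N function is included because SUM ignores missing values, so a skipped question silently lowers the total.

```sas
/* Made-up recoded hours; respondent 2 skipped the last question */
data hours;
   input id hours1-hours7;
   datalines;
1 0.25 2.5 3 0.75 1.5 0.25 2.5
2 3 3 3 3 3 3 .
;

data totals;
   set hours;
   total_hours = sum(of hours1-hours7);  /* missing answers are ignored */
   n_answered  = n(of hours1-hours7);    /* number of non-missing answers */
run;

proc print data=totals;
   var id total_hours n_answered;
run;
```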

ballardw
Super User

Have you checked the distribution of your SUMmed variable? If you get values of more than 24 hours for a day, then you have an issue. I don't really expect that, but you have only listed one category, so I might be surprised.

The question is: do you get any totals like 8 or 10? Some of your serious TV viewers or Internet trolls could well rack that up, and if you aren't seeing any values that extreme, then your use of the minimum value for the longest duration is suspect.
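This is a minimal sketch of that check. It uses a made-up dataset TOTALS with one summed TOTAL_HOURS value per respondent; the names and values are hypothetical. PROC MEANS shows the range. Two 0/1 flags count the impossible totals (over 24) and the extreme ones (8 or more).

```sas
/* Made-up summed totals, one row per respondent */
data totals;
   input id total_hours;
   datalines;
1 10.75
2 4.5
3 21
4 25.5
;

/* Overview of the distribution, including the extremes */
proc means data=totals n min p50 max;
   var total_hours;
run;

data flagged;
   set totals;
   over_24    = (total_hours > 24);   /* 1 if the total exceeds a full day */
   at_least_8 = (total_hours >= 8);   /* 1 if the total is 8 hours or more */
run;

proc freq data=flagged;
   tables over_24 at_least_8;
run;
```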

 



confooseddesi89
Quartz | Level 8

Below is the frequency distribution for the sum of the school variables:

[Image: frequency distribution of the sum of the school variables]

 

Below is the frequency distribution for the sum of the weekend variables:

[Image: frequency distribution of the sum of the weekend variables]

 

As you can see, the maximum does not exceed 24 but is around 21, which is still far too long once sleep is taken into account. I think this is due either to overestimation or to multitasking (using multiple devices at once), which makes me think a sum is not appropriate in this case. Is there any other way to combine the variables that may be appropriate? Thanks.

ballardw
Super User

First, a general comment about disaggregation (going from a group or range of values to a specific value): as you are finding out, it is not easy. Often it is better to ask for an estimate of duration first and offer ranges for a best fit only AFTER respondents have had problems with that initial question. The range response should then go in a different variable.

 

I might be tempted to look at your data as a stacked bar chart per respondent showing the total time. This would mean transposing your data so that the values are all in one variable, with another variable indicating which measure is involved.

Here is an example using the SASHELP.CLASS data set, in case you aren't familiar with transposing:

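/* Reshape: one output row per NAME and variable; the variable name */
/* goes in _NAME_ and its value in COL1                              */
proc transpose data=sashelp.class out=trans;
   by name;
   var height weight;
run;

/* Stacked bar per NAME: one segment per measure, bar height = total */
proc sgplot data=trans;
   vbar name / response=col1 group=_name_;
run;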

I would hope that you have a unique respondent ID for the collected values. You will likely need to sort your data by that ID to use the BY statement in PROC TRANSPOSE. Your ID would replace NAME, and HEIGHT and WEIGHT would be replaced by the numeric time variables you assigned to the categories.
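Adapting that to respondent data might look like the sketch below. It assumes a respondent ID variable named ID. For brevity it uses only three of the media variables, under the hypothetical names TV, PHONE and GAMES, with made-up values. PROC SORT comes first because BY-group processing in PROC TRANSPOSE expects the data in ID order.

```sas
/* Made-up wide data: one row per respondent, recoded hours per media type */
data media;
   input id tv phone games;
   datalines;
2 3 0.25 1.5
1 0.75 2.5 3
;

/* BY processing needs the data sorted by the BY variable */
proc sort data=media;
   by id;
run;

/* One row per respondent and media type; the hours end up in COL1 */
proc transpose data=media out=media_long;
   by id;
   var tv phone games;
run;

/* Stacked bar per respondent: bar height is that respondent's total */
proc sgplot data=media_long;
   vbar id / response=col1 group=_name_;
run;
```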

Look for individuals whose bars look unusual in the charts. The respondents with totals of 20+ hours will likely be of interest, to see whether their responses make sense in any form.
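This is a minimal sketch for pulling out the respondents with totals of 20+ hours and looking at their individual answers. It uses hypothetical variables ID and HOURS1-HOURS7 with made-up data. Note that with seven questions and "3+ Hrs" coded as 3, the largest possible total is 7 × 3 = 21. A total of 21 therefore means every answer was "3+ Hrs".

```sas
/* Made-up recoded hours for three respondents */
data hours;
   input id hours1-hours7;
   datalines;
1 0.25 2.5 3 0.75 1.5 0.25 2.5
2 3 3 3 3 3 3 3
3 0.25 0.25 0.75 0.25 0.25 0.75 0.25
;

/* Keep only respondents whose summed total is 20 hours or more */
data high_totals;
   set hours;
   total_hours = sum(of hours1-hours7);
   if total_hours >= 20;   /* subsetting IF */
run;

proc print data=high_totals;
   var id hours1-hours7 total_hours;
run;
```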

Sometimes you just have to throw out respondents because their responses are unreliable. Alternatively, categorize individuals into groups based on these totals, with a caution that the 21+ hours group may not be reliable.
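This is a minimal sketch of grouping respondents by their totals with a user-defined format. The cut points (4, 8 and 21 hours), the format name TOTGRP and the data are hypothetical choices for illustration. The top group's label includes the reliability caution, so the caution appears wherever the group is shown.

```sas
/* Hypothetical cut points for grouping summed totals */
proc format;
   value totgrp
      low -< 4  = 'Under 4 hours'
      4 -< 8    = '4 to under 8 hours'
      8 -< 21   = '8 to under 21 hours'
      21 - high = '21+ hours (use with caution)';
run;

/* Made-up totals for four respondents */
data totals;
   input id total_hours;
   datalines;
1 10.75
2 21
3 2.75
4 6
;

data grouped;
   set totals;
   length total_group $30;
   total_group = strip(put(total_hours, totgrp.));  /* apply the grouping */
run;

proc freq data=grouped;
   tables total_group;
run;
```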

