Solved: Re: Survey Data Analysis

Missmichelle · Posted 09-26-2019 11:47 AM

Hello!

I am analyzing a small dataset (N>300) with survey data. There is a section that asked the participants to rate the importance of 20 different health services as "high," "low," or "none." Each service response is stored as its own variable, for example: service1 = "High", service2 = "low", service3 = "high". I have already formatted each response to correspond to 1, 2, or 3 instead of the text. My question is: what is the most efficient way to display the data? Do I need to transpose it, since there are so many variables in my analysis? If so, how do I go about it? My end goal is to show a distribution of the responses and to assign a service response score to each individual in the dataset.

Thank You!

Michelle

ballardw · Posted 09-26-2019 12:20 PM

What to do next depends on the analysis question(s) you are trying to answer.

First, is this a complex survey design with strata and/or clusters and different sample weights? If so, you will likely need to use the survey procedures (PROC SURVEYFREQ, PROC SURVEYMEANS, and so on) to properly account for the sampling information.

Second, are there any outcomes associated with the scored variables? Are any of the services considered more important than others? If so, you might need to weight the individual variables when building your composite "service response score."

Several approaches come to mind. Summing the numeric values and then creating a histogram of that summed variable would condense things.

Advantage: Easy to code: Score = sum(of service1-service20); and proc sgplot.

Disadvantage: same total could mask notable differences in sub elements.
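A minimal sketch of the sum-and-histogram approach, under a few assumptions: the dataset name `survey`, the `id` variable, and the three respondents are made up (skip the first step when using your own data), and the items are assumed to be coded 1 = High, 2 = Low, 3 = None. Under that coding, a lower total means more services were rated High.

```sas
/* Made-up sample data (hypothetical): 3 respondents, 20 items coded 1/2/3.
   Skip this step when using your own data. */
data survey;
   input id service1-service20;
   datalines;
1 1 1 2 3 1 2 2 1 1 3 1 2 1 1 2 3 1 1 2 1
2 2 2 2 1 3 3 2 2 1 1 2 2 3 3 2 1 2 2 1 2
3 1 3 1 1 1 2 1 1 1 1 3 1 1 2 1 1 1 3 1 1
;
run;

/* One composite score per respondent: the sum of the 20 item codes */
data scored;
   set survey;
   score = sum(of service1-service20);
run;

/* Histogram of the composite score across respondents */
proc sgplot data=scored;
   histogram score;
run;
```

SUM ignores missing values, so a respondent who skipped items ends up with a lower total; MEAN(of service1-service20) would keep the score on the 1 to 3 scale regardless of how many items were answered.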

Do some of these services relate to each other? For example, variables about patient interaction with staff might be grouped separately from actual care services. You could create a score for each group, again with sums, and display them as histograms or other graphs.

Advantage: still easy

Disadvantage: more work on your part identifying the groups
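Grouped scores might look like the sketch below. The split into a staff-interaction group (service1-service5) and a care group (service6-service20) is purely hypothetical, as are the dataset and sample rows. It uses MEAN rather than SUM so that groups with different numbers of items sit on the same 1 to 3 scale and can be overlaid on one graph.

```sas
/* Made-up sample data (hypothetical): 3 respondents, 20 items coded 1/2/3.
   Skip this step when using your own data. */
data survey;
   input id service1-service20;
   datalines;
1 1 1 2 3 1 2 2 1 1 3 1 2 1 1 2 3 1 1 2 1
2 2 2 2 1 3 3 2 2 1 1 2 2 3 3 2 1 2 2 1 2
3 1 3 1 1 1 2 1 1 1 1 3 1 1 2 1 1 1 3 1 1
;
run;

/* Hypothetical grouping: adjust the ranges to match your own items */
data subscores;
   set survey;
   staff_mean = mean(of service1-service5);
   care_mean  = mean(of service6-service20);
run;

/* Overlaid histograms of the two group scores */
proc sgplot data=subscores;
   histogram staff_mean / transparency=0.5;
   histogram care_mean  / transparency=0.5;
run;
```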

And then there is a group of clustering procedures (such as PROC CLUSTER and PROC FASTCLUS) that let the data show you groups of responses that are similar.
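As one example, PROC FASTCLUS (k-means clustering) can assign each respondent to a cluster based on the 20 item codes. Treat this as a rough sketch: the choice of 2 clusters is arbitrary, the sample data are made up, and k-means treats the 1/2/3 codes as interval values, which is a simplification for ordinal responses.

```sas
/* Made-up sample data (hypothetical): 3 respondents, 20 items coded 1/2/3.
   Skip this step when using your own data. */
data survey;
   input id service1-service20;
   datalines;
1 1 1 2 3 1 2 2 1 1 3 1 2 1 1 2 3 1 1 2 1
2 2 2 2 1 3 3 2 2 1 1 2 2 3 3 2 1 2 2 1 2
3 1 3 1 1 1 2 1 1 1 1 3 1 1 2 1 1 1 3 1 1
;
run;

/* k-means clustering on the 20 items; MAXCLUSTERS=2 is an arbitrary choice.
   OUT= adds a CLUSTER variable to each respondent's row. */
proc fastclus data=survey maxclusters=2 out=clustered;
   var service1-service20;
run;

/* How many respondents fall in each cluster */
proc freq data=clustered;
   tables cluster;
run;
```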

Missmichelle · Posted 09-26-2019 12:54 PM

I guess my confusion is in the calculation part. An individual has a response ranging from 1 to 3 for each of the given services. Instead of the total sum across services, I would like to count how many 1's, how many 2's, and how many 3's each individual has.

Reeza · Posted 09-26-2019 01:59 PM

Look at PROC SURVEYFREQ.
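A minimal PROC SURVEYFREQ call might look like this. The weight variable `wt`, its values, and the sample rows are hypothetical; if your design has strata and clusters, add STRATA and CLUSTER statements naming your own design variables.

```sas
/* Made-up sample data with a hypothetical sampling weight, wt.
   Skip this step when using your own data. */
data survey;
   input id wt service1-service20;
   datalines;
1 1.5 1 1 2 3 1 2 2 1 1 3 1 2 1 1 2 3 1 1 2 1
2 0.8 2 2 2 1 3 3 2 2 1 1 2 2 3 3 2 1 2 2 1 2
3 1.2 1 3 1 1 1 2 1 1 1 1 3 1 1 2 1 1 1 3 1 1
;
run;

/* Weighted one-way tables for each service item.
   For a complex design, also add statements such as:
      strata your_stratum_var;
      cluster your_psu_var;   */
proc surveyfreq data=survey;
   weight wt;
   tables service1-service20;
run;
```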

DWilson · Posted 09-26-2019 02:57 PM

I would create a single variable for each health service. Each respondent would have a single "response" for each health service variable. At the end of this, each respondent would have 20 health service variables capturing their responses.

Once you have that, look at each variable separately with PROC FREQ (or PROC SURVEYFREQ if you have weights and a complex sample design).
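PROC FREQ accepts a variable list, so a single TABLES statement produces all 20 one-way tables. To see all 20 distributions side by side, one option (a sketch, not the only way) is to reshape the data with PROC TRANSPOSE to one row per respondent per service. The dataset, the `id` variable, and the sample rows are hypothetical.

```sas
/* Made-up sample data (hypothetical): 3 respondents, 20 items coded 1/2/3.
   Skip this step when using your own data. */
data survey;
   input id service1-service20;
   datalines;
1 1 1 2 3 1 2 2 1 1 3 1 2 1 1 2 3 1 1 2 1
2 2 2 2 1 3 3 2 2 1 1 2 2 3 3 2 1 2 2 1 2
3 1 3 1 1 1 2 1 1 1 1 3 1 1 2 1 1 1 3 1 1
;
run;

/* One-way frequency table for each of the 20 items */
proc freq data=survey;
   tables service1-service20;
run;

/* Reshape to long format: one row per respondent per service */
proc sort data=survey out=survey_sorted;
   by id;
run;

proc transpose data=survey_sorted out=long(rename=(col1=response)) name=service;
   by id;
   var service1-service20;
run;

/* All 20 distributions in one table (row percents) and one stacked bar chart */
proc freq data=long;
   tables service*response / nocol nopercent;
run;

proc sgplot data=long;
   vbar service / group=response;
run;
```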

I would then look at the cross-tabulation of all 20 variables and examine the patterns of response. I would look for obvious grouping patterns and report on them.
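A full 20-way table has 3^20 cells, so one more manageable way to examine response patterns is to concatenate each respondent's 20 codes into a single string and count how often each pattern occurs. This is a sketch with a hypothetical dataset and made-up rows, and it relies on every code being a single digit.

```sas
/* Made-up sample data (hypothetical): 3 respondents, 20 items coded 1/2/3.
   Skip this step when using your own data. */
data survey;
   input id service1-service20;
   datalines;
1 1 1 2 3 1 2 2 1 1 3 1 2 1 1 2 3 1 1 2 1
2 2 2 2 1 3 3 2 2 1 1 2 2 3 3 2 1 2 2 1 2
3 1 3 1 1 1 2 1 1 1 1 3 1 1 2 1 1 1 3 1 1
;
run;

/* Build a 20-character pattern string, e.g. '11231221131211231121' */
data patterns;
   set survey;
   length pattern $20;
   pattern = cats(of service1-service20);
run;

/* Count each distinct pattern, then list the most common ones first */
proc freq data=patterns noprint;
   tables pattern / out=pattern_counts;
run;

proc sort data=pattern_counts;
   by descending count;
run;

proc print data=pattern_counts(obs=20);
run;
```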

You could also create an aggregate measure. Assuming each of your 20 service items is scored the same way (High, Low, None), calculate for each respondent the # of Highs, # of Lows, and # of Nones. I'd then look at the distribution of # of Highs (SURVEYFREQ with weights) to see if there is an obvious split in it, and do the same for # of Lows and # of Nones.

You could also calculate, for each respondent, the proportion of Highs and use that to classify respondents, or use something like the proportion of Highs minus the proportion of Lows, or the proportion of Highs minus the proportion of Nones. I'm not sure whether None means "not applicable" or "not concerned at all." If it means not applicable, then the proportion of Highs minus the proportion of Nones doesn't really make sense.
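The counts and proportions described above could be computed along these lines. This sketch assumes 1 = High, 2 = Low, 3 = None, and the dataset and sample rows are made up. Proportions use the number of answered items as the denominator, so skipped items do not count against a respondent; swap PROC FREQ for PROC SURVEYFREQ if you have weights.

```sas
/* Made-up sample data (hypothetical): 3 respondents, 20 items coded 1/2/3.
   Skip this step when using your own data. */
data survey;
   input id service1-service20;
   datalines;
1 1 1 2 3 1 2 2 1 1 3 1 2 1 1 2 3 1 1 2 1
2 2 2 2 1 3 3 2 2 1 1 2 2 3 3 2 1 2 2 1 2
3 1 3 1 1 1 2 1 1 1 1 3 1 1 2 1 1 1 3 1 1
;
run;

data scores;
   set survey;
   array svc{20} service1-service20;
   /* Reset the counts for each respondent */
   n_high = 0;
   n_low  = 0;
   n_none = 0;
   do i = 1 to dim(svc);
      if svc{i} = 1 then n_high = n_high + 1;
      else if svc{i} = 2 then n_low = n_low + 1;
      else if svc{i} = 3 then n_none = n_none + 1;
   end;
   /* N counts the non-missing items */
   n_answered = n(of service1-service20);
   if n_answered > 0 then do;
      prop_high       = n_high / n_answered;
      high_minus_low  = (n_high - n_low)  / n_answered;
      high_minus_none = (n_high - n_none) / n_answered;
   end;
   drop i;
run;

/* Distribution of the number of Highs, Lows, and Nones across respondents */
proc freq data=scores;
   tables n_high n_low n_none;
run;
```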

More generally, look into how Likert-scale items are scored to see how you might combine responses to 20 items into an aggregate score for an individual. (The measures I gave are simplistic but might work for you.)

Once you have the 20 variables for each person, each containing a value of 1, 2, or 3, you can calculate the number of 1s, 2s, and 3s for each person in a variety of ways.

Here's one:

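data mydata;
   set mydata;
   /* List your 20 health service variables here */
   array values{20} yourvariablename1-yourvariablename20;
   /* Start the counts at zero for each respondent (observation) */
   numones = 0;
   numtwos = 0;
   numthrees = 0;
   do i = 1 to 20;
      if values{i} = 1 then numones = numones + 1;
      else if values{i} = 2 then numtwos = numtwos + 1;
      else if values{i} = 3 then numthrees = numthrees + 1;
   end;
   drop i;
run;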

In the ARRAY statement, list the names of your 20 health service variables.

ballardw · Posted 09-26-2019 04:10 PM

Since we are talking about single-digit values, here is an alternate approach for counting:

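data example;
   input x1 - x10;
   /* CATS joins x1-x10 into one string (here '1132132111') and
      COUNTC counts how often each digit appears in it
      (for the sample row: ones=6, twos=2, thrs=2) */
   ones = countc(cats(of x:),'1');
   twos = countc(cats(of x:),'2');
   thrs = countc(cats(of x:),'3');
datalines;
1 1 3 2 1 3 2 1 1 1
;
run;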

Missmichelle · Posted 10-01-2019 10:07 AM

Thank You!!