Hello!
I am analyzing a small dataset (N>300) of survey data. One section asked participants to rate the importance of 20 different health services as "High", "Low" or "None". Each service response is stored as its own variable, e.g., service1 = "High", service2 = "Low", service3 = "High". I have already coded each response as 1, 2 or 3 instead of the text. My question is: what is the most efficient way to display the data? Do I need to transpose it, since there are so many variables in my analysis? If so, how do I go about it? My end goal is to show the distribution of the responses and to assign a service response score to each individual in the dataset.
Thank You!
Michelle
What to do next depends on the analysis question(s) you are trying to answer.
First, is this a complex survey design with strata and/or clusters and different sample weights? If so, you would likely need the SURVEY procedures (such as SURVEYFREQ and SURVEYMEANS) to use the sampling information properly.
Second, are there any outcomes associated with the scored variables? Are any of the services considered more important than others? If so, you might need to weight the individual variables when building your composite "service response score".
Several approaches come to mind. Summing the numeric values and then creating a histogram of that summed variable would condense things.
Advantage: easy to code: Score = sum(of service1-service20); followed by PROC SGPLOT.
Disadvantage: the same total can mask notable differences in the individual items.
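A minimal sketch of the summing approach, assuming the items are numeric variables named service1-service20 coded 1 to 3. The data set names HAVE and SCORED and the three toy respondents are hypothetical, included only so the program runs on its own:

```sas
/* Hypothetical toy data: 3 respondents, 20 items coded 1-3 */
data have;
  input id service1-service20;
  datalines;
1 1 1 2 3 1 2 2 1 1 3 1 1 2 2 1 3 1 1 2 1
2 3 3 3 2 3 3 2 3 3 3 2 3 3 3 2 3 3 3 3 3
3 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
;
run;

/* Composite score: SUM() adds the nonmissing items and ignores missing ones */
data scored;
  set have;
  score = sum(of service1-service20);
run;

/* Distribution of the composite score across respondents */
proc sgplot data=scored;
  histogram score;
run;
```

Because SUM() skips missing values, a respondent who left items blank gets a lower total; checking n(of service1-service20) alongside the score is one way to spot that.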
Do these services fall into related groups? For example, variables about patient interaction with staff might be grouped separately from actual care services. You could create group scores (again as sums) and display them as histograms or other graphs.
Advantage: still easy
Disadvantage: more work on your part identifying the groups
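A sketch of group subscores. The split used here (service1-service8 as "staff interaction", service9-service20 as "care services") is purely hypothetical; you would substitute the groups you identify. The toy data set HAVE is likewise made up so the program runs on its own:

```sas
/* Hypothetical toy data: 3 respondents, 20 items coded 1-3 */
data have;
  input id service1-service20;
  datalines;
1 1 1 2 3 1 2 2 1 1 3 1 1 2 2 1 3 1 1 2 1
2 3 3 3 2 3 3 2 3 3 3 2 3 3 3 2 3 3 3 3 3
3 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
;
run;

/* HYPOTHETICAL grouping of the items into two subscores */
data grouped;
  set have;
  staff_score = sum(of service1-service8);
  care_score  = sum(of service9-service20);
run;

/* Overlay the two subscore distributions */
proc sgplot data=grouped;
  histogram staff_score / transparency=0.5;
  histogram care_score  / transparency=0.5;
run;
```

Since the groups have different numbers of items, their sums sit on different ranges; dividing each by its item count would put them on a common 1 to 3 scale if you want to compare them directly.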
And then there is a group of clustering procedures (such as PROC CLUSTER and PROC FASTCLUS) that let the data show you groups of respondents with similar responses.
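A minimal sketch using PROC FASTCLUS (k-means clustering). It treats the 1 to 3 codes as if distances between them were meaningful, which is an assumption worth questioning for ordinal ratings. The choice of two clusters and the toy data set HAVE are arbitrary illustrations:

```sas
/* Hypothetical toy data: 3 respondents, 20 items coded 1-3 */
data have;
  input id service1-service20;
  datalines;
1 1 1 2 3 1 2 2 1 1 3 1 1 2 2 1 3 1 1 2 1
2 3 3 3 2 3 3 2 3 3 3 2 3 3 3 2 3 3 3 3 3
3 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
;
run;

/* k-means clustering of respondents on their 20 item codes;
   OUT= adds a CLUSTER variable to each respondent's row */
proc fastclus data=have maxclusters=2 out=clus;
  var service1-service20;
run;

/* How many respondents fell into each cluster */
proc freq data=clus;
  tables cluster;
run;
```

With real data you would try several values of MAXCLUSTERS= and then profile each cluster (for example with PROC MEANS by CLUSTER) to see what distinguishes them.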
I guess my confusion is in the calculation part. An individual has a response from 1 to 3 for each of the services. Instead of the total sum across services, I would like to count how many 1s, how many 2s, and how many 3s this individual has.
I would create a single variable for each health service. Each respondent would have a single "response" for each health service variable. At the end of this, each respondent would have 20 health service variables capturing their responses.
Once you have that, look at each variable separately with PROC FREQ (or PROC SURVEYFREQ if you have weights and a complex sample design).
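A sketch of both layouts, which also touches on the transposing question: PROC FREQ does not need the data transposed, but a long layout (one row per respondent and service) gives a single compact service-by-response table. The data set names HAVE and LONG, the ID variable, and the toy values are hypothetical:

```sas
/* Hypothetical toy data: 3 respondents, 20 items coded 1-3 */
data have;
  input id service1-service20;
  datalines;
1 1 1 2 3 1 2 2 1 1 3 1 1 2 2 1 3 1 1 2 1
2 3 3 3 2 3 3 2 3 3 3 2 3 3 3 2 3 3 3 3 3
3 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
;
run;

/* Wide layout: one one-way table per item */
proc freq data=have;
  tables service1-service20 / missing;
run;

/* Long layout: BY processing in PROC TRANSPOSE needs sorted data */
proc sort data=have;
  by id;
run;

/* One row per respondent-item; SERVICE holds the item name, RESPONSE its code */
proc transpose data=have out=long(rename=(col1=response)) name=service;
  by id;
  var service1-service20;
run;

/* All 20 distributions in one table (counts only) */
proc freq data=long;
  tables service*response / norow nocol nopercent;
run;
```

With weights and a complex design, the same TABLES requests would go into PROC SURVEYFREQ together with its WEIGHT, STRATA and CLUSTER statements.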
I would then look at the cross-tabulation of all 20 variables and examine the patterns of response. I would look for obvious grouping patterns and report on them.
You could also create an aggregate measure. Assuming each of your 20 service items is scored the same way (High, Low, None), calculate for each respondent the # of Highs, # of Lows, and # of Nones. I'd then look at the distribution of # of Highs (SURVEYFREQ with weights) to see if there is an obvious split in it, and do the same for # of Lows and # of Nones.
You could also calculate, for each respondent, the proportion of Highs and use that to classify respondents, or use something like proportion of Highs minus proportion of Lows, or proportion of Highs minus proportion of Nones. I'm not sure whether None means "not applicable" or "not concerned at all". If it's the former, proportion of Highs minus proportion of Nones doesn't really make sense.
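A sketch of these per-respondent counts and proportions. It ASSUMES the coding 1 = High, 2 = Low, 3 = None (the post lists the labels in that order but does not spell out the mapping). Proportions are taken over answered items only; the data set names and toy data are hypothetical:

```sas
/* Hypothetical toy data: 3 respondents, 20 items coded 1-3 */
data have;
  input id service1-service20;
  datalines;
1 1 1 2 3 1 2 2 1 1 3 1 1 2 2 1 3 1 1 2 1
2 3 3 3 2 3 3 2 3 3 3 2 3 3 3 2 3 3 3 3 3
3 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
;
run;

/* ASSUMPTION: 1 = High, 2 = Low, 3 = None */
data profile;
  set have;
  array svc{20} service1-service20;
  n_high = 0;   /* counters start at 0 for every respondent */
  n_low  = 0;
  n_none = 0;
  do i = 1 to 20;
    if svc{i} = 1 then n_high = n_high + 1;
    else if svc{i} = 2 then n_low = n_low + 1;
    else if svc{i} = 3 then n_none = n_none + 1;
  end;
  n_answered = n(of service1-service20);   /* number of nonmissing items */
  if n_answered > 0 then do;
    p_high = n_high / n_answered;
    p_low  = n_low / n_answered;
    p_high_minus_low = p_high - p_low;
  end;
  drop i;
run;

/* Distribution of each count across respondents */
proc freq data=profile;
  tables n_high n_low n_none;
run;
```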
More generally, look into Likert scales to see how you might combine responses to 20 items into an aggregate score for an individual. (The measures I gave are simplistic but might work for you.)
Oh, once you have the 20 variables for each person, each containing a value of 1, 2, or 3, you can calculate the number of 1s, 2s, and 3s for each person in a variety of ways.
Here's one:
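data mydata;
  set mydata;
  /* list the names of your 20 health service variables here */
  array values{20} yourvariablename1-yourvariablename20;
  /* start the counters at 0 for every respondent (no RETAIN, so counts
     do not carry over from one row to the next) */
  numones = 0;
  numtwos = 0;
  numthrees = 0;
  do i = 1 to 20;
    if values{i} = 1 then numones = numones + 1;
    else if values{i} = 2 then numtwos = numtwos + 1;
    else if values{i} = 3 then numthrees = numthrees + 1;
  end;
  drop i;
run;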
In the ARRAY statement, just list the names of your 20 health service variables.
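Since we are talking about single digits, here is an alternate approach for counting:
data example;
  input x1-x10;
  /* CATS(OF X:) joins the digits into one string and COUNTC counts each digit;
     for the sample row this gives ones=6, twos=2, thrs=2 */
  ones = countc(cats(of x:), '1');
  twos = countc(cats(of x:), '2');
  thrs = countc(cats(of x:), '3');
datalines;
1 1 3 2 1 3 2 1 1 1
;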
Thank You!!