BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
research318
Calcite | Level 5
Hello, everyone. I am working on analyzing data from a pre and post test questionnaire. There are 4 Likert scales made up of 8-10 Likert items each, and each Likert item has 5 options ranging from "Very unlikely" to "Very likely" or something similar. Each item will be given a numeric value of 1 to 5. I have read that I should find a composite score (sum or average) for each scale. I want to then compare the pre and post tests, perhaps using a t-test or Wilcoxon signed rank test.
 
How do I find a sum or average of my variables? I think I can maybe use an array and sum across each of the scales, but I'm not sure. Sorry--still learning when it comes to SAS. Here is an example of my data: 
 
 data WORK.IMDUMMY;
   infile datalines dsd truncover;
 input ID:BEST12. Gender:$6. Age:BEST12. School_Background:$7. Time_Spelling:BEST12. Competence1:BEST12. Competence2:BEST12. Competence3:BEST12. Competence4:BEST12. Competence5:BEST12. Competence6:BEST12. Competence7:BEST12. Competence8:BEST12. Competence9:
 BEST12. Competence10:BEST12. Confidence1:BEST12. Confidence2:BEST12. Confidence3:BEST12. Confidence4:BEST12. Confidence5:BEST12. Confidence6:BEST12. Confidence7:BEST12. Confidence8:BEST12. Confidence9:BEST12. Confidence10:BEST12. Resilience1:BEST12. Resili
 ence2:BEST12. Resilience3:BEST12. Resilience4:BEST12. Resilience5:BEST12. Resilience6:BEST12. Resilience7:BEST12. Resilience8:BEST12. Resilience9:BEST12. Resilience10:BEST12. Fluency1:BEST12. Fluency2:BEST12. Fluency3:BEST12. Fluency4:BEST12. Fluency5:BEST
 12. Fluency6:BEST12. Fluency7:BEST12. Fluency8:BEST12. Fluency9:BEST12. Fluency10:BEST12.;
 format ID BEST12. Age BEST12. Time_Spelling BEST12. Competence1 BEST12. Competence2 BEST12. Competence3 BEST12. Competence4 BEST12. Competence5 BEST12. Competence6 BEST12. Competence7 BEST12. Competence8 BEST12. Competence9 BEST12. Competence10 BEST12. Con
 fidence1 BEST12. Confidence2 BEST12. Confidence3 BEST12. Confidence4 BEST12. Confidence5 BEST12. Confidence6 BEST12. Confidence7 BEST12. Confidence8 BEST12. Confidence9 BEST12. Confidence10 BEST12. Resilience1 BEST12. Resilience2 BEST12. Resilience3 BEST12
 . Resilience4 BEST12. Resilience5 BEST12. Resilience6 BEST12. Resilience7 BEST12. Resilience8 BEST12. Resilience9 BEST12. Resilience10 BEST12. Fluency1 BEST12. Fluency2 BEST12. Fluency3 BEST12. Fluency4 BEST12. Fluency5 BEST12. Fluency6 BEST12. Fluency7 BE
 ST12. Fluency8 BEST12. Fluency9 BEST12. Fluency10 BEST12.;
 datalines;
 1 Male 13 Public 3 3 5 3 3 5 2 3 2 5 5 5 2 2 5 1 3 3 5 2 1 5 1 2 1 5 3 2 1 5 1 4 4 2 2 1 3 1 5 4 3
 2 Male 13 Public 3 1 3 5 4 1 3 1 5 2 3 3 3 1 4 5 1 3 3 5 3 1 3 3 1 1 2 3 4 5 5 4 2 5 3 5 3 2 5 1 2
 3 Male 13 Public 3 3 3 4 3 5 4 3 3 5 3 4 5 3 1 5 1 5 4 4 2 4 5 1 2 1 4 5 4 1 2 1 5 2 3 1 3 5 5 3 1
 4 Male 13 Public 3 3 4 4 2 5 1 2 3 5 3 5 2 2 2 1 2 5 2 1 4 1 2 3 4 3 5 3 2 4 2 3 1 4 1 2 4 2 5 4 5
 5 Male 13 Public 3 4 4 2 2 2 4 2 5 5 3 5 4 5 1 2 5 4 3 1 1 2 4 5 2 4 5 1 5 4 4 3 3 1 1 1 1 5 4 4 4
 6 Male 13 Public 3 1 5 1 1 4 5 5 2 1 5 1 3 3 2 3 4 1 4 3 5 5 5 2 2 3 1 1 4 5 5 5 4 4 5 2 5 5 3 3 1
 7 Male 13 Public 3 5 1 4 5 1 1 5 5 1 2 5 5 1 5 5 2 2 4 2 5 3 5 3 3 1 2 2 5 5 2 5 1 1 3 4 3 1 4 3 2
 8 Male 13 Public 6 1 2 5 5 5 4 2 5 3 5 4 5 3 1 1 5 2 2 5 3 4 3 1 3 2 5 2 2 4 1 3 2 1 1 5 5 2 3 1 2
 9 Male 13 Public 6 2 2 2 4 5 1 4 3 4 1 3 5 2 1 5 5 1 5 3 5 1 1 4 1 3 1 5 4 5 4 4 1 1 5 3 1 5 2 4 5
 10 Male 13 Public 6 4 4 4 3 2 4 3 2 3 3 2 3 3 4 3 5 1 1 1 4 1 4 4 5 3 1 4 3 3 4 2 5 1 5 3 1 4 1 1 1
 11 Female 13 Public 6 5 1 2 2 4 5 4 1 5 1 3 1 5 5 2 4 1 1 1 1 3 5 3 4 2 4 4 5 2 4 5 1 1 3 4 5 5 2 5 3
 12 Female 13 Home 6 2 2 3 3 5 3 4 5 1 3 3 3 5 1 5 4 5 5 4 2 4 3 4 1 1 1 4 4 3 5 2 4 3 5 4 1 5 3 1 1
 21 Female 13 Private 6 2 2 2 3 4 5 2 2 1 1 3 1 1 3 5 3 4 4 5 1 1 2 4 3 4 1 1 1 5 5 2 5 4 3 4 4 1 1 1 2
 13 Male 14 Public 6 5 4 1 5 5 2 2 3 2 1 2 2 2 4 3 5 3 5 2 4 2 4 1 5 2 5 3 1 5 4 1 2 1 2 5 4 3 4 5 5
 14 Male 14 Public 6 3 5 2 5 2 2 3 5 4 2 3 4 5 2 1 5 3 2 2 3 5 3 2 3 1 4 5 2 5 5 4 4 3 4 4 3 4 2 5 2
 15 Male 14 Home 9 1 2 2 3 3 5 2 5 5 2 3 2 3 3 3 5 2 1 5 1 1 2 1 5 2 5 3 4 4 1 1 1 3 3 1 2 2 2 2 2
 16 Male 14 Home 9 1 4 1 4 4 4 1 2 3 3 1 4 1 5 4 3 3 1 4 1 5 1 3 5 4 3 3 3 3 3 3 5 5 1 3 3 4 5 2 5
 17 Male 14 Private 9 3 1 2 2 2 4 4 3 3 2 1 2 5 2 1 5 2 2 5 2 2 2 2 5 5 4 4 3 3 5 2 1 1 5 2 1 5 5 5 1
 22 Female 14 Private 9 3 1 5 2 5 2 2 4 5 5 1 5 2 2 5 5 3 5 2 5 4 3 4 3 4 2 4 3 2 4 2 4 3 4 5 1 1 4 2 2
 18 Male 15 Private 9 4 4 4 1 5 4 4 4 4 2 3 2 5 5 3 2 4 2 1 3 2 1 3 4 4 5 1 2 1 1 5 5 3 2 1 5 2 5 1 4
 19 Male 15 Private 12 3 1 1 2 4 3 3 5 5 1 5 1 1 1 5 3 5 3 2 3 4 3 3 1 3 5 2 2 5 4 5 5 5 1 2 4 2 1 1 2
 20 Male 15 Public 12 5 3 3 1 1 3 3 2 1 4 1 1 1 2 3 3 5 2 4 2 3 4 4 4 5 3 4 1 5 5 2 4 3 5 1 4 1 2 5 5
 23 Female 15 Public 12 3 5 4 4 5 1 5 1 3 3 5 4 2 1 3 1 1 3 5 1 5 2 5 1 3 5 5 3 4 1 2 4 3 1 3 4 3 3 4 5
 24 Female 15 Public 12 1 3 2 1 5 2 5 2 2 2 2 3 1 2 2 2 1 1 4 3 2 3 3 3 3 5 1 5 4 2 2 5 4 5 2 1 1 3 3 4
 ;;;;
 
Specifically, I want to be able to sum (and average) the scores by scale (competence, confidence, resilience, and fluency). The sum and average need to be new variables so that I can then use them in analyses. I appreciate any help and direction you can provide.
1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

Hello @research318 ,

 

I could not successfully create the analysis dataset with the code (data step) provided.

So I have to give an example by coding "blind".

data work.want; 
 set work.have;
 x1 = mean(of Competence1 - Competence16);
 x2 = std (of Confidence1 - Confidence16);
 x3 = max (of Resilience1 - Resilience16);
 x4 = sum (of Fluency1    - Fluency16   );
run;

 

Thanks,

Koen

View solution in original post

14 REPLIES 14
sbxkoenk
SAS Super FREQ

Hello @research318 ,

 

I could not successfully create the analysis dataset with the code (data step) provided.

So I have to give an example by coding "blind".

data work.want; 
 set work.have;
 x1 = mean(of Competence1 - Competence16);
 x2 = std (of Confidence1 - Confidence16);
 x3 = max (of Resilience1 - Resilience16);
 x4 = sum (of Fluency1    - Fluency16   );
run;

 

Thanks,

Koen

research318
Calcite | Level 5
Thank you, sbxkoenk!
PaigeMiller
Diamond | Level 26

How do I find a sum or average of my variables? I think I can maybe use an array and sum across each of the scales

 

On the one hand you talk about variables, which has a very clear meaning in SAS, a variable is a column; but then you discuss scales, and it's not  clear to me what you mean by "scales".

 

I want to then compare the pre and post tests, perhaps using a t-test or Wilcoxon signed rank test.

 

Which are the pre and post tests? Please be much more specific about what you want to do. Please describe the problem without reference to SAS (for now) so we can understand the math you want to do; right now I do not understand what math you want to do

--
Paige Miller
research318
Calcite | Level 5

Sorry for being unclear, PaigeMiller. The "scales" are just groups of questions answered on the survey. There are 4 scales: competence, confidence, resilience, and fluency. Each will have 8-10 questions, but in the case of the data I tried to load they each have exactly 10 questions. I would like to be able to find a sum for each row, and an overall sum for all observations. I would also like to be able to find the means, standard deviations, and other summary statistics for each row and for all of the rows.

 

In regards to a pre and post test. The data that I tried to upload is the the whole thing, and will constitute the pre test. It will be given again, and that will be the post test. So the data for both the pre and post test look exactly the same. I hope that helps!

PaigeMiller
Diamond | Level 26

Okay, that's helpful, but then you talk about t-test or Wilcoxon test, you're still not clear. Are you going to use the sum for each row on each scale to feed into the t-test or Wilcoxon test? Or is the math you are planning something else?

--
Paige Miller
research318
Calcite | Level 5

Thank you for helping me clarify, PaigeMiller. I would like to compare the mean for all questions on the pre test to the mean for all of the questions on the post test. I would also like to compare the means for each group of questions (competence, confidence, resilience, and fluency) on the pre-test to the means for each corresponding group on the post-test. I believe that I will use a t-test based on this paper and this paper. If you know of a test to compare each individual row (from pre to post) I would be interested in hearing how I would do that, but the means are really what I am looking for.

PaigeMiller
Diamond | Level 26

I would like to compare the mean for all questions on the pre test to the mean for all of the questions on the post test.

 

No need to compute the means yourself. PROC TTEST will do that for you, on the raw (un-aggregated) data for each question. Same applies if you want to do the Wilcoxon test via PROC NPAR1WAY, it should work on the raw data.

 

I would also like to compare the means for each group of questions (competence, confidence, resilience, and fluency) on the pre-test to the means for each corresponding group on the post-test.

 

To do this, you would indeed need to compute the mean or sum across the row for all competence variables and for all confidence variables (and so on). Others have already provided that code. Then these sums or means are input into PROC TTEST or PROC NPAR1WAY.

 

If you know of a test to compare each individual row (from pre to post) I would be interested in hearing how I would do that

 

Compare each individual row ... now I am lost again. A t-test would compare the means of all rows from the pre-test to the mean of all rows from the post-test. I don't think you want to statistically test each individual row mean from pre-test to post-test. Furthermore, it seems as if you are discussing a situation where the exact same subjects (people) are tested both pre and post, and if that is the case then you want to use a Paired t-test, not the "regular" or un-paired t-test. https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_ttest_examples03.htm or the non-parametric equivalent.

--
Paige Miller
research318
Calcite | Level 5
Thank you, Paige Miller. Yes, I definitely want the pooled t-test. We intend to give this survey at least twice, but hopefully up to 4 times per person: pre, post, 3 months out, and 6 months out. Because of that I am actually considering doing ANOVA. I realize I didn't give enough information up front. I was mainly interested in the mechanics of the code in SAS but this has been a helpful discussion so I appreciate it.

I'm sorry for the confusion regarding row comparisons. You are correct, I don't really want to statistically test each individual row mean.
PaigeMiller
Diamond | Level 26

Yes, I definitely want the pooled t-test

 

Paired t-test

--
Paige Miller
research318
Calcite | Level 5
Yes! Oh gosh--that is what I meant. I should not respond before I've had my coffee. Sorry.
Tom
Super User Tom
Super User

Yikes. 

First thing is learn how to write a data step. Use variable lists. Do not attach formats or use informats for variables that don't need them.

Spoiler
data have;
 input ID Gender :$6. Age School_Background :$7. Time_Spelling 
   Competence1-Competence10
   Confidence1-Confidence10
   Resilience1-Resilience10
   Fluency1-Fluency10
 ;
datalines;
 1 Male   13 Public   3 3 5 3 3 5 2 3 2 5 5 5 2 2 5 1 3 3 5 2 1 5 1 2 1 5 3 2 1 5 1 4 4 2 2 1 3 1 5 4 3
 2 Male   13 Public   3 1 3 5 4 1 3 1 5 2 3 3 3 1 4 5 1 3 3 5 3 1 3 3 1 1 2 3 4 5 5 4 2 5 3 5 3 2 5 1 2
 3 Male   13 Public   3 3 3 4 3 5 4 3 3 5 3 4 5 3 1 5 1 5 4 4 2 4 5 1 2 1 4 5 4 1 2 1 5 2 3 1 3 5 5 3 1
 4 Male   13 Public   3 3 4 4 2 5 1 2 3 5 3 5 2 2 2 1 2 5 2 1 4 1 2 3 4 3 5 3 2 4 2 3 1 4 1 2 4 2 5 4 5
 5 Male   13 Public   3 4 4 2 2 2 4 2 5 5 3 5 4 5 1 2 5 4 3 1 1 2 4 5 2 4 5 1 5 4 4 3 3 1 1 1 1 5 4 4 4
 6 Male   13 Public   3 1 5 1 1 4 5 5 2 1 5 1 3 3 2 3 4 1 4 3 5 5 5 2 2 3 1 1 4 5 5 5 4 4 5 2 5 5 3 3 1
 7 Male   13 Public   3 5 1 4 5 1 1 5 5 1 2 5 5 1 5 5 2 2 4 2 5 3 5 3 3 1 2 2 5 5 2 5 1 1 3 4 3 1 4 3 2
 8 Male   13 Public   6 1 2 5 5 5 4 2 5 3 5 4 5 3 1 1 5 2 2 5 3 4 3 1 3 2 5 2 2 4 1 3 2 1 1 5 5 2 3 1 2
 9 Male   13 Public   6 2 2 2 4 5 1 4 3 4 1 3 5 2 1 5 5 1 5 3 5 1 1 4 1 3 1 5 4 5 4 4 1 1 5 3 1 5 2 4 5
10 Male   13 Public   6 4 4 4 3 2 4 3 2 3 3 2 3 3 4 3 5 1 1 1 4 1 4 4 5 3 1 4 3 3 4 2 5 1 5 3 1 4 1 1 1
11 Female 13 Public   6 5 1 2 2 4 5 4 1 5 1 3 1 5 5 2 4 1 1 1 1 3 5 3 4 2 4 4 5 2 4 5 1 1 3 4 5 5 2 5 3
12 Female 13 Home     6 2 2 3 3 5 3 4 5 1 3 3 3 5 1 5 4 5 5 4 2 4 3 4 1 1 1 4 4 3 5 2 4 3 5 4 1 5 3 1 1
21 Female 13 Private  6 2 2 2 3 4 5 2 2 1 1 3 1 1 3 5 3 4 4 5 1 1 2 4 3 4 1 1 1 5 5 2 5 4 3 4 4 1 1 1 2
13 Male   14 Public   6 5 4 1 5 5 2 2 3 2 1 2 2 2 4 3 5 3 5 2 4 2 4 1 5 2 5 3 1 5 4 1 2 1 2 5 4 3 4 5 5
14 Male   14 Public   6 3 5 2 5 2 2 3 5 4 2 3 4 5 2 1 5 3 2 2 3 5 3 2 3 1 4 5 2 5 5 4 4 3 4 4 3 4 2 5 2
15 Male   14 Home     9 1 2 2 3 3 5 2 5 5 2 3 2 3 3 3 5 2 1 5 1 1 2 1 5 2 5 3 4 4 1 1 1 3 3 1 2 2 2 2 2
16 Male   14 Home     9 1 4 1 4 4 4 1 2 3 3 1 4 1 5 4 3 3 1 4 1 5 1 3 5 4 3 3 3 3 3 3 5 5 1 3 3 4 5 2 5
17 Male   14 Private  9 3 1 2 2 2 4 4 3 3 2 1 2 5 2 1 5 2 2 5 2 2 2 2 5 5 4 4 3 3 5 2 1 1 5 2 1 5 5 5 1
22 Female 14 Private  9 3 1 5 2 5 2 2 4 5 5 1 5 2 2 5 5 3 5 2 5 4 3 4 3 4 2 4 3 2 4 2 4 3 4 5 1 1 4 2 2
18 Male   15 Private  9 4 4 4 1 5 4 4 4 4 2 3 2 5 5 3 2 4 2 1 3 2 1 3 4 4 5 1 2 1 1 5 5 3 2 1 5 2 5 1 4
19 Male   15 Private 12 3 1 1 2 4 3 3 5 5 1 5 1 1 1 5 3 5 3 2 3 4 3 3 1 3 5 2 2 5 4 5 5 5 1 2 4 2 1 1 2
20 Male   15 Public  12 5 3 3 1 1 3 3 2 1 4 1 1 1 2 3 3 5 2 4 2 3 4 4 4 5 3 4 1 5 5 2 4 3 5 1 4 1 2 5 5
23 Female 15 Public  12 3 5 4 4 5 1 5 1 3 3 5 4 2 1 3 1 1 3 5 1 5 2 5 1 3 5 5 3 4 1 2 4 3 1 3 4 3 3 4 5
24 Female 15 Public  12 1 3 2 1 5 2 5 2 2 2 2 3 1 2 2 2 1 1 4 3 2 3 3 3 3 5 1 5 4 2 2 5 4 5 2 1 1 3 3 4
;;;;

You seem to have four sets of 10 variables, but no sets that only have 8 variables.  Are those the scales that are using Likert scoring?  You did make sure that all of the questions are scaled the same and point in the same direction?

If so then you could make a scale by summing the responses.  Note that you also normally need to account for missing values, say by taking a mean or excluding cases that did not answer enough of the individual questions, but you don't seem to have any missing values.

data want;
  set have ;
  Competence=sum(of Competence1-Competence10);
  Confidence=sum(of Confidence1-Confidence10);
  Resilience = sum(of  Resilience1-Resilience10);
  Fluency= sum(of Fluency1-Fluency10);
run;

 

 

research318
Calcite | Level 5

Sorry that the data step is confusing. I pasted the code based on the example given for new users. My data is in Excel so I didn't actually create it in SAS. But I'll try to simplify it for next time since it seems like you guys can't open it. This is just dummy data--the survey is not yet complete so I can't guarantee that there will be exactly 10 questions per group (scale). The questions are all scaled the same, but hey may not all be going the same direction. I can recode any questions worded negatively. They will not be weighted (so no worries there) and I agree, missing data will be something I need to be prepared for, so I will probably stick with the mean. My sample size will be low so I want to avoid not including responses as much as possible.

Tom
Super User Tom
Super User

You probably need to consult with a statistician to understand what statistics are appropriate for your study design.

 

You mentioned pre and post. Are you getting pre and post measurements on the same individuals?   Or are they different samples taken before and after some external event?

 

I did not see anything in your example data that would indicate that some observations are pre or post.   Make sure the data is organized so that can be determined.

 

EXCEL is a spreadsheet tool, where any cell can contain any type of data, as such it is a poor format for storing data.

research318
Calcite | Level 5
The pre and post data will come from the same individuals. We are actually hoping to collect the same data again at 3 months out and 6 months out. All the data will be taken from the same survey so we will make sure we can determine which administration of the survey the data is taken from.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 5116 views
  • 3 likes
  • 4 in conversation