Hi,
I'm trying to compare the same users' responses in two separate years. There are many ways to respond.
The contingency table looks like this:
 | A_20 | B_20 | C_20 | D_20 | … |
A_19 | | | | | |
B_19 | | | | | |
C_19 | | | | | |
D_19 | | | | | |
… |
Letters A to D are the ways to respond. Now I need to conduct a paired t-test to assess the impact of the way of responding on response status between years (2019 vs 2020). To do that with proc surveymeans (I have replicate weights), I will need to expand the data into binary variables, one for each possible combination, coded as:
Respond = 0, no_response = 1
Such that:
User | A_19 | B_19 | C_19 | D_19 | … | A_20 | B_20 | C_20 | D_20 | … | Diff_A_19_20 | Diff_A_20_19 | Diff_B_19_20 | Diff_B_20_19 | .. |
1 | 0 | 0 | |||||||||||||
2 | 1 | 0 | |||||||||||||
3 | 1 | 1 | |||||||||||||
4 | 0 | 0 | |||||||||||||
… |
My proc surveymeans will be:
ods graphics off;
proc surveymeans data = df
varmethod = jackknife mean var std stderr nobs sum all alpha=.1
;
var Diff_A_19_20 Diff_A_20_19 Diff_B_19_20 Diff_B_20_19 .....;
weight ctbw0;
repweights ctbw1 - ctbw80 / JKCOEFS = 0.05;
run;
ods output close;
Is there a simpler way to conduct a paired t-test in this scenario? The combinations will create too many difference variables.
Maybe I'm just not thinking on the right track.
Exactly what does this mean: "Letter A to D are the ways to respond"? If you mean something like a response category A, then a t-test is almost certainly not appropriate.
You should state the hypothesis you are testing very clearly and in some detail: not "do a t-test", but something like "did the respondents who selected X in the first data year change their response in the second year?" In any case, give a sentence describing the research objective(s).
You might code the paired comparison as Same/Different, as a 1/0 or 0/1 indicator, and test the proportion of Same or Different answers. But unless there is some natural continuous measure hidden in the data, I think you are starting down a flawed path.
For one thing, as soon as one choice is made for a year, the others are not independent (assuming these are categories, since no actual meaning for A, B, C or D has been discussed), which raises all sorts of issues with t-tests and independence.
I suggest providing some examples of the values actually collected. Dummy data is fine as long as it looks like your data.
Hi,
Letters A, B, C and D represent the ways a respondent chose to respond. They are categorical: A = Internet, B = Phone, C = Mail, D = In-person, etc.
The hypothesis will be:
H0: For the same user, response by mode does not differ between years.
The reason I picked a paired t-test is that 2019 and 2020 are dependent, since we are looking at the same user.
Following is a dummy set:
Case | Mode_19 | Mode_20_May | Res_status_19 | Res_status_20 |
1 | Paper | Phone | 1 | 0 |
2 | Paper | Other | 0 | 1 |
3 | Other | Internet | 1 | 1 |
A paired t-test looks at numeric differences, and your modes are categorical, so t-tests aren't appropriate.
Instead, align the data by ID and calculate whether the value is the same or not.
data set2019;
  input case mode $;
datalines;
1 Paper
2 Paper
3 Other
4 Phone
5 Other
6 Internet
;

data set2020;
  input case mode $;
datalines;
1 Phone
2 Paper
3 Other
4 Internet
5 Other
6 Internet
;

data combined;
  merge set2019 (rename=(mode=Mode2019))
        set2020 (rename=(mode=Mode2020))
  ;
  by case;
  ModeSame = (mode2019=mode2020);
run;
ModeSame will be 1 when the modes have the same value and 0 when they don't. You would test the PROPORTION of 1s.
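As a quick, unweighted sanity check on the dummy data, something like the following could be run before moving to the survey procedures. It is only a sketch: it ignores the weights entirely and just shows the 2019-by-2020 cross-tabulation and the count of same/different cases from the COMBINED data set built above.

```sas
/* Unweighted look at the dummy data only; no survey weights are used here */
proc freq data=combined;
  /* Cross-tabulate 2019 mode by 2020 mode, counts only */
  tables Mode2019*Mode2020 / norow nocol nopercent;
  /* Frequency of the same/different flag (1 = same mode in both years) */
  tables ModeSame;
run;
```

With the six dummy cases above, ModeSame is 1 for cases 2, 3, 5 and 6 and 0 for cases 1 and 4.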
You could use the Mode2019 variable as a CLASS variable to see whether the proportion staying the same differs across the levels of the mode variable, i.e. whether the proportion of Same is higher or lower for 2019 Phone responses than for 2019 Internet responses.
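A minimal sketch of how that might look with the replicate weights from the original question. It assumes the COMBINED data set also carries the full-sample weight ctbw0 and the replicate weights ctbw1-ctbw80 (the dummy data above does not include them), and it reuses the jackknife settings from the question unchanged. Note that in PROC SURVEYMEANS, subgroup estimates are requested with a DOMAIN statement, which is what is used here in place of CLASS.

```sas
/* Sketch only: assumes COMBINED also contains ctbw0 and ctbw1-ctbw80,
   as in the data set from the original question */
proc surveymeans data=combined varmethod=jackknife mean stderr clm alpha=.1;
  var ModeSame;       /* mean of a 0/1 flag is the weighted proportion of 1s */
  domain Mode2019;    /* separate estimate for each 2019 response mode */
  weight ctbw0;
  repweights ctbw1-ctbw80 / jkcoefs=0.05;
run;
```

This needs only one analysis variable, rather than one difference variable per combination of modes; the confidence limits for each 2019 mode give a first indication of whether the proportion staying with the same mode differs between modes.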