Hi!
I have a tricky problem with this data set. It’s longitudinal (we are comparing 2017 vs. 2015), and it has survey weights. However, most of the outcomes of interest are dichotomous. The PhD statistician on our team suggested that, even though they’re dichotomous, I could still compute, for each outcome, the difference between the two years, say “did you use this book in 2017” minus “did you use this book in 2015”, so the possible values of the difference are 0, 1, and -1. He said it’s fine if I just test whether this difference = 0 (even though technically it’s not a continuous variable). I can't just regress used_book1_in_2017 = used_book1_in_2015 because we'd need to incorporate a fixed effects model. So we're keeping it simple for now.
What I’ve been seeing online is that people suggest just testing whether that difference = 0 in PROC SURVEYMEANS. I’m not sure how to do that, though. I know you can use the DOMAIN statement, but I just want to test whether my difference variable = 0; there is no group indicator.
Here, difft1 = used_book1_in_2017 - used_book1_in_2015, so difft1 can be 0, 1, or -1.
This is what I have so far:
proc surveymeans data=atp1517b;
   weight weight;
   var difft1;
run;
But this just gives me the mean, SE and CI. Any advice? Thanks!
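One way to get the test the question asks for, sketched under assumptions: PROC SURVEYMEANS accepts statistic keywords on the PROC statement, and the T keyword requests the t value and its p-value for H0: population mean = 0, which is the test of difft1 = 0. The sketch assumes atp1517b is a wide file with one record per respondent, numeric 0/1 variables used_book1_in_2015 and used_book1_in_2017, and a weight variable named weight (as in the question). If the design has strata or clusters, add STRATA and CLUSTER statements so the SE reflects them.

```sas
/* Sketch: test H0: weighted population mean of difft1 = 0.         */
/* Assumes one record per respondent and numeric 0/1 indicators     */
/* used_book1_in_2015 and used_book1_in_2017.                       */
data atp1517b;
   set atp1517b;
   /* -1 = stopped using, 0 = no change, 1 = started using;         */
   /* missing if either year is missing                              */
   difft1 = used_book1_in_2017 - used_book1_in_2015;
run;

/* MEAN STDERR CLM reproduce the current output;                    */
/* T adds the t value and p-value for H0: mean = 0                  */
proc surveymeans data=atp1517b mean stderr clm t;
   weight weight;
   /* add STRATA and CLUSTER statements here if the design has them */
   var difft1;
run;
```

Because difft1 is missing whenever either year is missing, its mean is the change in the weighted proportion among respondents who answered in both years, which can differ slightly from the difference between two separately estimated yearly percentages.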
I am not the best person to ask about tests of difference, and I don't have the complete formulas our mathematics PhDs gave me to use. I can, though, describe how we do this. We use t-tests, but we take into account that in our case the means we are comparing come from the same sample. Sorry if this is a little different from your case; it may not apply if your longitudinal respondents are not the same population in 2015 and 2017. If they are the same, then I think you can do it our way, but it takes two calls to the standard-error calculation (I think you can use a PROC SURVEY procedure for this). I think it is formally called a 't test of statistically significant difference between dependent (or non-independent) sample means.'
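For the question's setting (the same respondents in both years), the link between a dependent-sample t test and the question's difference variable can be written out. A sketch, writing $d_i = y_{i,2017} - y_{i,2015}$ and $w_i$ for respondent $i$'s survey weight (notation introduced here), and assuming the same respondents and weights in both years with no missing values:

$$\bar d = \frac{\sum_i w_i d_i}{\sum_i w_i} = \bar y_{2017} - \bar y_{2015}$$

$$\operatorname{Var}(\bar d) = \operatorname{Var}(\bar y_{2017}) + \operatorname{Var}(\bar y_{2015}) - 2\operatorname{Cov}(\bar y_{2017}, \bar y_{2015})$$

$$t = \frac{\bar d}{\operatorname{SE}(\bar d)}$$

The design-based SE of the difference variable's mean already reflects the covariance term, so testing $\bar d = 0$ on the difference variable is one way to carry out a dependent-sample test in a single procedure call.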
But we do publish yes/no variables like this. For example: how many companies in Nova Scotia say they innovate (yes/no), out of all companies in Nova Scotia? Let us say, for the sake of argument, that half the companies in Nova Scotia are innovative. Then we publish 50% in that table cell.
Now suppose we survey both small and large companies, and someone wants to know whether small companies are more innovative than large companies. They also want to know whether this difference between small and large companies is statistically significant: is the difference we observe in our survey real? Does this sound like what you are trying to do?
This paragraph outlines the approach. Suppose small companies are 50% innovative and large companies are 75% innovative, and these estimates have standard errors SE[s] (small companies) and SE[l] (large companies). We then build a formula that also uses N[s] (the number of small-company respondents) and N[l] (the number of large-company respondents). I can't remember the formula now, but your statistician should be able to write it for you from this description. We attach the result to each respondent record, send it through the SE calculator one more time, and out comes an SE you can plug into a standard t test for a difference of means.
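The respondent's exact N-based formula is not recoverable from the thread. As a point of comparison only, a common textbook form of the test for two domain estimates, sketched under the assumption that the two estimates are treated as independent, is:

$$t = \frac{\hat p_l - \hat p_s}{\sqrt{\mathrm{SE}[l]^2 + \mathrm{SE}[s]^2}}$$

With the example figures, the numerator is $0.75 - 0.50 = 0.25$; the thread gives no SE values, so no t value is computed here. If the two estimates are correlated (for example, because they come from the same respondents), $2\operatorname{Cov}(\hat p_l, \hat p_s)$ must be subtracted under the square root.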
It may be that you are just trying to understand how dichotomous variables become means. Maybe this shows you:
$$X\% = \frac{\text{number of yes}}{\text{number of respondents asked the question}} \times 100$$
The mean is the average of the X(i)s.
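With survey weights, the same idea holds with each respondent counted by its weight. A sketch, writing $w_i$ for respondent $i$'s weight and taking $X(i) = 1$ for yes and $X(i) = 0$ for no (notation assumed here):

$$\hat p = \frac{\sum_i w_i X(i)}{\sum_i w_i}, \qquad X\% = 100\,\hat p$$

With all $w_i = 1$ this reduces to the unweighted fraction: the sum of the $X(i)$ is the number of yes answers, and the sum of the weights is the number of respondents asked the question. This weighted mean is what PROC SURVEYMEANS reports as the mean of a 0/1 variable.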