Hello all,

I have a variable that in theory takes discrete values between 0 and 6. It is not based on a Likert scale, but I think it can be treated as one. In my sample I had 22 independent subjects with 37 observations, i.e., some subjects gave two observations and some gave only one. All observations but one were either 0 or 1. Among the subjects with two observations, the two values matched (0 twice or 1 twice) in all but one case.

I need to calculate the mean of the variable, its standard deviation (or variance), and the 95% CI.

I tried several things. First, out of curiosity, I calculated the descriptive statistics as if I had 37 independent observations. I got a mean of 0.513 with an SD of 1.044. I did it like this (and also by hand):

proc means data = data;
   var Y;
run;

Next, I averaged the two observations within each subject, which gave me a sample of 22 independent values. I then calculated a weighted mean and a weighted SD, using weights of 1 or 2 depending on the number of observations per subject. I got a mean of 0.5135 with an SD of 1.358. I did it like this:

proc means data = Avged_data;
   var Mean_Y;
   weight N_Y;
run;

As a last attempt, I did this:

proc surveymeans data = data;
   var Y;
   cluster Subject_ID;
run;

and I got a mean of 0.5135 with an SD of 1.13.

What bothers me is that I don't know how SAS calculated the variance in the last case. I looked at the help documents and saw some complicated variance formulas (Taylor series and others), but I did not see any formula specific to the CLUSTER statement case.

I would like your advice on two points. First, is it even correct to calculate a mean and SD when the vast majority of the values of an ordinal variable are either 0 or 1? Second, how would you do it? Would you choose the weighted approach, or PROC SURVEYMEANS with the CLUSTER statement, which to my understanding is exactly what I need?
I usually do not like using a method I do not understand, and my fear is that, at the moment, PROC SURVEYMEANS with the CLUSTER statement is a black box to me. Does anyone know the rationale behind the calculation? If I knew some details, maybe I could find the article in the literature that is the source of the calculations.

Thank you in advance!

Edit: I just tried one more option. I ran a GEE model with an explanatory variable of all 1's, i.e., a vector (1,1,1,1,1,...,1), borrowing an idea Steve gave me some time ago for a similar problem with a binary outcome. I did:

proc genmod data = data;
   class SubjectID;
   model Y = D1;
   repeated subject=SubjectID / corr=cs corrw;
run;

It gave me a mean of 0.5271 with an SD of 1.23 (it reported an S.E. of 0.2029, and I derived the SD from that). I tried the same with PROC MIXED without success (the fixed effects table had no intercept row at all).

However, I see no added value in having yet another (mean, SD) pair to choose from; it is confusing enough already. The bottom line is that this does not change my problem: how do I choose the correct answer, now that I have one more candidate?
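For reference, here is how the Taylor series variance in the PROC SURVEYMEANS documentation appears to simplify for a setup like this one. This is a sketch under assumptions, not a confirmed description of what the procedure did on this data set: it assumes a CLUSTER statement with no STRATA statement, no WEIGHT statement (all weights equal to 1), and no finite population correction. The notation is introduced here for illustration: n is the number of clusters (subjects, 22 here), m_i is the number of observations from subject i, m is the total number of observations (37 here), and y_ij is observation j from subject i.

```latex
\begin{align*}
\bar{y} &= \frac{1}{m}\sum_{i=1}^{n}\sum_{j=1}^{m_i} y_{ij},
  \qquad m = \sum_{i=1}^{n} m_i \\[4pt]
e_i &= \frac{1}{m}\sum_{j=1}^{m_i}\left(y_{ij} - \bar{y}\right)
  \qquad\text{(cluster-level residual total; } \textstyle\sum_i e_i = 0\text{)} \\[4pt]
\widehat{V}(\bar{y}) &= \frac{n}{n-1}\sum_{i=1}^{n} e_i^{2},
  \qquad \widehat{\mathrm{SE}}(\bar{y}) = \sqrt{\widehat{V}(\bar{y})} \\[4pt]
\text{95\% CI:}\quad & \bar{y} \;\pm\; t_{0.975,\,n-1}\,\widehat{\mathrm{SE}}(\bar{y})
\end{align*}
```

Under these assumptions the point estimate is simply the mean of all 37 observations, and the clustering enters only through the variance: residuals are summed within each subject before being squared, and the degrees of freedom are based on the number of subjects (n - 1 = 21) rather than on the number of observations. Note that this formula yields a standard error of the mean, not a standard deviation of Y.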