About BlueNose

BlueNose · ‎05-12-2014

Hello all, I have a simple question. I tried to run the new PROC ICLIFETEST and SAS told me it cannot be found. I use SAS 9.4, so I assumed I would have the new procedure. Can you shed some light on the matter? I would really like to use it for an interval-censoring problem. I ran the %EMICM macro instead, but it doesn't give the quartile estimates, which is what I actually need. Thanks!

BlueNose · ‎03-17-2014

Steve, thank you for your feedback. I will resume the simulation, and when I have some results I will share them. Thanks again!

BlueNose · ‎03-11-2014

Dear Steve, you have found my mistake: I did forget the CLASS statement. I added it to my code, and now both random ID and random intercept / subject=ID give identical results. They both give a mean estimate of 0.6974, which is close enough, with a standard error of 0.02571, which is slightly higher than the real value of the parameter, but taking the correlation into account, it makes sense, I guess. Unless I am not calculating the standard deviation correctly: I see that the degrees of freedom is 999. Should I use it as my sample size? I was using 1999, so maybe I did it wrong.

The one last thing I can't find in my output is the correlation within a subject, which is supposed to be 0.8. How can I ask SAS to estimate it? The covariance parameter estimate of the intercept is 0.5903 and the residual is 0.1409. Should I use the equation COV / (s.d.(M1) * s.d.(M2))? If I should, then my previous question about obtaining the standard deviations is even more important.

Regarding your comment about gamma and Gaussian, well, this is one of the main things I want to test with this procedure. People use a mixed model without too much care for the distribution, as long as it is continuous, and I wanted to see what the outcome would be. Running a simulation with 1000 samples is the beginning; once I confirm that it's working, I will run a different data set I have simulated with 100 samples, and then another one with only 30, and then we'll see how robust the model really is. I will also try GLIMMIX.
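For what it's worth, under a random-intercept model the within-subject correlation is usually read off the two covariance parameter estimates as an intraclass correlation. A worked version with the estimates quoted above (the symbols \sigma_b^2 and \sigma_e^2 are labels chosen here for the intercept and residual variances, not SAS output names):

```latex
\hat{\rho}
  = \frac{\hat{\sigma}_b^2}{\hat{\sigma}_b^2 + \hat{\sigma}_e^2}
  = \frac{0.5903}{0.5903 + 0.1409}
  = \frac{0.5903}{0.7312}
  \approx 0.807
```

This assumes the two measurements on a subject share only the random intercept, which is what random intercept / subject=ID specifies. The result is close to the simulated 0.8, and the implied total standard deviation is \sqrt{0.7312} \approx 0.855.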

BlueNose · ‎03-10-2014

Hello everyone, I need your advice regarding a little simulation I am trying to do. Recently I had a data set which I analyzed using a mixed model. Now I have simulated some data to test whether my model was the right one.

Ignoring what was in the past, I now have simulated data from a bivariate gamma distribution with a correlation of 0.8 (see attached photos). The data are supposed to simulate a time variable with a gamma distribution, with two measurements per subject. I have 1000 subjects in this particular set. Each subject "gave" 2 samples with a correlation of 0.8 between them. I chose a shape parameter of 0.7 and a scale of 1. I wanted a mean of 0.7. The s.d. is 0.85; I am not sure why, because by my calculation that yields a scale of 1.1, but it doesn't matter, maybe it's due to the inaccuracies of the simulation.

Now I wanted to estimate the mean and s.d. of my data, taking into account the correlation between measurements within a subject. I tried two codes:

    proc mixed data = Long;
      model Value = D / s;
      random intercept / subject = ID;
    run;

and

    proc mixed data = Long;
      model Value = D / s;
      random ID;
    run;

where D is a vector of 1's. The first code gave me a mean value of 0.697 with an s.e. of 0.025, while the second code gave me 0.647 with an s.e. of 0.03447. The first code was closer to the truth, and I wanted to ask why. There is no explanatory variable in this model, so how come a random intercept is the right model? In addition, shouldn't the correlation affect only the variance, and not the mean value? I am quite confused here. I thought that by using the intercept option I fit something like a separate regression line per subject, but here I do not have an "X" variable, only "Y". Thank you!
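As a side check on the simulated values, the moments of a gamma distribution can be written out. This sketch assumes the shape/scale parameterization, with k for the shape and \theta for the scale (notation chosen here, not taken from the post):

```latex
E[Y] = k\theta = 0.7 \times 1 = 0.7, \qquad
\mathrm{Var}(Y) = k\theta^{2} = 0.7 \times 1^{2} = 0.7, \qquad
\mathrm{SD}(Y) = \sqrt{0.7} \approx 0.837
```

Under that assumption, a sample s.d. near 0.85 is roughly what a shape of 0.7 and a scale of 1 would produce, so sampling variation alone may account for the difference.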

BlueNose · ‎02-10-2014

Hello all, I have some basic questions regarding PROC GLIMMIX and PROC MIXED, and probably other mixed-model procedures. Assume that I have a continuous outcome variable Y and a treatment variable X with 2 levels. Let's also say that every subject in the analysis contributes 3 data points (these can be 3 treatments in one person, or students within schools). If I write this code:

    proc mixed data = ....;
      class SubjID;
      model Y = X / solution cl;
      [random statement]
    run;

what is the difference between

    random SubjID;

and

    random intercept / subject = SubjID;

and, let's say (if it's a legal statement),

    random Treatment / subject = SubjID;

I tried it on a small data set and the first two options gave me identical results. This question is of course also relevant for GLIMMIX with dist = binary.

One more question: if I have this scenario, and I run PROC MIXED once with a RANDOM statement and once with a REPEATED statement, and the results are very similar and lead to the same conclusions, how do I choose whether to use R-side or G-side covariance? Thank you!
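A minimal sketch of the first two RANDOM specifications side by side, with an R-side compound-symmetry fit added for comparison. The data set name Trial is a placeholder introduced here; Y, X and SubjID are the names used in the question. This is only an illustration of the syntax, not a recommendation for any particular model.

```sas
/* G-side: SubjID entered directly as a random effect (needs the CLASS statement) */
proc mixed data=Trial;
  class SubjID;
  model Y = X / solution cl;
  random SubjID;
run;

/* G-side: the same random intercept, written with SUBJECT= */
proc mixed data=Trial;
  class SubjID;
  model Y = X / solution cl;
  random intercept / subject=SubjID;
run;

/* R-side: compound symmetry within each subject */
proc mixed data=Trial;
  class SubjID;
  model Y = X / solution cl;
  repeated / subject=SubjID type=cs;
run;
```

The first two describe the same model, which would explain the identical results. For a Gaussian response, the compound-symmetry fit usually matches them as well whenever the estimated between-subject variance is positive, since both imply the same marginal covariance.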

BlueNose · ‎02-02-2014

Thank you Steve, I ran this code:

    proc glimmix data = data;
      class SubjectID;
      model Y= /solution cl;
      random intercept / subject = SubjectID;
    run;

and I got the widest CI of all the methods. Actually, my Y variable can't be negative, and my CI is [-0.003,1.13]. On one hand, a wide CI is conservative, and my intuition says this is the way I should choose; on the other hand, this particular CI is not very informative (well, the upper limit is...). Thank you for your help, things are much clearer now.

One more thing, and I apologize for mixing things here. I had two centers in this study, which is a rather small study. I just checked whether the data are "poolable". I ran a nonparametric test (because the data are not really normal) and it was significant; however, it didn't take into account the repeated-measures nature of the data. Then I ran a GEE, which was also significant; however, GLIMMIX was not (p=0.06). My question is, what rationale could I give for using a mixed model over GEE? Are there any advantages, like better performance in small samples, or better robustness to violations of the normality assumption? I am looking for a rationale since my data set is small and it will make things complicated if I can't pool it.

BlueNose · ‎01-30-2014

Actually, the data don't come from a survey; they come from a clinical trial (a designed, single-arm study). Some patients contributed a single observation while others contributed two. These patients are the clusters. The population is clearly infinite. I find it hard to say what the covariance structure is, so I assumed compound symmetry, which seemed most reasonable, or shall I say, I don't know of a better alternative. Would you recommend trying GLIMMIX with the identity link function? Which model (GLMM vs. GEE) is more robust to violation of normality of the analyzed variable (I would say dependent, but I do not have an explanatory variable)?

BlueNose · ‎01-30-2014

Steve, you are right, the clusters are indeed not identical in size, so the lsmean should not be equal to the raw mean. This leads me back to the initial dilemma of choosing either the result of PROC SURVEYMEANS (for which I found a reference in a book) or going for PROC GENMOD with a "continuous dummy" as the explanatory variable. The results are not identical, but they are very close.

BlueNose · ‎01-29-2014

Interesting indeed! Pardon me for taking a step backwards, but aren't lsmeans used when I have an explanatory variable (and random effects) and I want the mean adjusted for the effect of the various strata? How do I obtain the lsmeans with a single sample? Let's say I choose the GENMOD way; if I remember correctly, SAS doesn't let you use the LSMEANS statement unless a categorical variable follows?

BlueNose · ‎01-28-2014

By averaging and weighting, I thought I had created an independent sample, since my data now has one row for each subject. Or perhaps the weights are destroying that. I would also choose the Taylor series estimate; however, I do not feel comfortable using a method I know so little about. The one thing I am not so sure about is how SAS calculates the variance when I define cluster SubjectID; The modelling approach is also valid; however, the estimate of the mean is slightly inaccurate, and as far as I understand the correlation should not affect the mean, just the variance.

BlueNose · ‎01-27-2014

Mark, thank you for the reply. I have removed the explanatory variable and I got the same result as I got with it, since it's a vector of 1's. I have to admit that this idea of Steve's is one of the best tips I have had in a long time; I like it. If I knew that the second way, averaging within a subject and using weighted statistics of the averages (mean, SD, median, ...), was correct, I would choose it, since the Taylor series is an approximation and my sample size is not very large. But is it the right way?

BlueNose · ‎01-26-2014

Hello all, I have a variable which theoretically should take discrete values between 0 and 6. This variable is not based on a Likert scale, but I think we can treat it as such. In practice, my sample had 22 independent subjects with 37 observations, i.e., some of them gave two observations while some gave only one. All observations apart from one were either 0 or 1. Among the subjects with two observations, apart from one case, all values matched (i.e., 0 twice or 1 twice). I need to calculate the mean of the variable, the standard deviation (or variance) and the 95% CI.

I did several things. First, out of curiosity, I calculated the descriptive statistics as if I had 37 independent observations. I got a mean of 0.513 with an SD of 1.044. I did it like this (and also manually):

    proc means data = data;
      var Y;
    run;

In the next step I averaged the two observations within a subject, which gave me a sample of 22 independent values. I calculated a weighted mean and weighted SD using weights of either 1 or 2, depending on the number of observations per subject. I got a mean of 0.5135 with an SD of 1.358. I did it like this:

    proc means data = Avged_data;
      var Mean_Y;
      weight N_Y;
    run;

As a last attempt, I did this:

    proc surveymeans data = data;
      var Y;
      cluster Subject_ID;
    run;

and I got a mean of 0.5135 with an SD of 1.13.

What bothers me is that I don't understand, or more precisely don't know, how SAS calculated the variance in the last case. I tried looking at the help documents and saw some complicated variance formulas (Taylor and others); however, I did not see any formula specific to the CLUSTER statement case. I wanted your advice, first on the correctness of calculating a mean and SD when the vast majority of an ordinal variable's values are either 0 or 1, and second, how would you do it? Would you choose the weighted approach, or PROC SURVEYMEANS with the ready-made CLUSTER statement, which to my understanding is exactly what I need?

Usually I do not like using a method I do not understand, and my fear is that at the moment PROC SURVEYMEANS with the CLUSTER statement is a black box. Does anyone know the rationale behind the calculation? If I knew some details, maybe I could find the article in the literature that is the source of the calculations... Thank you in advance!

Edit: I just tried one more option. I ran a GEE model with an explanatory variable of 1's, i.e. a vector (1,1,1,1,1,...,1), using an idea Steve gave me some time ago for a similar problem with a binary outcome. I did:

    proc genmod data = data;
      class SubjectID;
      model Y = D1;
      repeated subject=SubjectID / corr=cs corrw;
    run;

It gave me a mean of 0.5271 with an SD of 1.23 (it gave me an S.E. of 0.2029 and I extracted the SD from there). I tried the same with PROC MIXED with no success (it didn't give me an intercept "line" at all in the fixed effects table). However, I see no additional value in having another pair of (mean, SD) to choose from; it is confusing enough now. The bottom line is that this doesn't change my problem: how do you choose the correct answer, now that I have one more candidate pair of values?
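A minimal sketch of how the averaging step described above could be done, since the post shows only the weighted PROC MEANS call. The names data, Subject_ID, Y, Avged_data, Mean_Y and N_Y are the ones used in the post; how Avged_data was actually built is not stated there, so the first step is an assumption.

```sas
/* One row per subject: the within-subject mean of Y and the number of observations */
proc means data=data noprint nway;
  class Subject_ID;
  var Y;
  output out=Avged_data mean=Mean_Y n=N_Y;
run;

/* Weighted descriptive statistics, weighting each subject mean by its count */
proc means data=Avged_data mean std;
  var Mean_Y;
  weight N_Y;
run;
```

Weighting the subject means by their counts reproduces the mean of all 37 observations, which is consistent with the matching means reported above. The weighted STD is a different matter: with a WEIGHT statement its divisor is controlled by the VARDEF= option (the default, DF, uses the number of rows minus one), so it is not directly comparable with the unweighted SD of the raw observations.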

BlueNose · ‎01-20-2014

Hello all, I would like to simulate data from a bivariate "table" distribution. I want to generate a vector of values (0,1,2,3,4,5) with probabilities (p(0), p(1), p(2), p(3), p(4), p(5)). Then I want to generate another such vector with a correlation of, let's say, 0.8 with the first vector. How do I do that? Thank you in advance.
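One possible construction, sketched under stated assumptions and certainly not the only way: draw X from the table distribution, then set Y equal to X with probability 0.8 and otherwise draw Y independently from the same table. Both variables then have the requested marginal distribution, and the theoretical Pearson correlation equals the copy probability, since Cov(X,Y) = 0.8 Var(X). The six probabilities, the seed, the sample size and the data set name Sim are hypothetical placeholders.

```sas
data Sim;
  call streaminit(2014);                 /* arbitrary seed for reproducibility */
  do ID = 1 to 1000;
    /* RAND('TABLE', ...) returns 1..6, so subtract 1 to get values 0..5 */
    X = rand('TABLE', 0.30, 0.25, 0.20, 0.10, 0.10, 0.05) - 1;
    /* With probability 0.8 copy X; otherwise take an independent draw */
    if rand('BERNOULLI', 0.8) then Y = X;
    else Y = rand('TABLE', 0.30, 0.25, 0.20, 0.10, 0.10, 0.05) - 1;
    output;
  end;
run;

/* Check the sample correlation and the marginal frequencies */
proc corr data=Sim;
  var X Y;
run;

proc freq data=Sim;
  tables X Y;
run;
```

The drawback of this design is that the dependence is all-or-nothing (Y is either an exact copy or unrelated), which may or may not resemble the real data; other approaches trade an exactly controlled correlation for a smoother joint distribution.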

BlueNose · ‎01-05-2014

I ran a few models. First of all, GLIMMIX: when I added Y(ref='1'), nothing changed, but oddly enough when I used Y(ref='0') I got numbers that make sense. I got that the estimator of p is e^0.6575 / (1 + e^0.6575) = 0.658 with a CI of [0.433,0.8309].

I also ran a GEE with GENMOD. Using corr=ind I got an estimate of 0.621 with CI [0.406,0.7979]. Using corr=cs I got 0.658 with CI [0.449,0.8196], and using corr=un I got similar results to corr=cs. The correlation was 0.972.

I also found a chapter in the book 'Statistical Methods for Rates and Proportions' by Fleiss that discusses a single sample with correlated data. He proposed calculating the point estimate as if there were no correlation, and proposed a method for calculating an adjusted CI using an adjusted variance that takes the intraclass correlation into account. I got a non-adjusted estimate of 0.621 with a CI of [0.415,0.827]. The intraclass correlation was 0.915.

If I had ignored the correlation entirely, I would get p=0.621 with a CI of [0.44,0.775] using Clopper-Pearson or [0.461,0.7594] using the Wald score CI.

Now my final question for this thread is, how am I supposed to choose my point and interval estimators based on all this? 🙂 Thank you!

P.S. My code:

    proc glimmix data=...;
      class SubjectID index; /* The variable index would indicate the first or second observation for the subject */
      model Y (ref='0') = D1 / dist=binary solution cl;
      random index / residual subject=SubjectID type=chol;
      nloptions tech=nrridg;
      nloptions maxiter=200;
    run;

    proc genmod data=... descend;
      class SubjectID;
      model Y=D1 / dist=bin;
      repeated subject=SubjectID / corr=cs corrw;
    run;
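For reference, the back-transformation from the logit scale used for the GLIMMIX estimate above can be written out step by step. The symbol \hat{\eta} is a label chosen here for the estimate on the logit scale:

```latex
\hat{p} = \frac{e^{\hat{\eta}}}{1 + e^{\hat{\eta}}}
        = \frac{e^{0.6575}}{1 + e^{0.6575}}
        \approx \frac{1.930}{2.930}
        \approx 0.6587
```

The same transformation applied to the two confidence limits on the logit scale would give the limits on the probability scale, assuming the interval was built on the logit scale.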

BlueNose · ‎01-02-2014

You were right: when I increased the number of iterations it worked. This is the output. It is getting closer, but not close enough, unless it is modeling P(Y=0) instead of P(Y=1). Is that possible? What you said about conditional vs. marginal is very interesting. Is there a good reference where I can read more deeply about the differences between these two approaches?

Online Status	Offline
Date Last Visited	‎09-08-2018 12:32 PM

Re: GLM with data that doesn't follow the normality assumption

GLM with data that doesn't follow the normality assumption

Re: Converting Wide to long with two sets of variables

Converting Wide to long with two sets of variables

Aggregating and Plotting Quality Control Data

Re: Time Series Data with both Daily and Monthly Variables

Time Series Data with both Daily and Monthly Variables

Setting up a general linear model with multiple comparisons or contras...

Setting up a mixed model

Change from baseline transformation

Re: Hypothesis Testing for Odds Ratio

Re: Power Simulation for a Two Proportions Test

Re: Replacing two sections of a string

Non-inferiority in survival analysis

Combining Data Sets Containing Character Variables of Different Length...

Importing several files into SAS

The ICLIFETEST Procedure not found

Re: Simulation for testing proc mixed

Simulation for testing proc mixed

PROC GLIMMIX and MIXED basic questions

Re: PROC SURVEYMEANS vs. PROC GENMOD vs. Weighted mean & SD

PROC SURVEYMEANS vs. PROC GENMOD vs. Weighted mean & SD

Bivariate Table distribution

Re: Analyzing a single arm study with repeated measures
