Double clustered paired t test?

03-18-2015 01:14 AM

Hello everyone.

I would like to do a paired t test with two-way clustered standard errors, similar to the treatment in Petersen (2009) and Thompson (2011) in the regression context.

My reading so far tells me that if I were to treat the pairs as two independent samples, I could treat the two-sample t test as a regression and use readily available code to cluster the standard errors. But I do not understand how to achieve a similar effect with a paired t test or a one-sample t test.

My data structure:

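| obs | cluster1 | cluster2 | var1 | var2 |
|-----|----------|----------|------|------|
| 1   | A        | a        | 0.5  | 0.2  |
| 2   | A        | b        | 1.1  | -0.2 |
| 3   | B        | a        | -0.5 | 1.2  |
| 4   | B        | b        | 1.7  | 0.2  |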

....

The null hypothesis is that var1 and var2 have the same mean (or, equivalently, that var1 - var2 has a mean of zero).
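Stated this way, the paired test is an intercept-only regression on the difference var1 - var2, so the regression-style clustering mentioned above can in principle be carried over. The following is only a rough sketch under several assumptions: the wide dataset is named `have`, the helper macro `clust_fit` and the dataset names `diffs`, `pe1`, `pe2`, `pe12` and `twoway` are made up for illustration, and the two-way variance is formed as V(cluster1) + V(cluster2) - V(cluster1 by cluster2 intersection). PROC SURVEYREG applies its own default variance adjustments, and the combined variance can come out nonpositive in small samples, so the result should be checked rather than taken as definitive.

```sas
/* Assumption: the wide dataset is named HAVE. */
data diffs;
  set have;
  diff = var1 - var2;   /* paired difference; H0: mean(diff) = 0 */
run;

/* Hypothetical helper: fit the intercept-only model under one
   clustering definition and save the parameter estimates table. */
%macro clust_fit(clusvars=, out=);
  ods output ParameterEstimates=&out;
  proc surveyreg data=diffs;
    cluster &clusvars;
    model diff = / solution;
  run;
%mend clust_fit;

%clust_fit(clusvars=cluster1,          out=pe1);
%clust_fit(clusvars=cluster2,          out=pe2);
%clust_fit(clusvars=cluster1 cluster2, out=pe12);  /* intersection */

/* Combine the three one-way results into a two-way variance. */
data twoway;
  merge pe1 (keep=Parameter Estimate StdErr rename=(StdErr=se1))
        pe2 (keep=Parameter StdErr rename=(StdErr=se2))
        pe12(keep=Parameter StdErr rename=(StdErr=se12));
  by Parameter;
  var2way = se1**2 + se2**2 - se12**2;
  if var2way > 0 then do;
    se2way = sqrt(var2way);
    t2way  = Estimate / se2way;   /* compare with a t or normal reference */
  end;
run;

proc print data=twoway;
run;
```

Testing mean(var1) = 0 would follow the same pattern with `var1` in place of `diff` as the dependent variable.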

Thanks,

Hao


Posted in reply to Haoz

03-18-2015 02:23 PM

This is a simple mixed model, once you get the data into long format.

data want;

set have;

val=var1; level=1; output;

val=var2; level=2; output;

drop var1 var2;

run;

proc mixed data=have;

class cluster1 cluster2 level obs;

model val=level;

random cluster1 cluster2;

repeated level/subject=obs type=un;

lsmeans level/diff;

run;

Steve Denham


Posted in reply to SteveDenham

03-20-2015 03:51 AM

Thank you very much, Steve. I think this is what I am after, and that the REPEATED statement really made the difference here.

However, when I run it on samples of any size (from 5000 obs with 90 cluster1 and 250 cluster2 down to 100 obs with 5 cluster1 and 20 cluster2), the algorithm does not converge at all. I get the following message:

NOTE: An infinite likelihood is assumed in iteration 0 because of a nonpositive residual variance estimate.

WARNING: Did not converge.

Have I made any mistakes?

May I ask one follow-up question? I feel like I should have deduced this myself, but how do we test the mean of var1 against 0 in the same clustering context, i.e. with the null that mean(var1) = 0?


Posted in reply to Haoz

03-20-2015 01:09 PM

Whenever you have a REPEATED statement, you have to make sure that all observations are unique for each subject (no duplicate records for any subject, which here are called obs). The data step that converted from wide to long format should have handled that, but it is possible that there are duplicates still. It looks like the error is in my proc mixed code where I had data=have. That should be the long format, and should read data=want. I apologize for this error.

Check your data, and see if it has all converted to the long format, and that you are using the long formatted dataset in PROC MIXED.
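One way to run that check is sketched below. It assumes the long dataset is named `want` and contains the `obs` and `level` variables created by the data step earlier in the thread; the column alias `n_rows` is just an illustrative name. The first query lists any obs/level combination that occurs more than once, and the second lists any obs that does not have exactly two rows.

```sas
proc sql;
  title 'obs/level combinations that appear more than once';
  select obs, level, count(*) as n_rows
    from want
    group by obs, level
    having count(*) > 1;

  title 'obs values that do not have exactly two rows';
  select obs, count(*) as n_rows
    from want
    group by obs
    having count(*) ne 2;
quit;
title;   /* clear the title */
```

If both queries return no rows, each subject has one record per level, which is what the REPEATED statement needs here.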

If you want to test whether var1 = 0, then change the model statement to:

model val=level/noint solution;

Because level now identifies which var is being analyzed, the t value and probability associated with level 1 are a test of this kind. You could also test this by using an LSMESTIMATE statement:

LSMESTIMATE level 'Test of level 1 = 0' 1 0;

Steve Denham


Posted in reply to SteveDenham

03-21-2015 08:49 PM

Hi Steve,

Thank you very much for your reply.

**On the non-convergence issue:**

I have noticed that small mistake of 'data = have' and changed it to 'data = want'.

I have also double checked for duplicates. For each obs I have only two rows, representing level 1 and level 2 respectively (by using proc freq and using lag(obs) ne obs for each level).

If I reduce the code to just one cluster, as follows, it still does not converge:

proc mixed data=want;

class cluster1 level obs;

model val=level;

random cluster1 ;

repeated level/subject=obs type=un;

lsmeans level/diff;

run;

If I get rid of the two clusters altogether, it still does not converge:

proc mixed data=want;

class level obs;

model val=level;

repeated level/subject=obs type=un;

lsmeans level/diff;

run;

If I then delete the REPEATED statement, as follows:

proc mixed data=want;

class cluster1 level;

model val=level;

random cluster1 ;

lsmeans level/diff;

run;

the program converges. So does the program with two clusters and no REPEATED statement.

However, when I chop off about half of the observations in 'want' (from 33526 rows to 14726 rows) and run the same code as above, the program again fails to converge. The log message is as follows:

NOTE: An infinite likelihood is assumed in iteration 0 because of a nonpositive residual variance estimate.

WARNING: Stopped because of infinite likelihood.

Therefore, I am afraid that the issue may be statistical rather than a problem with our code (although it could very well be the REPEATED statement etc.). But I am confused, because I have tens of thousands of observations and not that many clusters (30-90 and 250 respectively). Plus, the density plot and the Q-Q plot look pretty good in the unconditional paired t test.

Is there any way to use an alternative algorithm, perhaps?

**On the test whether var1 = 0:**

I ran the following code based on your advice, and it still does not converge. I feel like the issue here is likely to be the same as the one above.

proc mixed data=want;

class cluster1 cluster2 level obs;

model val=level;

random cluster1 cluster2;

repeated level/subject=obs type=un;

LSMESTIMATE level 'Test of level 1 = 0' 1 0;

run;

and

proc mixed data=want;

class cluster1 cluster2 level obs;

model val=level/noint solution;

random cluster1 cluster2;

repeated level/subject=obs type=un;

run;

Is it possible for me to upload a small sample for diagnosis?

Regards,

Hao


Posted in reply to Haoz

03-30-2015 09:35 AM

The runs that are not converging (not those with the infinite likelihood):

Can you post the output? I think this may just be a case that more iterations are needed, but cannot be sure until I can get a look at the output.

I think this is all that is wrong, and all else can be fixed from there.

Code, output and (small) data would help.
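If more iterations do turn out to be the issue, the limits can be raised on the PROC MIXED statement. A minimal sketch, assuming the long dataset `want` and the two-cluster model from earlier in the thread; the values 200 and 1000 are arbitrary illustrations rather than recommendations. `MAXITER=` and `MAXFUNC=` raise the iteration and likelihood-evaluation limits, and `ITDETAILS` prints the parameter values at each iteration, which is useful output to post.

```sas
proc mixed data=want maxiter=200 maxfunc=1000 itdetails;
  class cluster1 cluster2 level obs;
  model val = level;
  random cluster1 cluster2;
  repeated level / subject=obs type=un;
  lsmeans level / diff;
run;
```

This is only relevant to the runs that stop with "Did not converge"; it would not be expected to help the runs that stop because of an infinite likelihood.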

Steve Denham