Merdock
Quartz | Level 8

Hello,

I have the dataset listed below. I want to evaluate the change in obesity proportion from baseline (i.e., the proportion with obesity at each time point, relative to the number obese at baseline), and I'm a bit stuck. I was wondering:

- What test should I use if I want to present summary statistics (with p-values) for the change from baseline in obesity at each time point? I'd like to have a table like the one below, if possible.

- I know how to obtain a p-value for comparing proportions between groups (chi-square test) at Baseline, Visit 1 and Visit 2, but how do I do this for comparing the change from baseline in obesity proportion between Group A and Group B?

- Should the percentage for the change from baseline (the x% in parentheses) be (# obese subjects at Visit 1 – # obese subjects at baseline) ÷ (# obese subjects at baseline)? If so, how do I run a chi-square test to compare these proportions between A and B?

- Or should I just present the table without p-values, and then use mixed modelling to test whether the rate of change in obesity over time differs significantly between groups A and B (something like the proc glimmix statements below, maybe)?

I'd greatly appreciate it if someone could provide feedback or suggestions on how to handle this.

Thanks!

data have;
infile datalines expandtabs; /* the data lines below are tab-separated */
input ID visit$ group$ obesity;
datalines;
1	0	A	0
1	1	A	0
1	2	A	0
2	0	A	0
2	1	A	0
2	2	A	0
3	0	B	1
3	1	B	1
4	0	B	0
4	1	B	1
4	2	B	0
5	0	A	1
5	1	A	1
5	2	A	0
6	0	A	0
7	0	A	0
8	0	A	0
8	1	A	0
8	2	A	0
9	0	B	1
9	1	B	0
9	2	B	0
10	0	A	0
10	1	A	0
10	2	A	1
11	0	A	1
11	1	A	1
11	2	A	1
12	0	A	0
12	1	A	1
12	2	A	1
13	0	B	1
13	2	B	0
14	0	A	0
14	1	A	1
14	2	A	0
15	0	B	1
15	1	B	0
16	0	A	0
16	1	A	1
16	2	A	1
17	0	B	0
17	1	B	1
17	2	B	1
18	0	A	1
18	1	A	1
18	2	A	0
19	0	A	0
19	1	A	0
19	2	A	0
20	0	B	0
20	1	B	0
20	2	B	0
21	0	B	0
21	1	B	0
21	2	B	0
22	0	B	1
22	1	B	0
22	2	B	0
23	0	B	1
23	1	B	1
23	2	B	0
24	0	A	1
24	1	A	1
24	2	A	1
25	0	A	0
25	1	A	1
25	2	A	1
26	0	A	0
26	1	A	1
26	2	A	0
27	0	A	1
27	1	A	0
27	2	A	1
28	0	A	0
28	1	A	0
28	2	A	1
29	0	B	1
29	1	B	0
29	2	B	0
30	0	B	0
30	1	B	0
30	2	B	0
31	0	A	1
32	0	B	0
32	1	B	1
32	2	B	1
33	0	A	0
33	1	A	1
33	2	A	1
34	0	A	0
34	1	A	1
34	2	A	1
35	0	A	0
35	1	A	0
35	2	A	0
36	0	B	1
36	1	B	0
36	2	B	1
37	0	A	0
37	1	A	1
37	2	A	1
38	0	B	0
38	1	B	1
38	2	B	1
39	0	A	1
39	1	A	1
39	2	A	1
40	0	A	0
40	1	A	1
40	2	A	1
41	0	A	0
41	1	A	0
41	2	A	1
42	0	A	1
42	1	A	0
42	2	A	0
43	0	B	1
43	1	B	1
43	2	B	0
44	0	B	0
44	1	B	0
44	2	B	0
45	0	A	1
45	1	A	1
45	2	A	1
46	0	B	0
46	1	B	0
46	2	B	0
47	0	A	1
47	1	A	0
47	2	A	1
48	0	B	0
48	1	B	1
48	2	B	1
49	0	B	0
49	1	B	1
49	2	B	1
50	0	B	0
50	1	B	1
;
run;

proc glimmix data=have; 
class visit obesity group;
model obesity (event='1')=visit group visit*group/ dist=binary link=logit ddfm=bw solution;
random intercept / subject=id solution;
run;
Accepted Solution
StatDave
SAS Super FREQ

Again, what you are asking for is the "difference in difference" (DID) on the proportion (response mean) scale, and the method for this is shown in the note I referred to earlier. As described there, you need to define a contrast among the VISIT*GROUP combinations that gives the difference between groups in the difference between visits. An example is given in the note. In your case, the following uses a contrast that compares the groups on the response difference between visits 1 and 0. In addition to the NLMeans analysis, which gives the DID estimate on the mean scale, the LSMESTIMATE statement is also used, but note that its estimate is on the scale of the link function: it is the estimated DID on the logit (log odds) scale.

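proc glimmix data=have; 
   class visit obesity group;
   model obesity (event='1')=visit group visit*group / 
      dist=binary link=logit ddfm=bw solution;
   random intercept / subject=id solution;
   /* E prints the LS-means coefficients; use it to check the order of the six VISIT*GROUP cells */
   lsmeans visit*group / e ilink;
   ods output coef=coeffs;
   /* DID on the logit scale: (A1-A0)-(B1-B0), with cells ordered A0 B0 A1 B1 A2 B2 */
   lsmestimate visit*group '1-0 at a-b' -1 1 1 -1 0 0;
   store log;
   run;
/* Same contrast for NLMeans: K1-K6 hold the coefficients; SET=1 points at the single set of LS-means */
data difdif;
   input k1-k6;
   set=1;
   datalines;
   -1 1 1 -1 0 0
   ;
/* DID on the proportion (mean) scale */
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
         title=Difference in Difference of Means)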

Now, regarding the label in the NLMeans output: it is vital that you look at the order of the values being analyzed. In this case, those are the LS-means estimates displayed by the LSMEANS statement. The coefficients that you specify in each observation of the CONTRASTS= data set are applied to the means in that order, exactly as the coefficients in the LSMESTIMATE statement are. So, in the above code the NLMeans contrast estimates -1*(1st mean)+1*(2nd mean)+1*(3rd mean)-1*(4th mean). With the means ordered A0, B0, A1, B1, A2, B2, this can be written as a DID: (A1-A0)-(B1-B0), where A and B are your groups and 0, 1, 2 are your visits.
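The same pattern should extend to visit 2. As a sketch, assuming the LS-means are ordered A0, B0, A1, B1, A2, B2 (worth confirming in the coefficient table printed by the E option), a second row in the contrast data set would estimate (A2-A0)-(B2-B0). The data set name difdif2 is made up for illustration, and the step assumes that the item store log and the coeffs data set from the GLIMMIX step above already exist.

```sas
/* Hypothetical contrast data set: one observation per DID contrast.        */
/* Assumed cell order of the LS-means: A0 B0 A1 B1 A2 B2.                   */
data difdif2;
   input k1-k6;
   set=1;                /* all contrasts apply to the single set of LS-means */
   datalines;
-1 1 1 -1 0 0
-1 1 0 0 1 -1
;
run;

/* Row 1: (A1-A0)-(B1-B0); row 2: (A2-A0)-(B2-B0), both on the proportion scale */
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif2,
         title=DID of Means at Visits 1 and 2)
```

In the second row, the zeros fall on the visit 1 cells, so only the baseline and visit 2 means of each group enter the estimate.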

Merdock
Quartz | Level 8
@Ksharp, thanks! I switched it over to the stat forum.
StatDave
SAS Super FREQ

What you are talking about is a so-called "difference in difference" analysis - compare the difference between groups on the difference in response at two visits. See this note (particularly the second part "Generalized linear models with non-identity link") which discusses and illustrates estimating and testing the difference in difference on the mean (proportion) scale. For your case, you would probably want to use the NLMeans macro with a contrast defining each difference in difference that you want. 
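As a rough sketch of what the two scales mean (the notation here is an assumption for illustration, not taken from the note): write $p_{g,v}$ for the model-estimated proportion with obesity in group $g$ at visit $v$, and $\eta_{g,v}$ for the corresponding logit.

```latex
\begin{aligned}
\eta_{g,v} &= \log\frac{p_{g,v}}{1-p_{g,v}}, \qquad
p_{g,v} = \frac{1}{1+e^{-\eta_{g,v}}} \\[4pt]
\mathrm{DID}_{p}(v) &= \left(p_{A,v}-p_{A,0}\right) - \left(p_{B,v}-p_{B,0}\right) \\[4pt]
\mathrm{DID}_{\eta}(v) &= \left(\eta_{A,v}-\eta_{A,0}\right) - \left(\eta_{B,v}-\eta_{B,0}\right)
 = \log\frac{\mathrm{odds}_{A,v}/\mathrm{odds}_{A,0}}{\mathrm{odds}_{B,v}/\mathrm{odds}_{B,0}}
\end{aligned}
```

Because the inverse logit is nonlinear, $\mathrm{DID}_{p}$ is not simply the back-transform of $\mathrm{DID}_{\eta}$, which is why a linear contrast on the logit scale answers a different question (a ratio of odds ratios) than a difference in difference of proportions.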

SteveDenham
Jade | Level 19

@StatDave will probably have more to say on this, but here goes:

To get the chi-squared values for these comparisons on the original (proportion) scale, rather than the log odds ratios that the differences give on the logit scale even with the ilink option, you'll need the %NLmeans and %NLest macros. Be sure you are using the most recent versions of these.

Try the following code:

proc glimmix data=have; 
class visit obesity group;
model obesity (event='1')=visit group visit*group/ dist=binary link=logit ddfm=bw solution;
random intercept / subject=id solution;
lsmeans visit|group;
store logitfit;
ods output coef=coeffs;
run;


proc plm restore=logitfit;
	lsmeans visit group visit*group/e ilink diff=control;
	slice visit*group/e sliceby=visit diff=control ilink;
	slice visit*group/e sliceby=group diff=control ilink;
	ods output coef=coeffs lsmeans=lsmeans;
	run;

/* These addresses will have to be changed. There is a chance that you are working with a version of SAS that has these macros included, such that %include statements are not needed */

%include "B:\Steve\GLIMMIXforSAS\nlmeans.sas";
%include "B:\Steve\GLIMMIXforSAS\nlest.sas";

%nlmeans(version,instore=logitfit, coef=coeffs, link=logit, diff=1, null=0, title=Diff of Mean Proportions)

The log will have a bunch of WARNING statements about the final Hessian not being positive definite, but they can all be ignored. The pertinent outputs are:

- GLIMMIX Type 3 tests and least squares means on both the logit scale and the proportion scale

- NLmeans output from the LSMEANS and SLICE statements, with the estimated difference and standard error on the proportion scale, a Wald chi-square value and associated p-value, and the bounds of a 95% confidence interval on the difference. Unfortunately, I have never been able to figure out how to change the label on the default printout, but in this case you can match each table to its effect by going in order through the lsmeans and slice statements:

1.  labels are 1 -1 0 and 1 0 -1: Differences in marginal lsmeans for visits compared to baseline

2.  label is 1 -1: Difference in marginal lsmeans for groups, comparing to group A

3.  A table with five labels: Difference in lsmeans compared to baseline of group A

4.  label is 1 -1: Difference in lsmeans for groups, at visit=0

5.  label is 1 -1: Difference in lsmeans for groups, at visit=1

6.  label is 1 -1: Difference in lsmeans for groups, at visit=2

7.  labels are 1 -1 0 and 1 0 -1: Difference in lsmeans for visits 1 and 2 from baseline, for group A

8.  labels are 1 -1 0 and 1 0 -1: Difference in lsmeans for visits 1 and 2 from baseline, for group B

 

Hope this helps.

(In my analysis, neither the main effects nor the interaction is significant, so in my opinion these comparisons are probably valid only if pre-specified.)

SteveDenham

 

Merdock
Quartz | Level 8

@SteveDenham, thank you so much for your suggestions and code, this is definitely very helpful! I've tried the code, but I am having a bit of trouble understanding how to interpret the table with five labels (snapshot below).

- What does each of those five labels mean? Does the first label mean that -0.08519 is the difference in mean obesity proportion for group A at visit 1, compared to baseline? Is the second label then the difference in mean obesity proportion for group A at visit 2, compared to baseline, while the third and fourth labels denote the same thing for group B? And what is the fifth label for?

5labels.PNG

- Which of the 8 sets of labels obtained from %NLmeans and %NLest is relevant to my situation, if I am interested in the p-values for comparing group A with group B in terms of the change from baseline in the proportion with obesity at each of the visits? Would it be label #1 on your list?

Merdock_0-1669669269399.png

 

 

Merdock
Quartz | Level 8
@StatDave, thank you so much for your explanation and clarifications! It took me a while to digest everything, but it all makes more sense and I think I have a good handle on it now.
SteveDenham
Jade | Level 19

@StatDave's response covers most of your questions, but I will try to be specific about the two things you asked.

 

The first deals with the five labels. You have 3 visits (baseline, visit1, visit2) and two groups, so six visit-by-group cells. Each line is the comparison of one of the other five cells to baseline for group A. Note that there are six "columns", referring to baseline/A, visit1/A, visit2/A, baseline/B, visit1/B, visit2/B. This comes from using the diff=control option with no sliceby option.
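To make the reference cell visible rather than relying on the default, the control level can be named in the DIFF= option. This is only a sketch; it assumes the item store logitfit created by the STORE statement in my earlier GLIMMIX step, and that the first levels (visit 0, group A) are the control you want.

```sas
/* Assumes item store LOGITFIT from the earlier GLIMMIX step */
proc plm restore=logitfit;
   /* Name the control cell explicitly: visit '0' in group 'A'.          */
   /* Levels are listed in the order of the effect (visit, then group).  */
   /* Each of the other five cells is then compared to this one.         */
   lsmeans visit*group / diff=control('0' 'A') ilink e;
run;
```

Since visit and group are both read as character variables, the levels are given as quoted strings.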

 

The second is which of the output tables refers to your question, and the answer is none of them, since I missed the point that you were looking for a difference-in-difference. See @StatDave's excellent response on how to do that.

 

SteveDenham
