Solved: Test for change in proportion over time

Merdock · Posted 11-22-2022 12:37 AM

Hello,

I have the dataset listed below. I want to evaluate the change in obesity proportion from baseline (i.e., the proportion with obesity at each time point, relative to the number obese at baseline) and I’m a bit stuck. I was wondering:

- What test should I use if I want to present summary statistics (with p-value) for the change from baseline in obesity at each time point? I'd like to have a table like the one below, if possible.
- I know how to test and obtain a p-value for comparing proportions between groups (chi-square test) at Baseline, Visit 1 and Visit 2, but how do I do this for comparing the change from baseline in obesity proportion between Group A and Group B?
- Should the percentage for the change from baseline (the x% in parentheses) be (# obese subjects at Visit 1 – # obese subjects at baseline) ÷ (# obese subjects at baseline)? But then how do I run a chi-square test to compare these proportions between A and B?
- Should I just present the table without the p-value, and then do mixed modelling to test whether the rate of change in obesity over time differs significantly between groups A and B (something like the proc glimmix statements below, maybe)?

I'd greatly appreciate it if someone could provide some feedback or suggestions for how to handle this.

Thanks!

data have;
infile datalines expandtabs; /* expand tabs so list input can read tab-separated values */
input ID visit$ group$ obesity;
datalines;
1	0	A	0
1	1	A	0
1	2	A	0
2	0	A	0
2	1	A	0
2	2	A	0
3	0	B	1
3	1	B	1
4	0	B	0
4	1	B	1
4	2	B	0
5	0	A	1
5	1	A	1
5	2	A	0
6	0	A	0
7	0	A	0
8	0	A	0
8	1	A	0
8	2	A	0
9	0	B	1
9	1	B	0
9	2	B	0
10	0	A	0
10	1	A	0
10	2	A	1
11	0	A	1
11	1	A	1
11	2	A	1
12	0	A	0
12	1	A	1
12	2	A	1
13	0	B	1
13	2	B	0
14	0	A	0
14	1	A	1
14	2	A	0
15	0	B	1
15	1	B	0
16	0	A	0
16	1	A	1
16	2	A	1
17	0	B	0
17	1	B	1
17	2	B	1
18	0	A	1
18	1	A	1
18	2	A	0
19	0	A	0
19	1	A	0
19	2	A	0
20	0	B	0
20	1	B	0
20	2	B	0
21	0	B	0
21	1	B	0
21	2	B	0
22	0	B	1
22	1	B	0
22	2	B	0
23	0	B	1
23	1	B	1
23	2	B	0
24	0	A	1
24	1	A	1
24	2	A	1
25	0	A	0
25	1	A	1
25	2	A	1
26	0	A	0
26	1	A	1
26	2	A	0
27	0	A	1
27	1	A	0
27	2	A	1
28	0	A	0
28	1	A	0
28	2	A	1
29	0	B	1
29	1	B	0
29	2	B	0
30	0	B	0
30	1	B	0
30	2	B	0
31	0	A	1
32	0	B	0
32	1	B	1
32	2	B	1
33	0	A	0
33	1	A	1
33	2	A	1
34	0	A	0
34	1	A	1
34	2	A	1
35	0	A	0
35	1	A	0
35	2	A	0
36	0	B	1
36	1	B	0
36	2	B	1
37	0	A	0
37	1	A	1
37	2	A	1
38	0	B	0
38	1	B	1
38	2	B	1
39	0	A	1
39	1	A	1
39	2	A	1
40	0	A	0
40	1	A	1
40	2	A	1
41	0	A	0
41	1	A	0
41	2	A	1
42	0	A	1
42	1	A	0
42	2	A	0
43	0	B	1
43	1	B	1
43	2	B	0
44	0	B	0
44	1	B	0
44	2	B	0
45	0	A	1
45	1	A	1
45	2	A	1
46	0	B	0
46	1	B	0
46	2	B	0
47	0	A	1
47	1	A	0
47	2	A	1
48	0	B	0
48	1	B	1
48	2	B	1
49	0	B	0
49	1	B	1
49	2	B	1
50	0	B	0
50	1	B	1
;
run;

proc glimmix data=have; 
class visit obesity group;
model obesity (event='1')=visit group visit*group/ dist=binary link=logit ddfm=bw solution;
random intercept / subject=id solution;
run;

StatDave · Posted 11-28-2022 04:30 PM

Again, what you are asking for is the "difference in difference" (DID) on the proportion (response mean) scale, and the method for this is as shown in the note I referred to earlier. As described there, you need to define a contrast among the VISIT x GROUP combinations that defines the difference between groups on the difference in visits. An example is given in the note. In your case, the following uses a contrast that compares the groups on the response difference between visits 1 and 0. The NLMeans analysis gives the DID estimate on the mean (proportion) scale. The LSMESTIMATE statement is also used, but note that its estimate is on the scale of the link function; that is, it is the estimated DID on the logit (log odds) scale.

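/* Fit the logistic mixed model and save it for the NLMeans macro */
proc glimmix data=have; 
   class visit obesity group;
   model obesity (event='1')=visit group visit*group / 
      dist=binary link=logit ddfm=bw solution;
   random intercept / subject=id solution;
   /* E prints the LS-means coefficients; ILINK adds the means on the proportion scale */
   lsmeans visit*group / e ilink;
   ods output coef=coeffs;
   /* DID on the logit scale; coefficients follow the order of the LSMEANS table */
   lsmestimate visit*group '1-0 at a-b' -1 1 1 -1 0 0;
   store log;
   run;
/* Contrast for NLMeans: k1-k6 are the coefficients, one per LS-mean */
data difdif;
   input k1-k6;
   set=1;
   datalines;
   -1 1 1 -1 0 0
   ;
/* DID on the mean (proportion) scale */
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif,
         title=Difference in Difference of Means)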

Now, regarding the label in the NLMeans output... It is vital that you look at the order of the values being analyzed. In this case, those are the LS-means estimates displayed by the LSMEANS statement. The coefficients that you specify in each observation of the CONTRASTS= data set are applied to the means in that order, exactly as the coefficients in the LSMESTIMATE statement are. So, in the above code, the NLMeans contrast estimates -1*(1st mean)+1*(2nd mean)+1*(3rd mean)-1*(4th mean), with zero coefficients on the 5th and 6th (visit 2) means. This can be written as a DID: (A1-A0)-(B1-B0), where A and B are your groups and 0, 1, 2 are your visits.
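
To cover both follow-up visits, the same idea might be extended with one contrast row per visit. This is only a sketch: it assumes the LS-means are listed in the order 0A, 0B, 1A, 1B, 2A, 2B (check this against your LSMEANS table), that the item store and the coeffs data set created by the GLIMMIX step above already exist, and that giving each row its own SET value makes NLMeans treat it as a separate contrast. The data set name difdif2 is made up for illustration.

```sas
/* Row 1: (A1-A0)-(B1-B0); row 2: (A2-A0)-(B2-B0) */
data difdif2;
   input k1-k6 set;
   datalines;
-1 1 1 -1 0 0 1
-1 1 0 0 1 -1 2
;
%NLMeans(instore=log, coef=coeffs, link=logit, contrasts=difdif2,
         title=Difference in Difference of Means at Visits 1 and 2)
```

The second row simply moves the +1 and -1 for groups A and B from the visit 1 means to the visit 2 means, while the baseline coefficients stay the same.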


Ksharp · Posted 11-21-2022 10:07 PM

Better post it at Stat Forum

https://communities.sas.com/t5/Statistical-Procedures/bd-p/statistical_procedures

And calling @StatDave @lvm @SteveDenham

Merdock · Posted 11-22-2022 12:37 AM

@Ksharp, thanks! I switched it over to the stat forum.

Kurt_Bremser · Posted 11-22-2022 02:19 AM

No need to double post; call out to one of the Super Users and we can move posts.

StatDave · Posted 11-22-2022 10:19 AM

What you are talking about is a so-called "difference in difference" analysis - compare the difference between groups on the difference in response at two visits. See this note (particularly the second part "Generalized linear models with non-identity link") which discusses and illustrates estimating and testing the difference in difference on the mean (proportion) scale. For your case, you would probably want to use the NLMeans macro with a contrast defining each difference in difference that you want.
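
For the descriptive part of the table, one possible sketch (not taken from the note) is to tabulate the observed proportion with obesity in each group at each visit. Since obesity is coded 0/1, its mean is the proportion. The output data set name props and the variable names prop and n_obs are made up for illustration. These are raw proportions among the subjects observed at each visit, so they will not match the model-based LS-means exactly.

```sas
/* Observed proportion with obesity by group and visit (mean of a 0/1 variable) */
proc means data=have nway noprint;
   class group visit;
   var obesity;
   output out=props mean=prop n=n_obs;
run;

proc print data=props noobs;
   var group visit n_obs prop;
run;
```

The raw difference in difference for visit 1 can then be read off as (A1-A0)-(B1-B0) from the prop column and compared with the model-based estimate.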

SteveDenham · Posted 11-22-2022 11:09 AM

@StatDave will probably have more to say on this, but here goes:

To get the chi-squared values for these comparisons on the original (proportion) scale, you'll need the %NLmeans and %NLest macros. (The ilink option back-transforms the LS-means themselves, but differences between them are still estimated on the logit scale, as log odds ratios.) Be sure you are using the most recent versions of these macros.

Try the following code:

proc glimmix data=have; 
class visit obesity group;
model obesity (event='1')=visit group visit*group/ dist=binary link=logit ddfm=bw solution;
random intercept / subject=id solution;
lsmeans visit|group;
store logitfit;
ods output coef=coeffs;
run;


proc plm restore=logitfit;
	lsmeans visit group visit*group/e ilink diff=control;
	slice visit*group/e sliceby=visit diff=control ilink;
	slice visit*group/e sliceby=group diff=control ilink;
	ods output coef=coeffs lsmeans=lsmeans;
	run;

/* These paths will have to be changed. Your version of SAS may already include these macros, in which case the %include statements are not needed */

%include "B:\Steve\GLIMMIXforSAS\nlmeans.sas";
%include "B:\Steve\GLIMMIXforSAS\nlest.sas";

%nlmeans(version,instore=logitfit, coef=coeffs, link=logit, diff=1, null=0, title=Diff of Mean Proportions)

The log will have a bunch of WARNING statements about the final Hessian not being positive definite, but they can all be ignored. The pertinent outputs are:

GLIMMIX type3 tests and least squares means on both the logit scale and proportion scale

NLmeans output from the LSMEANS and SLICE statements with the estimated difference and standard error on the proportion scale, a Wald chi-square value and associated p value, and the bounds for a 95% confidence interval on the difference. Unfortunately, I have never been able to figure out how to change the label on the default printout, but in this case, you can get the associated effects by going in order through the lsmeans and slice statements.

1. labels are 1 -1 0 and 1 0 -1: Differences in marginal lsmeans for visits compared to baseline

2. label is 1 -1: Difference in marginal lsmeans for groups, comparing to group A

3. A table with five labels: Difference in lsmeans compared to baseline of group A

4. label is 1 -1: Difference in lsmeans for groups, at visit=0

5. label is 1 -1: Difference in lsmeans for groups, at visit=1

6. label is 1 -1: Difference in lsmeans for groups, at visit=2

7. labels are 1 -1 0 and 1 0 -1: Difference in lsmeans for visits 1 and 2 from baseline, for group A

8. labels are 1 -1 0 and 1 0 -1: Difference in lsmeans for visits 1 and 2 from baseline, for group B

Hope this helps.

(In my analysis, neither the main effects nor the interaction is significant, so in my opinion, comparisons are probably valid only if pre-specified.)

SteveDenham

Merdock · Posted 11-28-2022 04:01 PM

@SteveDenham, thank you so much for your suggestions and code, this is definitely very helpful! I've tried the code below but I am having a bit of trouble understanding how to interpret that five labels table (snapshot below).

-What does each of those five labels mean? Does the first label mean that -0.08519 is the difference in mean obesity proportion for group A at visit 1, compared to baseline? Then the second label is difference in mean obesity proportion for group A at visit 2, compared to baseline, while the third and fourth labels denote the same thing but for group B? But what's the fifth label for?

-Which one of the 8 labels obtained from %NLmeans and %NLest, is relevant to my situation if I am interested in the p-values for comparing group A with group B in terms of the change from baseline in proportion with obesity at each of the visits? Would it be label#1 on your list below?

Merdock · Posted 12-05-2022 02:53 PM

@StatDave, thank you so much for your explanation and clarifications! It took me a while to digest everything, but I think it all makes sense now and I have a good handle on it.

SteveDenham · Posted 11-29-2022 07:27 AM

@StatDave 's response covers most of your questions, but I will try to be specific to 2 things you asked.

The first deals with the five labels. You have 3 visits (baseline, visit 1, visit 2) and two groups, which gives six visit-by-group cells. Each line is the comparison of one cell to baseline for group A; note that there are six "columns", one per cell. Going by the LS-means order that @StatDave describes for visit*group, these are baseline/A, baseline/B, visit1/A, visit1/B, visit2/A, visit2/B. This comes from using the diff=control option in the LSMEANS statement rather than in a SLICE statement.
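
If it helps to make the reference cell explicit rather than relying on the default, the control level can be named in the DIFF= option. This is a minimal sketch, assuming the item store logitfit from the earlier code and that the VISIT and GROUP values are the character strings '0' and 'A'. The differences it reports are still on the logit scale.

```sas
proc plm restore=logitfit;
   /* compare every visit*group cell with baseline (visit 0) in group A */
   lsmeans visit*group / diff=control('0' 'A') ilink;
run;
```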

The second is which of the output tables refer to your question, and the answer is none of them, since I missed the point that you were looking for difference-in-difference. See @StatDave 's excellent response on how to do that.

SteveDenham

Merdock · Posted 12-05-2022 02:52 PM

@SteveDenham, thank you!
