alternating logistic regression with cluster analysis

Teresa12 · Posted 05-16-2025 10:18 AM

Hi

I am working on an alternating logistic regression model with repeated measures.
The drug is the unit of analysis, and there are two levels of clustering: drugs within patient, and patients within MD.

My dataset looks like this (the full dataset contains approximately 15000 rows):
data drugs;
input patient MD Gender Age drug $ indication outcome;
datalines;
1 1 2 54 a 1 1
1 1 2 54 b 1 1
1 1 2 54 c 0 0
2 4 1 41 a 1 0
2 4 1 41 c 1 0
2 4 1 41 e 1 1
3 1 1 24 h 0 0
4 5 2 29 c 1 0
5 1 2 72 a 1 1
5 1 2 72 a 0 1
6 2 2 72 i 0 1
6 2 2 72 b 0 0
7 1 1 36 a 0 0
8 3 1 25 a 0 1
;

PROC GENMOD DATA= drugs;
CLASS MD PATIENT gender(ref="1") indication(ref="1");
MODEL outcome (event='1')= gender indication age / DIST=BIN ;
REPEATED SUBJECT= PATIENT / TYPE=EXCH ;
RUN ;

What is wrong with my model?

Thank you very much for your help

ballardw · Posted 05-16-2025 03:57 PM

Can you describe in a bit more detail the research question the analysis is supposed to answer?

Teresa12 · Posted 05-19-2025 07:33 AM

Hello,

Thank you for your response.

I am working on evaluating whether certain drugs are prescribed correctly, which will be my outcome measure. If the prescription complies with the recommended dosage, the outcome will be assigned a value of 1; if not, it will be assigned a value of 0.

Please note that a patient may be prescribed more than one medication, and around 15000 patients are treated by approximately 150 physicians.

Ksharp · Posted 05-16-2025 10:04 PM

According to the examples in the PROC GEE and PROC GENMOD documentation, you need the LOGOR= option to fit an ALR model.

PROC GENMOD DATA= drugs;
CLASS MD PATIENT gender(ref="1") indication(ref="1");
MODEL outcome (event='1')= gender indication age / DIST=BIN ;
REPEATED SUBJECT= PATIENT / logor=fullclust ;
RUN ;

And to build a GEE model, it is better to use the newer PROC GEE.

PROC GEE DATA= drugs;
CLASS PATIENT gender(ref="1") indication(ref="1");
MODEL outcome (event='1')= gender indication age / DIST=BIN ;
REPEATED SUBJECT= PATIENT / logor=fullclust ;
RUN ;

Teresa12 · Posted 05-19-2025 07:34 AM

Hello,
Thank you for your response. However, in the PROC GEE code, why is the physician (MD) missing from the SUBJECT= option of the REPEATED statement?

Ksharp · Posted 05-20-2025 04:17 AM

Sorry, I don't understand what is missing in the REPEATED statement.
I am not an expert on ALR.
Maybe @StatDave or @SteveDenham could give you some constructive suggestions.

StatDave · Posted 05-20-2025 10:59 AM

Questions like this that are about statistical methods or statistical procedures will be addressed faster and get more attention if you post them in the Analytics>Statistical Procedures community.

First, note that the GEE method is robust to not specifying exactly the right clustering structure, so it is not unreasonable to use the GEE model from the code you showed in your first post.

However, if you particularly want to use an ALR model to estimate log odds ratios, and if your data consists of patients clustered within physicians and with multiple observations from patients, then you need to change your REPEATED statement options. Instead of TYPE=EXCH, which requests a GEE model, specify SUBJECT=MD (I assume that MD is your physician indicator) then specify LOGOR=NEST1 and SUBCLUSTER=PATIENT:

   repeated  subject=md / logor=nest1 subcluster=patient;

While LOGOR=NESTK (with SUBCLUSTER=PATIENT) or FULLCLUST are other possible structures, they are probably not feasible since you indicate that, on average, there are 100 patients per physician. These structures would probably require the estimation of far too many log odds ratios, and in any case would probably not be useful.
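
For reference, here is a minimal sketch of how the whole step might look with that REPEATED statement. It assumes the variable names from the DRUGS data set in the first post and reuses the same CLASS and MODEL statements; it has not been run against the real data, so treat it as a starting point rather than a finished program.

```sas
/* Sketch only: ALR with physicians (MD) as clusters and       */
/* patients as subclusters, using the 1-nested structure.      */
/* Variable names are taken from the DRUGS data set shown in   */
/* the first post.                                              */
proc genmod data=drugs;
   class md patient gender(ref="1") indication(ref="1");
   model outcome(event='1') = gender indication age / dist=bin;
   /* LOGOR=NEST1 requests ALR; SUBCLUSTER= names the          */
   /* within-cluster grouping variable.                         */
   repeated subject=md / logor=nest1 subcluster=patient;
run;
```

The only change from the original program is the REPEATED statement: the cluster is now the physician, and the patient enters through SUBCLUSTER= instead of SUBJECT=.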

Teresa12 · Posted 05-20-2025 12:11 PM

Thank you for your time and assistance; you helped me understand!

StatDave · Posted 05-20-2025 12:18 PM

My previous suggestion using SUBJECT=MD, LOGOR=NEST1, and SUBCLUSTER=PATIENT will give you two log odds ratios: one for any pair of responses within a patient and one for any pair of responses from different patients of the same physician. That uses all the responses from all patients of a physician as a cluster. Another possible analysis is using SUBJECT=PATIENT(MD) and LOGOR=LOGORVAR(MD). This treats all responses from each distinct patient as a cluster and estimates a log odds ratio for each physician, essentially pooling across all of his or her patients.
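
As a rough sketch, the second analysis could be written as follows, again assuming the variable names from the DRUGS data set in the first post and the same CLASS and MODEL statements (not tested on the actual data):

```sas
/* Sketch only: each patient (nested within MD) is a cluster,  */
/* and a separate log odds ratio is estimated per physician.   */
proc genmod data=drugs;
   class md patient gender(ref="1") indication(ref="1");
   model outcome(event='1') = gender indication age / dist=bin;
   /* PATIENT(MD) defines the clusters; LOGORVAR(MD) lets the  */
   /* log odds ratio vary by physician.                         */
   repeated subject=patient(md) / logor=logorvar(md);
run;
```

With approximately 150 physicians, this structure estimates on the order of 150 log odds ratios, compared with two for the NEST1 structure, so it is worth checking that each physician contributes enough within-patient pairs of responses to support that.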
