PROC GEE

billi_billi · Posted 03-23-2021 12:03 PM

Hello

This is my first time working with clustered data. I am using PROC GEE and accounting for the clustering, but the variables I named in the SUBJECT= and WITHIN= options of the REPEATED statement have some missing values. Is there any way to overcome this? Because of these missing values I am not getting any results, just an error saying 'A missing value was detected in the SUBJECT, WITHINSUBJECT, or LOGORVAR effect. All values of variables in these effects must be non-missing.'

Below is my code:

PROC GEE DATA=DATA DESC;
CLASS ID AREA DOCTOR_VISIT ;
MODEL DOCTOR_VISIT=AGE/ DIST=BIN LINK=LOGIT;
REPEATED SUBJECT=ID/ WITHIN=AREA CORR=CS;
RUN;

StatDave · Posted 03-23-2021 01:09 PM

Those values have to be nonmissing so that the data for an observation is properly associated with a cluster (subject) and properly positioned within the cluster.

billi_billi · Posted 03-23-2021 01:19 PM

@StatDave Thank you for the reply. Is there any option that I can use to exclude the missing data?

StatDave · Posted 03-23-2021 01:38 PM

Yes, just include a WHERE statement like:
where doctor_visit ne . and area ne .;
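
Since the error message refers to the SUBJECT= and WITHIN= effects, the filter presumably needs to cover the variables named there (ID and AREA in the original code). Here is a minimal sketch, assuming the data set and variable names from the first post. It uses the MISSING function because that works for both numeric and character variables, whereas a comparison with . assumes the variable is numeric:

```sas
/* Sketch only: data set and variable names are taken from the first post. */
proc gee data=data desc;
   /* Keep only rows where the cluster and within-cluster variables are present. */
   where not missing(id) and not missing(area);
   class id area doctor_visit;
   model doctor_visit = age / dist=bin link=logit;
   repeated subject=id / within=area corr=cs;
run;
```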

billi_billi · Posted 03-23-2021 02:39 PM

@StatDave Thank you, this worked. But then I got an error: 'The within effect should be unique'. I am assuming this is because I have duplicate values of the WITHIN= variable within a subject, so I changed my code from this:

PROC GEE DATA=ED_DATA DESC;
WHERE ID NE . AND area NE . ;
CLASS ID area doctor_visit ;
MODEL doctor_visit=age/ DIST=BIN LINK=LOGIT;
REPEATED SUBJECT=ID/ WITHIN=area CORR=IND;
RUN;

TO THIS CODE:

PROC GEE DATA=ED_DATA DESC;
WHERE ID NE . AND area NE . ;
CLASS ID area doctor_visit ;
MODEL doctor_visit=age/ DIST=BIN LINK=LOGIT;
REPEATED SUBJECT=ID*area/ CORR=IND;
RUN;

I just wanted to check whether this is correct or whether there is some other way to overcome the above error. Also, is there any way to get odds ratios?

Sorry to bombard you with so many questions, but I really appreciate your help. Thank you very much.

StatDave · Posted 03-23-2021 03:14 PM

You might not be understanding the purpose of the WITHIN= variable - it is used to order the observations within each cluster. This is necessary when the correlation structure requires ordering such as with the TYPE=AR structure. A separate issue is whether your SUBJECT= variable has unique values for every cluster. If that variable repeats values in the data set and not all of them are in the same cluster, then you might need to involve a second variable as discussed in this note.
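
One way to check both points is to count observations per ID and AREA before fitting the model. The queries below are only a sketch and do not come from the note mentioned above. They assume the ED_DATA data set and the ID and AREA variables used earlier in this thread:

```sas
/* Sketch only: ED_DATA, ID, and AREA are the names used earlier in the thread. */
proc sql;
   /* ID/AREA pairs that occur more than once would make the WITHIN= effect non-unique. */
   select id, area, count(*) as n_obs
   from ed_data
   where not missing(id) and not missing(area)
   group by id, area
   having count(*) > 1;

   /* IDs that appear in more than one AREA may not identify a single cluster on their own. */
   select id, count(distinct area) as n_areas
   from ed_data
   where not missing(id) and not missing(area)
   group by id
   having count(distinct area) > 1;
quit;
```

If the first query returns rows, AREA cannot serve as an ordering variable within ID. If the second returns rows, it is worth deciding whether those observations really belong to the same cluster.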

billi_billi · Posted 03-23-2021 09:15 PM

@StatDave Thank you very much for this article. It really helped. I am trying to find out whether there is any documentation on how to overcome the error I got for the WITHINSUBJECT= variable.

If you happen to know of any documentation related to this, could you please direct me to it? If not, thank you very much for all the replies and your information.

StatDave · Posted 03-23-2021 09:56 PM

As I said, ordering within clusters is important only when the specified correlation structure takes it into account. The exchangeable structure (TYPE=EXCH or CS) does not, so you do not need to specify the WITHIN= option.
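
Putting that together, the model could be written without WITHIN= roughly as follows. This is a sketch under assumptions: it reuses the ED_DATA data set and variable names from earlier in the thread, and it assumes that ID alone identifies each cluster. The ESTIMATE statement with the EXP option is one possible way to get the odds ratio asked about earlier, since it exponentiates the estimated log odds for a one-unit increase in AGE:

```sas
/* Sketch only: assumes ID uniquely identifies each cluster in ED_DATA. */
proc gee data=ed_data desc;
   where not missing(id);
   class id doctor_visit;
   model doctor_visit = age / dist=bin link=logit;
   /* Exchangeable correlation does not depend on order, so WITHIN= is omitted. */
   repeated subject=id / corr=exch;
   /* EXP exponentiates the log-odds estimate, giving an odds ratio for AGE. */
   estimate 'Odds ratio for age' age 1 / exp;
run;
```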
