While I like @StatDave's suggestion to look at PROC HPGENSELECT, I would suggest a couple of things before you start doing variable selection. Season, treatment and parity look to me like three design factors, and body condition score (BCS) is a continuous covariate that is well known to affect pregnancy rate in mammals. So in truth you have just four variables, with possible interactions, and no real need to employ variable selection. Try the following MODEL statement:
model result(event='pregnant')=season*treatment*parity BCS BCS*season BCS*treatment BCS*parity;
This fits a fully saturated model for the design factors, with potentially different BCS slopes across the levels of each factor. Work through this model, eliminating the BCS interaction terms where the slopes do not differ. Once you have settled on the appropriate slope terms, you could then fit an effects model that includes the covariate and the relevant covariate-by-effect interaction terms. This approach is covered in Milliken and Johnson's Analysis of Messy Data, Vol. 3: Analysis of Covariance, and in the analysis of covariance chapter of SAS for Mixed Models (any of the first three editions).
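As a rough sketch of that workflow, assuming PROC LOGISTIC (the procedure isn't named above) and that season, treatment, parity and result are classification variables in maanshan320 with BCS numeric, the full model and one possible reduced effects model might look like this. Which BCS slope terms survive depends on your data; keeping only BCS*treatment in step 2 is purely a hypothetical illustration.

```sas
/* Step 1: full model. Design factors as CLASS variables,   */
/* with a separate BCS slope term for each factor.           */
proc logistic data=maanshan320;
   class season treatment parity / param=glm;
   model result(event='pregnant') = season*treatment*parity
         BCS BCS*season BCS*treatment BCS*parity;
run;

/* Step 2 (hypothetical): suppose only the treatment slopes  */
/* differ. Fit the factorial effects model with a common BCS */
/* slope plus treatment-specific slope adjustments.          */
proc logistic data=maanshan320;
   class season treatment parity / param=glm;
   model result(event='pregnant') = season|treatment|parity
         BCS BCS*treatment;
run;
```

The bar notation in step 2 expands to all main effects and interactions of the three design factors, which is the usual effects-model form once the slope structure is settled.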
Also, look at the following crosstabulation:
PROC FREQ data=maanshan320;
tables parity*season*treatment*result;
run;
That should give 8 tables that are N x 2 (one treatment-by-result table for each combination of parity and season), where N is the number of treatments and 2 is the number of levels of result. From those 8 tables you should readily be able to identify where separation is occurring for the design factors, if anywhere: look for treatment rows where one of the two result cells has a zero count. You may also want to look at the results of PROC GLM, with BCS as the dependent variable and the design factors crossed with the result variable as the independent variables. Use the LSMEANS statement to see how mean BCS separates across those factor and result combinations.
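A minimal PROC GLM sketch of that check, under the same dataset and variable-name assumptions as above. Requesting LS-means for the four-way term gives one mean BCS per parity, season, treatment and result cell; cells that come back as non-estimable are themselves a hint that the data are sparse there.

```sas
proc glm data=maanshan320;
   class season treatment parity result;
   /* BCS as the response; design factors crossed with the outcome */
   model BCS = season|treatment|parity|result;
   /* Mean BCS for each factor-by-result cell */
   lsmeans season*treatment*parity*result;
run;
quit;
```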
I think the root cause of the separation issue is the inclusion of high-order interactions with BCS. For some combination or combinations of the design factors and the response, there is likely to be complete separation on the covariate; for example, within one cell every pregnant cow might have a higher BCS than every nonpregnant cow. Additionally, fourth-order interactions do a great job of modeling noise, especially when one of the variables is continuous, which brings us back to @StatDave's comment about fitting the data perfectly. So think carefully about the biological question at hand (which looks like it might be about feeding dairy cows and seeing what the resulting pregnancy rate is) and formulate a model that addresses that question.
SteveDenham