amurr93
Fluorite | Level 6
DATA CombinedTiWORKING;
INFILE 'C:\Users\amurr\Documents\PhD\Variable Frame Forage Finish\SAS Files\Combined Ti working.prn' firstobs=2;
INPUT VID YEAR GROUP SIRE$ SEX$ TRMT$ FRAME$ SUPPLEMENT$ PERIOD gTi_gFecesDM gFecalOutput kgPERdayForageDMI;
PROC print;
RUN;
PROC MIXED;
Class Year Group Sire Sex TRMT Period;
Model kgPERdayForageDMI=Year Sire Sex TRMT|Period;
random Group;
REPEATED/type= TOEP r rcorr subject=VID(period);
lsmeans TRMT TRMT*Period/pdiff=ALL adjust=TUKEY;
SLICE TRMT*Period/SLICEBY=period diff adjust=TUKEY;
RUN;
PROC MIXED;
Class Year Group Sire Sex TRMT Period;
Model kgPERdayForageDMI=Year Sire Sex TRMT|Period;
random Group;
REPEATED/type= TOEP r rcorr subject=VID(period);
lsmeans Period Year/pdiff=ALL adjust=TUKEY;
RUN;
QUIT;

Please help me understand why SAS won't run analysis on one of my categorical variables.

Experiment background: This was a 2x2 study looking at how cattle frame size (large vs. small) and feeding regimen (pasture only vs. pasture plus supplemental feed) influenced animal growth, forage utilization, and meat characteristics. The study ran for 2 years, with a different set of cattle in each year.

 

This particular analysis was looking at forage intake as measured by titanium concentration in feces. In year 1 there were 2 different periods of fecal collection, and in year 2 there were 3 different periods of fecal collection. The same cattle were used for both fecal collections in year 1, and the same cattle were used in all 3 fecal collections in year 2, so I ran this as a repeated measure using PROC MIXED and split it by period.

 

My problem is that I cannot get SAS to analyze the data by year (code included). If I remove "YEAR" from the CLASS, MODEL, and LSMEANS statements in the included code, everything runs fine, but I don't understand why I can't get a year 1 vs. year 2 analysis.

 

The weird part is that SAS recognizes YEAR in PROC PRINT (picture 1), reads all of my observations and recognizes YEAR as a categorical variable (picture 2), but does not recognize YEAR as a fixed effect (picture 3). I've tried going back to my Excel file and defining the YEAR data column as both "number" and "general", but it makes no difference.

 

Any recommendations are greatly appreciated!

 

[Screenshots: No Year 1.JPG, No Year 2.JPG, No Year 3.JPG]

10 REPLIES
Reeza
Super User
Can you show a proc freq output on the variable year from the input data set?
I would highly recommend you add the DATA= option to your procs as well; it helps ensure that the right data set is being referenced.

proc freq data=CombinedTiWORKING;
table year;
run;
amurr93
Fluorite | Level 6

Thank you for replying to help! Here's the code and results of PROC FREQ.

 

[Screenshot: No Year PROC FREQ code.JPG]

PaigeMiller
Diamond | Level 26

That's what everyone was saying. The values of PERIOD when YEAR=1 are 1 and 2, and the values of PERIOD when YEAR=2 are 3, 4 and 5, so these two variables are not independent. If you know the value of PERIOD, then the value of YEAR is uniquely determined, so the model cannot estimate both effects.
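A quick way to see this confounding directly is a two-way table of YEAR by PERIOD. This is only a sketch, assuming the data set name CombinedTiWORKING from the original post; the NOROW, NOCOL and NOPERCENT options just strip the table down to counts.

```sas
/* Cross-tabulate YEAR by PERIOD (data set name taken from the original post). */
/* If every PERIOD column has a nonzero count in only one YEAR row,            */
/* then YEAR is fully determined by PERIOD.                                    */
proc freq data=CombinedTiWORKING;
  tables Year*Period / norow nocol nopercent;
run;
```

If periods 1 and 2 appear only in the YEAR=1 row and periods 3, 4 and 5 only in the YEAR=2 row, that is the pattern described above.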


The LSMESTIMATE statement from @SteveDenham will allow you to test whether YEAR=1 produces the same mean response value as YEAR=2.

--
Paige Miller
mkeintz
PROC Star

In the model estimation component, YEAR is reported as having zero degrees of freedom. From way back in my statistics study, I understand this to mean that YEAR is completely predictable from the combination of other effects (SIRE, SEX, TRMT, PERIOD, TRMT*PERIOD), which in turn would mean that knowing the value of YEAR would not improve the model prediction of kgPERdayForageDMI beyond what is determined from the other variables.

 

As @Reeza suggests, you could do a proc freq.  I believe if you ran this frequency request:

proc freq data= CombinedTiWORKING;
  tables year * sire*sex*trmt*period / list;
run;

you would find no combination of the other variables that would be associated with more than one level of year.  I.e., YEAR would be superfluous.

 

 

SteveDenham
Jade | Level 19

As I look over the output, I see that you have 83 observations, coded as VID, but not sequentially, based on the PROC PRINT. I am going to assume that GROUP is some sort of pen/pasture grouping of animals that have all of the 8 possible combinations of sex, frame and supplement level represented. The key here is that YEAR and PERIOD are completely confounded, per @mkeintz's comment: if an observation has PERIOD = 1 or 2, it must have YEAR = 1, and if PERIOD is 3, 4, or 5, then the record must have YEAR = 2.

 

So you need to remove YEAR from the MODEL statement. We will get back to how to test for a year effect after we go through the other factors. Sex, Sire, Period and Trmt are all fixed effects, with GROUP a random effect. That means that we will need to get creative outside the MODEL and RANDOM statements. First, let's look at how to model Period as a REPEATED effect. With only 2 or 3 levels per year, I would try the following structure for the covariance matrix. First, the covariance between periods in different years should be fixed at zero. Next, the number of parameters for YEAR=1 is 3 (2 variances, 1 covariance) and for YEAR=2 is 6 (3 variances, 3 covariances). If I fit something like that, I would use the PARMS statement with a HOLD option. So, before we do LSMEANS and SLICE, the code I am suggesting would look like:

 

PROC MIXED;
Class Group Sire Sex TRMT Period VID;
Model kgPERdayForageDMI=Sire Sex TRMT|Period;
random Group;
REPEATED/type= UN r rcorr subject=VID(period);
PARMS (.) (.) (.)  0 0 (.)  0 0 (.) (.)  0 0 (.) (.) (.) /hold=3,4,6,7,10,11;
RUN;

If that runs without errors or warnings, then the following code can be inserted:

 

lsmeans TRMT TRMT*Period/pdiff=ALL adjust=TUKEY;
SLICE TRMT*Period/SLICEBY=period diff adjust=TUKEY;
LSMESTIMATE period 'Year 1 averaged over TRMT' 1 1 0 0 0 divisor=2,
                   'Year 2 averaged over TRMT' 0 0 1 1 1 divisor=3,
                   'Year 1 minus Year 2, =Year effect' 3 3 -2 -2 -2 divisor=6;

The LSMESTIMATE statement calculates the year 1 mean, year 2 mean and the difference, which tests the null hypothesis that the Year means are equal, holding all other parameters at their mean values.
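To make the coefficients in the third row explicit, here is the arithmetic behind them, writing \mu_j for the Period j least-squares mean averaged over TRMT (that notation is mine, not from the SAS output):

```latex
\bar{\mu}_{\text{Year 1}} = \frac{\mu_1 + \mu_2}{2}, \qquad
\bar{\mu}_{\text{Year 2}} = \frac{\mu_3 + \mu_4 + \mu_5}{3}

\bar{\mu}_{\text{Year 1}} - \bar{\mu}_{\text{Year 2}}
  = \frac{3(\mu_1 + \mu_2) - 2(\mu_3 + \mu_4 + \mu_5)}{6}
  = \frac{3\mu_1 + 3\mu_2 - 2\mu_3 - 2\mu_4 - 2\mu_5}{6}
```

Putting both year means over the common denominator 6 gives the coefficients 3 3 -2 -2 -2 with divisor=6.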

 

SteveDenham

 

 

 

 

amurr93
Fluorite | Level 6

Thank you very much for your help and detailed response, it's helping me learn the workings of SAS coding. Yes, your assumption on "groups" is correct, each year the cattle were divided between 2 pastures, but each pasture contained animals with all the possible combinations of sex, frame, and supplement level. "VID" refers to the unique identification numbers of each animal used (basically ear tag number), so the relative values are meaningless. 

 

Unfortunately the first set of code you recommended with the PARMS statement did not run (code and log pictures below).

[Screenshots: PARMS code; PARMS code log errors]

 

 

Could I achieve the same analysis by running a contrast statement comparing Periods 1 and 2 vs. Periods 3, 4, and 5 (example code below)?

PROC MIXED data=CombinedTiWORKING;
Class Group Sire Sex TRMT Period;
Model kgPERdayForageDMI=Sire Sex TRMT|Period;
random Group;
REPEATED/type= UN r rcorr subject=VID(period);
lsmeans TRMT TRMT*Period/pdiff=ALL adjust=TUKEY;
contrast 'Year 1 vs Year 2' Period 3 3 -2 -2 -2;
SLICE TRMT*Period/SLICEBY=period diff adjust=TUKEY;
RUN;

Thank you very much for all your help.

 

 

SteveDenham
Jade | Level 19

You could do that, but be aware that the covariance structure will include estimates of covariance between individuals in Year 1 and individuals in Year 2 with this approach. They are obviously independent and there should be no covariance.

 

Let me check into why the PARMS statement is not behaving the way I thought it would.

 

SteveDenham

SteveDenham
Jade | Level 19

I think the PARMS statement in GLIMMIX doesn't like parentheses. Try this PARMS statement:

 

PARMS .,.,., 0, 0,., 0, 0, ., .,  0, 0, ., ., . /hold=3,4,6,7,10,11;

If that does not work, then I think you may have to pass the values in a dataset. Non-zero entries can be obtained by fitting a UN covariance structure to separate analyses using a BY year; statement. Note that the matrix must be a full 15x15 matrix. See the example at this link in the PLM documentation:

https://documentation.sas.com/doc/en/statug/15.2/statug_plm_examples07.htm .  I realize this is getting pretty complicated.
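For the "separate analyses using a BY year; statement" step, a minimal sketch might look like the following. Several things here are assumptions rather than anything confirmed in this thread: the output data set names TiByYear and CovByYear are made up, and the REPEATED statement uses Period as the repeated effect with subject=VID (with VID in the CLASS statement) instead of the subject=VID(period) form posted above. With only 2 pastures per year, the Group variance may also be poorly estimated within a single year.

```sas
/* BY-group processing needs the data sorted by the BY variable.  */
/* TiByYear is a hypothetical name for the sorted copy.           */
proc sort data=CombinedTiWORKING out=TiByYear;
  by Year;
run;

/* Fit an unstructured (UN) covariance separately within each year.   */
/* Assumption: Period is the repeated effect and VID is the subject.  */
proc mixed data=TiByYear;
  by Year;
  class Group Sire Sex TRMT Period VID;
  model kgPERdayForageDMI = Sire Sex TRMT|Period;
  random Group;
  repeated Period / type=UN subject=VID r rcorr;
  /* Save the covariance parameter estimates; CovByYear is a hypothetical name. */
  ods output CovParms=CovByYear;
run;
```

If this runs, CovByYear should hold the within-year estimates (3 UN parameters for YEAR=1 and 6 for YEAR=2, matching the counts given earlier), which could then serve as the non-zero starting values.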

 

SteveDenham

 

amurr93
Fluorite | Level 6

Thank you for your help! I ran the PROC FREQ and posted the results on the first comment like you recommended.

 

My only concern is that I'm losing power by making period comparisons outside of the year (e.g., period 1 vs. period 5). I want to run the analysis by year because year 1 and year 2 had entirely different cattle, so comparisons between periods 1 and 2 are of interest, and comparisons among periods 3, 4, and 5 are of interest, but comparisons between periods 1 and 2 vs. periods 3, 4, and 5 are irrelevant.
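One possible way to limit the comparisons to the within-year pairs listed above is to request only those four differences with LSMESTIMATE, so that any multiplicity adjustment covers 4 comparisons instead of all 10 pairwise period differences. This is a sketch, not a tested solution: the model statements simply mirror the code posted earlier in this thread, the labels are illustrative, it assumes the Period levels sort as 1 through 5, and ADJUST=BON is my choice (the Tukey adjustment used with LSMEANS is not offered for LSMESTIMATE).

```sas
proc mixed data=CombinedTiWORKING;
  class Group Sire Sex TRMT Period;
  model kgPERdayForageDMI = Sire Sex TRMT|Period;
  random Group;
  /* REPEATED statement copied from the code posted earlier in the thread */
  repeated / type=UN r rcorr subject=VID(period);
  /* Only within-year period differences; coefficients follow Period order 1..5 */
  lsmestimate Period
    'Year 1: period 1 vs 2' 1 -1  0  0  0,
    'Year 2: period 3 vs 4' 0  0  1 -1  0,
    'Year 2: period 3 vs 5' 0  0  1  0 -1,
    'Year 2: period 4 vs 5' 0  0  0  1 -1 / adjust=bon;
run;
```

Each row compares two Period least-squares means averaged over TRMT; no row mixes a Year 1 period with a Year 2 period.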
