aska_ujita
Obsidian | Level 7

Hello there!

 

I am analyzing the residual milk of my cows,

but I am having trouble choosing the best distribution.

With PROC UNIVARIATE I saw that the normal distribution does not fit (see the output in the PDF).

I ran PROC SEVERITY to compare the distributions by the AICC criterion.

 

proc severity data=C crit=aicc;
loss RESIDUAL1;
dist _predefined_;
run;

 

This is the output:

Converged   AICC         Selected
Maybe       -1789        Yes
Yes         1243         No
Yes         752.79495    No
Yes         5159         No
Yes         5144         No
Yes         1245         No
Yes         1245         No
Yes         5131         No

 

So I can see that the procedure suggested Burr, but now I want to test this distribution with PROC UNIVARIATE. Is that possible?

And if I can use PROC GLM, how can I configure it for this distribution?

 

Thank you.

 

 

11 REPLIES
Ksharp
Super User

It would be better to post this in the Stat forum; @StatDave and @Rick.Wicklin are there.

If you want to check the distribution of a variable, try:

 

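/* intercept-only model with a normal response */
proc genmod data=have;
model residual= / dist=normal;
run;

/* GENMOD's DIST= option has no LOGNORMAL keyword; GLIMMIX accepts DIST=LOGNORMAL */
proc glimmix data=have;
model residual= / dist=lognormal;
run;

/* intercept-only model with a gamma response */
proc genmod data=have;
model residual= / dist=gamma;
run;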

aska_ujita
Obsidian | Level 7

Thank you!!!

Doc_Duke
Rhodochrosite | Level 12

Please post a question to just one forum.  Thanks.

StatDave
SAS Super FREQ

UNIVARIATE doesn't have the Burr distribution. What is the reason for wanting to use UNIVARIATE? If you want to estimate the parameters of the Burr distribution, PROC SEVERITY can give those estimates as well.
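
A minimal sketch of getting the Burr estimates from PROC SEVERITY, reusing the data set C and the variable RESIDUAL1 from the original post. The output data set name burr_est is only a placeholder chosen for this illustration.

```sas
/* Fit only the Burr distribution and print all available tables,
   including the parameter estimates and fit statistics. */
proc severity data=C print=all outest=burr_est;
   loss RESIDUAL1;
   dist burr;
run;

/* burr_est is a hypothetical name; it holds the estimates saved by OUTEST=. */
proc print data=burr_est;
run;
```

Listing a single distribution in the DIST statement skips the model comparison and reports only that fit.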

aska_ujita
Obsidian | Level 7

Hello StatDave_sas, can I use GENMOD to analyze my data? Or GLIMMIX?

I have always used GLM or MIXED to analyze milk data, but I tested the distribution and my residual milk fits the Gamma distribution better (because it has a lot of zeros on one side; please see the attached image. I think that is the reason gamma fits better, right?)

 

My class variables are: treatment (treated and control), day of lactation (days 1, 3, 7, 15, 30, 45 and 60), parturition order (multiparous and primiparous) and the cow.

My effects are: treatment, day of lactation, parturition order, date of the observation/measurement (day/month/year, all in the same year), age, and the interactions treatment*day of lactation and treatment*parturition order.

 

I collected one observation on each day of lactation (days 1, 3, 7, 15, 30, 45 and 60), for a total of seven observations per cow.

 

I have 20 different cows in each treatment, for a total of 40 cows in the experiment.

 

Thank you always.

 

Best, Aska.

 

 
Rick_SAS
SAS Super FREQ

It appears that all your residuals are positive. If the model fits the data, I would expect to see some positive and some negative residuals. Could you post the model that produces the RESIDUAL1 variable?

aska_ujita
Obsidian | Level 7

Hello Rick_SAS, thank you for replying and helping me.

 

I did it like this:

 


PROC GENMOD;
CLASS GRUPO OP1 DL1 vaca;
MODEL PL1= GRUPO DL1 op1 grupo*op1 grupo*dl1 data idade;
lsmeans grupo/pdiff adjust=tukey lines;
lsmeans grupo*op1/pdiff adjust=tukey lines;
RUN;

 

PROC GENMOD;
CLASS GRUPO OP1 DL1 vaca;
MODEL RESIDUAL1= GRUPO DL1 op1 grupo*op1 grupo*dl1 data idade/ dist=gamma;
lsmeans grupo/pdiff adjust=tukey;
lsmeans grupo*op1/pdiff adjust=tukey;
RUN;

Sorry that it is all in Portuguese: PL is milk production (it fits a normal distribution), Residual1 is the residual milk, grupo is the treatment (treated group and control group), vaca is the cow, op1 is the parturition order, dl1 is the day of lactation, data is the observation/measurement date and idade is the age.

 

I attached the output as a PDF file.

 

Thank you very much.

 

Best, Aska.

Rick_SAS
SAS Super FREQ

Ah, now I see! I thought "RESIDUAL1" meant "the difference between the observed and predicted response," but it really is a measurement of the "residual milk" that is left inside the udder after the cow is milked. Thank you for explaining.

 

When you are performing a generalized regression analysis, the DIST= option does not refer to the unconditional distribution of the response variable. Therefore you should not choose the DIST= option based on using PROC UNIVARIATE or SEVERITY to test the univariate distribution of PL1 or RESIDUAL1.  The DIST= option specifies the CONDITIONAL distribution of the response variable after accounting for the values of the regressors (independent variables).

 

In particular, the Y variable in a linear regression model does not need to be normally distributed and similar statements hold for generalized linear models. For the linear model, the "normal distribution" refers to the normality of the residuals.
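
To make the distinction concrete, here is a rough sketch of checking the model residuals rather than the raw response. It assumes the data set C and the variable names from the code posted above; glm_out, model_resid and pred are placeholder names made up for this example. It also ignores the repeated measurements on each cow, so treat it as an illustration of the idea and not as the right model for this experiment.

```sas
/* Fit the linear model and save the model residuals
   (observed minus predicted) to a new data set. */
proc glm data=C;
   class GRUPO OP1 DL1;
   model RESIDUAL1 = GRUPO DL1 OP1 GRUPO*OP1 GRUPO*DL1 data idade;
   output out=glm_out r=model_resid p=pred;
run;
quit;

/* Test the normality of the model residuals, not of RESIDUAL1 itself. */
proc univariate data=glm_out normal;
   var model_resid;
   qqplot model_resid / normal(mu=est sigma=est);
run;
```

If the Q-Q plot of model_resid is roughly a straight line, the normal assumption is reasonable for the linear model even when the histogram of the response looks skewed.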

aska_ujita
Obsidian | Level 7

Hello Rick_SAS, thank you very much for clarifying that.

 

Yes, I was talking about residual milk (I will call it RM, for residual milk, from now on so that it does not confuse us).

It is really difficult for me to understand some statistical concepts... Fortunately there are kind and helpful people like you to help us! I am really grateful for that, and it makes me very willing to learn.

 

So... I understood that I have to test whether my residuals are normal.

 

And to do that I can use this:

 

ods graphics on;

proc reg data=C;
MODEL PL1= GRUPO DL1 op1 data idade;
quit;

 

ods graphics on;
proc reg data=C;
MODEL RM= GRUPO DL1 op1 data idade;
quit;

 

I included the interactions, but it seems that the procedure did not accept them, so I took them out.

 

And my output (attached as a PDF) shows that my model (high value) isn't so good, right?

But the residuals look normal for both RM and PL. Is that correct?

 

Thank you very much for helping me.

 

Best, Aska.

Rick_SAS
SAS Super FREQ

Interactions are not allowed in PROC REG. You should use PROC GLM (or GENMOD), which supports interaction terms and CLASS variables. For example, the main effects model is

 

proc glm data=C;

class GRUPO OP1;  
MODEL RM= GRUPO DL1 op1 data idade / solution;
quit;
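
As a sketch only, the interactions from the earlier GENMOD code could be added to the same PROC GLM call as shown here. Putting DL1 on the CLASS statement is an assumption taken from that GENMOD code, not from the main effects model above, and the model still does not account for repeated measurements on the same cow.

```sas
/* Same response and main effects, plus the two treatment interactions. */
proc glm data=C;
   class GRUPO OP1 DL1;   /* DL1 as a CLASS variable is an assumption */
   model RM = GRUPO DL1 OP1 GRUPO*OP1 GRUPO*DL1 data idade / solution;
run;
quit;
```

Any variable used in an interaction with a CLASS effect should be decided on deliberately: as a CLASS variable DL1 gets one parameter per day, while as a continuous variable it gets a single slope.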

 

It seems that your questions might not be about SAS but about statistics. I encourage you to talk to a statistical person (professor, researchers, consultant) at your place of work. They can help you to create a statistical model that reflects the design of your experiment. After you understand the model you are trying to fit, we can help you with the SAS syntax.

