Re: Distribution-fitting

aska_ujita · Posted 04-11-2019 08:52 AM

Hello there!

I am testing residual milk of my cows...

But I am having trouble choosing the best distribution.

With PROC UNIVARIATE I saw that the normal distribution doesn't fit (see the output in the PDF).

I ran PROC SEVERITY to compare the distributions by the AICC criterion.

proc severity data=C crit=aicc;
loss RESIDUAL1;
dist _predefined_;
run;

This is the output:

Converged	AICC	Selected
Maybe	-1789	Yes
Yes	1243	No
Yes	752.79495	No
Yes	5159	No
Yes	5144	No
Yes	1245	No
Yes	1245	No
Yes	5131	No

So, I can see that the procedure suggested Burr... but now I want to test this one with PROC UNIVARIATE. Is that possible?

And if I can use PROC GLM, how can I configure it for this distribution?

Thank you.

Ksharp · Posted 04-11-2019 10:07 AM

Better to post it in the Stat forum. @StatDave and @Rick.Wicklin are there.

If you want to check the distribution of a variable, try:

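/* Intercept-only model with a normal response */
proc genmod data=have;
   model residual = / distribution=normal;
run;

/* GENMOD has no DIST=LOGNORMAL keyword, so fit a normal model
   to the log of the response instead (requires residual > 0) */
data have_log;
   set have;
   log_residual = log(residual);
run;

proc genmod data=have_log;
   model log_residual = / distribution=normal;
run;

/* Intercept-only model with a gamma response */
proc genmod data=have;
   model residual = / distribution=gamma;
run;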

aska_ujita · Posted 04-11-2019 10:08 AM

Thank you!!!

Doc_Duke · Posted 04-11-2019 10:17 AM

Please post a question to just one forum. Thanks.

StatDave · Posted 04-11-2019 10:38 AM

UNIVARIATE doesn't have the Burr distribution. What is the reason for wanting to use UNIVARIATE? If you want to estimate the parameters of the Burr distribution, PROC SEVERITY can give those estimates as well.
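As a minimal sketch of that suggestion, restricting the DIST statement to Burr makes PROC SEVERITY report the fit for that distribution alone. The data set C and the variable RESIDUAL1 come from the original post; the PRINT=ALL option is added here as an assumption about how much output is wanted.

```sas
/* Fit only the Burr distribution to RESIDUAL1 and print all
   available tables, including the parameter estimates. */
proc severity data=C crit=aicc print=all;
   loss RESIDUAL1;
   dist burr;
run;
```

Note that the selection table above lists the convergence status of the selected model as "Maybe", so the estimates may deserve a closer look before they are used.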

aska_ujita · Posted 05-09-2019 09:07 AM

Hello StatDave_sas, can I use GENMOD to analyze my data? Or GLIMMIX?

I have always used GLM or MIXED to analyze milk data, but I tested the distribution and my residual milk fits the Gamma distribution better. It has a lot of zeros on one side (please see the attached image); I think that is the reason Gamma is better, right?

My classes are: treatment (treated and control), day of lactation (1,3,7,15,30,45 and 60th), parturition order (multiparous and primiparous) and the cow.

My effects are: treatment, day of lactation, parturition order, day/month/year of data observation/measurement (same year), age and interaction treatment*day of lactation, treatment*parturition order.

I collected one measurement on each day of lactation (days 1, 3, 7, 15, 30, 45 and 60), for a total of seven observations per cow.

I have 20 different cows in each treatment, for a total of 40 cows in the experiment.

Thank you always.

Best, Aska.

Rick_SAS · Posted 05-09-2019 10:34 AM

It appears that all your residuals are positive. If the model fits the data, I would expect to see some positive and some negative residuals. Could you post the model that produces the RESIDUAL1 variable?

aska_ujita · Posted 05-09-2019 11:55 AM

Hello Rick_SAS, thank you for replying and helping me.

I did it like this:

PROC GENMOD;
CLASS GRUPO OP1 DL1 vaca;
MODEL PL1= GRUPO DL1 op1 grupo*op1 grupo*dl1 data idade;
lsmeans grupo/pdiff adjust=tukey lines;
lsmeans grupo*op1/pdiff adjust=tukey lines;
RUN;

PROC GENMOD;
CLASS GRUPO OP1 DL1 vaca;
MODEL RESIDUAL1= GRUPO DL1 op1 grupo*op1 grupo*dl1 data idade/ dist=gamma;
lsmeans grupo/pdiff adjust=tukey;
lsmeans grupo*op1/pdiff adjust=tukey;
RUN;

Sorry that it is all in Portuguese, but: PL is milk production (fits a normal distribution), Residual1 is the residual milk, grupo is the treatment (treated group and control group), vaca is the cow, op1 is the parturition order, dl1 is the day of lactation, data is the observation/measurement day and idade is the age.

I attached the output as a PDF file.

Thank you very much.

Best, Aska.

Rick_SAS · Posted 05-09-2019 01:05 PM

Ah, now I see! I thought "RESIDUAL1" meant "the difference between the observed and predicted response," but it really is a measurement of the "residual milk" that is left inside the udder after the cow is milked. Thank you for explaining.

When you are performing a generalized regression analysis, the DIST= option does not refer to the unconditional distribution of the response variable. Therefore you should not choose the DIST= option based on using PROC UNIVARIATE or SEVERITY to test the univariate distribution of PL1 or RESIDUAL1. The DIST= option specifies the CONDITIONAL distribution of the response variable after accounting for the values of the regressors (independent variables).

In particular, the Y variable in a linear regression model does not need to be normally distributed and similar statements hold for generalized linear models. For the linear model, the "normal distribution" refers to the normality of the residuals.
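To make the distinction concrete, here is a rough sketch of checking the residuals of a fitted model rather than the raw response. It reuses the data set C and the variable names from the posts above; the output data set name Resid and the variables RM_Resid and RM_Pred are hypothetical, and the model ignores the repeated measurements on each cow, so treat it as an illustration only.

```sas
/* Fit the linear model and save the residuals (observed minus predicted). */
proc glm data=C;
   class GRUPO OP1 DL1;
   model RESIDUAL1 = GRUPO DL1 OP1 GRUPO*OP1 GRUPO*DL1 data idade;
   output out=Resid r=RM_Resid p=RM_Pred;
run;
quit;

/* Examine the distribution of the residuals, not of RESIDUAL1 itself. */
proc univariate data=Resid normal;
   var RM_Resid;
   qqplot RM_Resid / normal(mu=est sigma=est);
run;
```

If the points in the Q-Q plot fall close to the reference line, the normality assumption for the linear model is plausible even when a histogram of the raw response is strongly skewed.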

aska_ujita · Posted 05-09-2019 01:33 PM

Hello Rick_SAS, thank you very much for clarifying that.

Yes, I was talking about residual milk (I will refer to it as RM, for residual milk, from now on to avoid confusion).

It is really difficult for me to understand some statistical concepts... Fortunately there are kind and helpful people like you to help us! I am really grateful for that, and it makes me very willing to learn.

So... I understood that I have to test if my residuals are normal.

And to do that I can use this:

ods graphics on;

proc reg data=C;
MODEL PL1= GRUPO DL1 op1 data idade;
quit;

ods graphics on;
proc reg data=C;
MODEL RM= GRUPO DL1 op1 data idade;
quit;

I included the interactions, but it seems they were not accepted, so I took them out.

And my output (attached as a PDF) shows that my model (high value) isn't so good, right?

But the residuals look normal for RM and PL. Is that correct?

Thank you very much for helping me.

Best, Aska.

Rick_SAS · Posted 05-09-2019 01:50 PM

Interactions are not allowed in PROC REG. You should use PROC GLM (or GENMOD), which supports interaction terms and CLASS variables. For example, the main effects model is

proc glm data=C;

class GRUPO OP1;
MODEL RM= GRUPO DL1 op1 data idade / solution;
quit;
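Extending the main-effects model above, a version with the interaction terms from the earlier GENMOD call might look like the sketch below. Declaring DL1 as a CLASS variable follows the original poster's GENMOD code and is an assumption here, and the model still does not account for the seven repeated measurements per cow.

```sas
/* Same response and covariates, plus the two treatment interactions. */
proc glm data=C;
   class GRUPO OP1 DL1;
   model RM = GRUPO DL1 OP1 GRUPO*OP1 GRUPO*DL1 data idade / solution;
   lsmeans GRUPO / pdiff adjust=tukey;
run;
quit;
```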

It seems that your questions might not be about SAS but about statistics. I encourage you to talk to a statistical person (professor, researchers, consultant) at your place of work. They can help you to create a statistical model that reflects the design of your experiment. After you understand the model you are trying to fit, we can help you with the SAS syntax.
