Solved: Tail of Normal Density Curve Extends Below 0

Adele3

Hello,

I produced a histogram for a single continuous variable (minimum value=0, mean=31, SD=14) using PROC SGPLOT. I included a normal density curve for the distribution, as well as a normal density curve for the same continuous variable from a standard population for comparison (lowest possible value=0, mean=43, SD=26.0).

Given that the lowest possible value is 0, I am not sure why the left tails of my normal curves extend past 0 and do not intersect at 0. Any insight would be very much appreciated! Thank you :)


ballardw

Please post the code. We can't tell what happened without it, and it will need to include the data used for the histogram.

Adele3

Sorry about that, @ballardw. I shared the code below in another response.

ballardw

@Adele3 wrote:

Sorry about that, @ballardw. I shared the code below in another response.

However, there is no data.

FreelanceReinh

Hello @Adele3,

@Adele3 wrote:

Given that the lowest possible value is 0, I am not sure why the left tails of my normal curves extend past 0 and do not intersect at 0.

While the lowest possible value of your continuous variable may be 0, there is no lower limit to the values of a normal distribution. You can use the CDF function to compute the probability that a random variable with a normal distribution (with mean m and standard deviation s) takes a negative value:

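data _null_;
/* P(X < 0) for a normal distribution with mean 31 and SD 14 */
p1=cdf('normal',0,31,14);
/* P(X < 0) for the standard population: mean 43, SD 26 */
p2=cdf('normal',0,43,26);
/* write p1= and p2= to the log, one per line */
put (p:)(=/);
run;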

Result:

p1=0.0134045654
p2=0.0490793878

So, using the mean and SD values you mentioned, these probabilities are small, but clearly positive.
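For readers who want to check these numbers by hand, the same probabilities follow from standardizing the value 0. This is only a restatement of the CDF call above; the symbol \Phi (the standard normal CDF) is notation introduced here, not something used elsewhere in the thread.

```latex
\[
P(X < 0) = \Phi\!\left(\frac{0 - \mu}{\sigma}\right)
\]
\[
\Phi\!\left(\frac{0 - 31}{14}\right) \approx \Phi(-2.214) \approx 0.0134,
\qquad
\Phi\!\left(\frac{0 - 43}{26}\right) \approx \Phi(-1.654) \approx 0.0491
\]
```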

A normal distribution may approximate the distribution of your variable quite well regardless of the discrepancy for negative values. But there are also continuous distributions whose probability density is exactly zero for negative values. They may or may not provide a better approximation.
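As one illustration of a density that is exactly zero for negative values, here is a minimal sketch of a normal distribution truncated at 0, computed by hand with the PDF and CDF functions. Truncation is just one possible choice and is not necessarily a better fit for the scores. The data set and variable names (TRUNC, X, F_NORMAL, F_TRUNC, P_NONNEG) are made up for this example, and only the first curve (mean 31, SD 14) is shown.

```sas
/* Sketch: density of a normal(31, 14) distribution truncated at 0.   */
/* All data set and variable names here are hypothetical.             */
data trunc;
   mu = 31;
   sigma = 14;
   /* probability that the untruncated normal puts on x >= 0 */
   p_nonneg = 1 - cdf('normal', 0, mu, sigma);
   do x = -20 to 120 by 0.5;
      f_normal = pdf('normal', x, mu, sigma);
      /* zero below 0; rescaled above 0 so the total area is still 1 */
      if x < 0 then f_trunc = 0;
      else f_trunc = f_normal / p_nonneg;
      output;
   end;
run;

/* Compare the two densities as plain curves */
proc sgplot data=trunc;
   series x=x y=f_normal / legendlabel="normal(31, 14)";
   series x=x y=f_trunc  / legendlabel="normal(31, 14) truncated at 0"
                           lineattrs=(pattern=2);
   xaxis label='score';
   yaxis label='density';
run;
```

Note that the truncated curve still does not pass through the origin: it jumps from 0 to a positive value at x=0. Its mean and SD also differ slightly from 31 and 14. As far as I know, PROC SGPLOT does not allow a SERIES plot to be overlaid on a HISTOGRAM, which is why the sketch plots the curves on their own.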

Adele3

Hello @FreelanceReinh,

Thank you for taking the time to share these details - I really appreciate it!

Since a normal distribution is continuous, I understand that it can theoretically extend indefinitely. However, I am still not clear why a random variable within my normal distributions could take a negative value when this is not plausible.

You shared that there are continuous distributions whose probability density is exactly zero for negative values. Am I able to specify in my code that random negative values are not plausible?

PROC SGPLOT DATA=test; 
     HISTOGRAM score1 / FILLATTRS=(COLOR=LIGGR);
     DENSITY score1 / TYPE=normal LINEATTRS=(COLOR=black pattern=1) NAME="normal" LEGENDLABEL="normal curve 1";
     DENSITY score1 / TYPE=normal (MU=43 SIGMA=26) LINEATTRS=(COLOR=black pattern=2) NAME="standard" 
                           LEGENDLABEL="normal curve 2";
     KEYLEGEND "normal" "standard" / LOCATION=inside POSITION=topright ACROSS=1; 
     XAXIS LABEL='score' MIN=0 OFFSETMIN=0.05 MAX=120;
RUN;

FreelanceReinh

@Adele3 wrote:

I am still not clear why a random variable within my normal distributions could take a negative value when this is not plausible.

Those normal distributions are only approximations. They will not match exactly the (unknown) distribution of your SCORE1 variable. The same is true for other theoretical distributions (including strictly positive distributions). This is a very common situation. Nobody will conclude that your scores might be negative sometimes or that the normal approximation is invalid because of that minor and unavoidable discrepancy. If, for example, rainfall data are approximated by a normal distribution, it goes without saying that nevertheless rainfall is always non-negative.

In your PROC SGPLOT step you can only try to avoid displaying negative tick marks (e.g., by using the MIN=0 option of the XAXIS statement, as you have done already, or with options of the HISTOGRAM statement). You cannot force the normal density curves through the origin, though, as this would contradict their definition.
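A minimal sketch of those two display options, assuming a data set TEST with a variable SCORE1 as in the code posted earlier. Since no data were posted, the DATA step below generates hypothetical stand-in values, and the bin settings (BINSTART=5, BINWIDTH=10) are arbitrary choices for illustration.

```sas
/* Hypothetical stand-in data; replace TEST with your own data set. */
data test;
   call streaminit(2024);
   do i = 1 to 500;
      /* simulated scores, floored at 0 because scores cannot be negative */
      score1 = max(0, rand('normal', 31, 14));
      output;
   end;
run;

proc sgplot data=test;
   /* first bin midpoint at 5 with width 10: bins cover 0-10, 10-20, ... */
   histogram score1 / binstart=5 binwidth=10;
   density score1 / type=normal legendlabel="normal curve 1";
   density score1 / type=normal(mu=43 sigma=26) legendlabel="normal curve 2"
                    lineattrs=(pattern=2);
   /* MIN=0 only hides the part of each curve left of 0;
      the curves themselves are unchanged */
   xaxis label='score' min=0 max=120;
run;
```

Both curves will still start at a positive height at the left edge of the plot, because the axis setting changes only what is displayed.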

Adele3

Thank you @FreelanceReinh, you have been very helpful!
