☑ This topic is solved.
Adele3
Fluorite | Level 6

Hello,

 

I produced a histogram for a single continuous variable (minimum value=0, mean=31, SD=14) using PROC SGPLOT. I included a normal density curve for the distribution, as well as a normal density curve for the same continuous variable from a standard population for comparison (lowest possible value=0, mean=43, SD=26.0). 

 

Given that the lowest possible value is 0, I am not sure why the left tails of my normal curves extend past 0 and do not intersect at 0. Any insight would be very much appreciated! Thank you :)

ballardw
Super User

Please post your code. We can't tell what happened without it, and it needs to include the data used for the histogram.

 

 

Adele3
Fluorite | Level 6

Sorry about that @ballardw , I shared code below in another response. 

ballardw
Super User

@Adele3 wrote:

Sorry about that @ballardw , I shared code below in another response. 


However, there is no data.

FreelanceReinh
Jade | Level 19

Hello @Adele3,

 


@Adele3 wrote:

Given that the lowest possible value is 0, I am not sure why the left tails of my normal curves extend past 0 and do not intersect at 0.


While the lowest possible value of your continuous variable may be 0, there is no lower limit to the values of a normal distribution. You can use the CDF function to compute the probability that a random variable with a normal distribution (with mean m and standard deviation s) takes a negative value:

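data _null_;
  /* P(X <= 0) for a normal distribution with mean 31 and SD 14 */
  p1=cdf('normal',0,31,14);
  /* P(X <= 0) for the standard population: mean 43, SD 26 */
  p2=cdf('normal',0,43,26);
  /* write every variable whose name starts with "p" to the log, one name=value pair per line */
  put (p:)(=/);
run;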

Result:

p1=0.0134045654
p2=0.0490793878

So, using the mean and SD values you mentioned, these probabilities are small, but clearly positive.
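For anyone who wants to see where these numbers come from, here is a minimal sketch that standardizes 0 with each mean/SD pair and evaluates the standard normal CDF with PROBNORM. The variable names z1, z2, q1 and q2 are only illustrative. Under these assumptions q1 and q2 should agree with p1 and p2 above.

```sas
data _null_;
  /* z-score of 0 under each normal distribution: (x - mean) / SD */
  z1 = (0 - 31) / 14;
  z2 = (0 - 43) / 26;
  /* standard normal CDF evaluated at each z-score */
  q1 = probnorm(z1);
  q2 = probnorm(z2);
  /* write the z-scores and probabilities to the log, one per line */
  put (z1 q1 z2 q2)(=/);
run;
```

The point of the standardization is that 0 lies only about 2.2 and 1.7 standard deviations below the respective means, so a noticeable part of each left tail falls below 0.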

 

A normal distribution may approximate the distribution of your variable quite well regardless of the discrepancy for negative values. But there are also continuous distributions whose probability density is exactly zero for negative values. They may or may not provide a better approximation.
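To illustrate the difference, the sketch below compares the normal density with a gamma density at a few points. The gamma distribution is just one example of a distribution with zero density for negative values, and matching its shape and scale to mean 31 and SD 14 by the method of moments is an assumption made for illustration, not a recommendation for your data.

```sas
data _null_;
  m = 31;  /* mean reported in the question */
  s = 14;  /* SD reported in the question   */
  /* method-of-moments gamma parameters: mean = a*lambda, variance = a*lambda**2 */
  a      = (m / s)**2;  /* shape */
  lambda = s**2 / m;    /* scale */
  do x = -10, 0, 10, 31;
    f_normal = pdf('normal', x, m, s);
    f_gamma  = pdf('gamma',  x, a, lambda);
    put x= f_normal= f_gamma=;
  end;
run;
```

At x=-10 the normal density should still be positive, while the gamma density should be 0.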

Adele3
Fluorite | Level 6

Hello @FreelanceReinh,

 

Thank you for taking the time to share these details - I really appreciate it! 

 

Since a standard normal distribution is continuous, I believe it can theoretically extend indefinitely. However, I am still not clear why a random variable within my normal distributions could take a negative value when this is not plausible. 

 

You shared that there are continuous distributions whose probability density is exactly zero for negative values. Am I able to specify in my code that random negative values are not plausible?

PROC SGPLOT DATA=test;
     HISTOGRAM score1 / FILLATTRS=(COLOR=LIGGR);
     DENSITY score1 / TYPE=normal LINEATTRS=(COLOR=black pattern=1) NAME="normal" LEGENDLABEL="normal curve 1";
     DENSITY score1 / TYPE=normal (MU=43 SIGMA=26) LINEATTRS=(COLOR=black pattern=2) NAME="standard" LEGENDLABEL="normal curve 2";
     KEYLEGEND "normal" "standard" / LOCATION=inside POSITION=topright ACROSS=1;
     XAXIS LABEL='score' MIN=0 OFFSETMIN=0.05 MAX=120;
RUN;

 

FreelanceReinh
Jade | Level 19

@Adele3 wrote:

I am still not clear why a random variable within my normal distributions could take a negative value when this is not plausible. 

Those normal distributions are only approximations. They will not match exactly the (unknown) distribution of your SCORE1 variable. The same is true for other theoretical distributions (including strictly positive distributions). This is a very common situation. Nobody will conclude that your scores might be negative sometimes or that the normal approximation is invalid because of that minor and unavoidable discrepancy. If, for example, rainfall data are approximated by a normal distribution, it goes without saying that nevertheless rainfall is always non-negative.

 

In your PROC SGPLOT step you can only try to avoid displaying negative tick marks (e.g., by using the MIN=0 option of the XAXIS statement, as you have done already, or with options of the HISTOGRAM statement). You cannot force the normal density curves through the origin, though, as this would contradict their definition.
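As a rough sketch of the HISTOGRAM-statement route, the step below sets the bins with BINSTART= and BINWIDTH=. It assumes that BINSTART= is read as the midpoint of the first bin, so that BINSTART=5 with BINWIDTH=10 makes the first bin run from 0 to 10. The bin width of 10 is an arbitrary choice for illustration and should be adapted to the data.

```sas
proc sgplot data=test;
  /* first bin centered at 5 with width 10, i.e. covering 0 to 10 (assumed midpoint semantics) */
  histogram score1 / binstart=5 binwidth=10 fillattrs=(color=liggr);
  /* the normal curves are unchanged: they still have positive density below 0 */
  density score1 / type=normal name="normal" legendlabel="normal curve 1";
  density score1 / type=normal (mu=43 sigma=26) lineattrs=(pattern=2)
                   name="standard" legendlabel="normal curve 2";
  keylegend "normal" "standard" / location=inside position=topright across=1;
  xaxis label='score' min=0 max=120;
run;
```

This only changes what is displayed. The part of each curve to the left of 0 is cut off by the axis, not removed from the distribution.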

Adele3
Fluorite | Level 6

Thank you @FreelanceReinh, you have been very helpful!
