Cruise
Ammonite | Level 13

Hi Folks:

 

I'm trying to compare the distribution of survival time across four population groups: A, B, C and D. However, the x-axis range of my kernel density plot doesn't match the range of the underlying data (it even extends to negative values). PROC MEANS returns min=0 and max=4015. Am I missing an option in PROC SGPLOT that would match the x-axis to the underlying data? How can I tie my x-axis to the data?

surv_days.png

Any hints appreciated.

Thanks in advance.

kernel density.png

 

proc sgplot data = MYDATA;
title "SURVIVAL TIME DISTRIBUTION";
yaxis label = "Percent" offsetmax=0 values=(0 to 8 by 0.5);
xaxis label = "What is X axis?" grid;
styleattrs DATACONTRASTCOLORS=(BLUE green PURPLE red);
density DUR_GOLD / scale=percent type = kernel legendlabel = "A" ;
density DUR_RAND / scale=percent type = kernel legendlabel = "B" ;
density DUR_MID / scale=percent type = kernel legendlabel = "C" ;
density SURV_DAYS / scale=percent type = kernel legendlabel = "D" ;
keylegend / location=OUTSIDE position=top;
run;

proc means data=MYDATA maxdec=1 max min;
var A B C D;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
My guess is that the DENSITY statement doesn't do what you think it does, i.e., it assumes your data has a normal distribution and fits a distribution to it; it doesn't plot a CDF- or PDF-type plot.


10 REPLIES
ballardw
Super User

Density plots are based on an estimated distribution of the type you request, and as such the fitted distribution will often extend beyond the range of your data. The display will tend to cut off at the point where no more useful information is shown, i.e., where all of the Y values are so close to 0 that you can't tell the difference.

 

A normal distribution always has a domain (x values) running from minus infinity to plus infinity. But typically, once you get to X at plus or minus 4 standard deviations from the mean, there isn't much detectable difference in the Y values.

 

If you really don't want to see the curve past your limits, provide a VALUES= option on the XAXIS statement.
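
As an alternative sketch, the XAXIS statement also accepts MIN= and MAX= options, which should restrict the displayed range without listing explicit tick values. The data set and variable names are the ones from the question; the limits 0 and 4015 are the PROC MEANS minimum and maximum quoted there. Treat this as an assumption-laden illustration: depending on the tick-threshold settings, the axis may still extend slightly past these limits, and MIN=/MAX= have no effect if VALUES= is also specified.

```sas
/* Sketch only: limit the displayed x range to the observed data range. */
/* 0 and 4015 are the min and max reported by PROC MEANS in the question. */
proc sgplot data=MYDATA;
    title "SURVIVAL TIME DISTRIBUTION";
    xaxis label="Survival Time in Days" grid min=0 max=4015;
    yaxis label="Percent";
    /* The fitted curve is still estimated over its full domain; */
    /* only the portion inside the axis limits is drawn.         */
    density SURV_DAYS / scale=percent type=kernel legendlabel="D";
run;
```

This only changes what is displayed. The kernel estimate itself still places some density below 0, which is why the unrestricted plot showed negative x values.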

 

 

Cruise
Ammonite | Level 13

Thanks @ballardw

Specifying limits results in the following plot.

proc sgplot data = mydata; /*N=12,061*/
    title "SURVIVAL TIME DISTRIBUTION";
    yaxis label = "Percent" offsetmax=0 values=(0 to 8 by 0.5);
    xaxis label = "Survival Time in Days" grid offsetmax=0 values=(0 to 4500 by 100);
    styleattrs DATACONTRASTCOLORS=(BLUE green PURPLE red);
    density DUR_GOLD / scale=percent type = kernel legendlabel = "A" ;
    density DUR_RAND / scale=percent type = kernel legendlabel = "B" ;
    density DUR_MID / scale=percent type = kernel legendlabel = "C" ;
    density SURV_DAYS / scale=percent type = kernel legendlabel = "D" ;
    keylegend / location=OUTSIDE position=top;
run;

KERNEL DENS CHANGED.png

Reeza
Super User
Question: why does the PROC SGPLOT step refer to different variables than the PROC MEANS step, even though both use the same input data set? In that situation, I wouldn't expect them to match.
Cruise
Ammonite | Level 13
I was going to conceal the actual variable names and use A, B, C and D instead. I have to change that in the post here.
Cruise
Ammonite | Level 13

@Reeza 

real mean.png

 

proc means on actual variables

Reeza
Super User
My guess is that the DENSITY statement doesn't do what you think it does, i.e., it assumes your data has a normal distribution and fits a distribution to it; it doesn't plot a CDF- or PDF-type plot.
Cruise
Ammonite | Level 13
I agree. The x-axis still takes negative values even if I switch the type from kernel to normal.
Reeza
Super User
Which makes sense to me. I think you really want a histogram instead, perhaps just with finer granularity (narrower bins)?
Cruise
Ammonite | Level 13
Yes, a histogram. I think it will require long-format data, whereas mine is currently organized wide, with A, B, C and D as separate variables. It would take me a while to reshape the data for a histogram using the GROUP= option. Please let me know if there's a way to do a histogram using the wide format.
Reeza
Super User
Multiple histogram statements? 4 of them isn't too bad and probably just as much code as a PROC TRANSPOSE.
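
A minimal sketch of that suggestion, using the wide-format variables from the question. The BINSTART=, BINWIDTH= and TRANSPARENCY= values are illustrative assumptions, not settings from the thread; BINSTART=50 with BINWIDTH=100 is chosen so that the first bin is centered at 50 and spans 0 to 100 days.

```sas
/* Sketch only: overlay one HISTOGRAM statement per wide-format variable. */
/* Bin settings and transparency are illustrative assumptions.            */
proc sgplot data=MYDATA;
    title "SURVIVAL TIME DISTRIBUTION";
    histogram DUR_GOLD  / scale=percent binstart=50 binwidth=100
                          transparency=0.7 legendlabel="A";
    histogram DUR_RAND  / scale=percent binstart=50 binwidth=100
                          transparency=0.7 legendlabel="B";
    histogram DUR_MID   / scale=percent binstart=50 binwidth=100
                          transparency=0.7 legendlabel="C";
    histogram SURV_DAYS / scale=percent binstart=50 binwidth=100
                          transparency=0.7 legendlabel="D";
    xaxis label="Survival Time in Days" grid;
    yaxis label="Percent";
    keylegend / location=outside position=top;
run;
```

Using the same BINSTART= and BINWIDTH= on every statement keeps the bars aligned across groups, and because a histogram only bins observed values, no bars appear below the data minimum. Four overlaid histograms can be hard to read even with transparency, so a smaller bin count or a paneled layout may be worth trying.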
