I am trying to generate histograms with three properties and I can't figure out how to do it. It seems that proc univariate, sgplot, and gchart will each give me two of the three properties, but none of them will give me all three.
I'd like to generate a histogram such that: 1) I can control the number of bins, 2) I can control which labels show up on the x-axis, 3) I can get a vertical reference line.
For example, I have data that ranges from 70 to 140 inclusive and I'd like:
1) Bins by twos (70, 72, 74, ..., 138, 140)
2) Labels by 10s (70, 80, ..., 130, 140) where it's important that 70 and 140 are displayed
3) A vertical reference line at 100.
SAS seems to insist on one of two behaviors: either I set the number of bins and SAS decides which numbers are displayed on the x-axis, or I choose the labels and SAS then either uses those for the bins or determines its own number of bins.
Suggestions? Thank you in advance.
SGPlot should allow you to do this, though you may want to pre-calculate your data.
Can you show your SGPlot code that doesn't work?
Thank you for your response. For example, this code:
proc sgplot data=Timing;
histogram Number_of_Items / scale=count;
xaxis values=(70 72 74 76 78 80 82 84 86 88 90 92 94 96 98
100 102 104 106 108 110 112 114 116 118
120 122 124 126 128 130 132 134 136 138
140);
yaxis label='Frequency' min=0 max=1350;
xaxis label='Number of Items' min=70 max=140;
run;
It will give me the bins I specified, but the labels on the x-axis are 80, 100, 120 and 140. This seems like a minor issue, and it is, but I need to exactly reproduce an existing report and so need the x-axis scale to read 70, 80, ..., 140.
It hadn't occurred to me to pre-calculate data. If I make my own frequency counts, will SAS cooperate better graphing those? Is there a term I can google or a link to code?
Thank you again.
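On the pre-calculation idea, a minimal sketch might look like the following. It assumes the same Timing data set and Number_of_Items variable; the names Binned, BinCounts, and Bin are hypothetical, and the needle thickness is an arbitrary choice. Each bin here is labeled by its left edge (70 covers 70 and 71, and so on), which may not match how the histogram statement places its bins.

```sas
/* Assign each observation to a width-2 bin labeled by its left edge.
   Bin is a hypothetical variable name introduced for this sketch. */
data Binned;
    set Timing;
    Bin = 70 + 2*floor((Number_of_Items - 70)/2);
run;

/* Count observations per bin; the OUT= data set contains Bin, COUNT, PERCENT */
proc freq data=Binned noprint;
    tables Bin / out=BinCounts;
run;

/* Plot the pre-computed counts on a numeric axis so that the tick labels
   and the reference line can be set independently of the bins */
proc sgplot data=BinCounts;
    needle x=Bin y=Count / lineattrs=(thickness=6);
    xaxis label='Number of Items' values=(70 to 140 by 10);
    yaxis label='Frequency' min=0;
    refline 100 / axis=x;
run;
```

Because the counts are already computed, the x-axis only controls tick labels and never influences the binning.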
I think the problem is simpler: you have two xaxis statements, and the last one overrides the first. Combining them produces the labels that you'd like.
Look at the refline statement to add a reference line.
proc sgplot data=Timing;
histogram Number_of_Items / scale=count;
yaxis label='Frequency' min=0 max=1350;
xaxis label='Number of Items' min=70 max=140 values=(70 72 74 76 78 80 82 84 86 88 90 92 94 96 98
100 102 104 106 108 110 112 114 116 118
120 122 124 126 128 130 132 134 136 138
140);
run;
That's a good theory, and probably part of the problem, but when I run your code the values shown on the x-axis change from 80, 100, 120, 140 in my old code to the values 70 through 138 inclusive, by 4s (70, 74, 78, ..., 134, 138). So it still refuses to let me say that I want the x-axis to have tick marks at 70, 80, ..., 130, 140.
It's not clear to me if there's an easy way to do this or if I have to resort to templates, which I'm just learning, or if there is some other method. It just seems like there would be independent and easy commands to specify one set of numbers for the bins and a different set for the values displayed on the x-axis.
Okay, BINSTART (together with BINWIDTH) and VALUES are the separate options I was looking for. If one of them is missing, SAS either uses the one that is present for both roles or makes its own guess for the missing one. In this code:
proc sgplot data=Timing;
histogram Number_of_Items / BINSTART = 70 BINWIDTH=2 scale=count;
yaxis label='Frequency' min=0 max=1350;
xaxis label='Number of Items' min=70 max=140
VALUES=(70 80 90 100 110 120 130 140);
refline 100 /axis=x;
run;
BINSTART and BINWIDTH control the bins SAS uses in its calculations and therefore the number of bars that appear in the histogram. VALUES controls which tick labels are printed on the x-axis. In my earlier code I thought VALUES controlled the number of bins, and with the long list of numbers I had there, there wasn't room for the 140 to print. With this shorter list of values, 70 and 140 both print. The bin options and VALUES work independently of each other. Finally, refline gives me my reference line. Fonts and similar details can then be adjusted with ODS Graphics styles, axis attribute options, or templates (goptions applies to SAS/GRAPH procedures such as gchart, not to sgplot).
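As a sketch of the font adjustments, the axis statements in sgplot accept text-attribute options directly, which may be enough without a template. This assumes a SAS release in which LABELATTRS= and VALUEATTRS= are available on the axis statements; the sizes and weight shown are arbitrary illustrations, not values from the original report. The VALUES= list uses the equivalent (70 to 140 by 10) shorthand.

```sas
proc sgplot data=Timing;
    /* Bins: BINSTART= gives the midpoint of the first bin (as I understand
       the documentation), and BINWIDTH= gives the width of every bin */
    histogram Number_of_Items / binstart=70 binwidth=2 scale=count;

    /* Tick labels every 10, set independently of the bins.
       Font sizes and weight below are illustrative assumptions. */
    xaxis label='Number of Items' min=70 max=140
          values=(70 to 140 by 10)
          labelattrs=(size=12pt weight=bold)
          valueattrs=(size=10pt);
    yaxis label='Frequency' min=0 max=1350
          labelattrs=(size=12pt weight=bold)
          valueattrs=(size=10pt);

    /* Vertical reference line at 100 */
    refline 100 / axis=x;
run;
```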
Thank you Reeza for your help. You narrowed things down so that my google searches finally found the correct terms that then gave me the correct commands.
You might find this overview helpful: Choosing bins for histograms in SAS - The DO Loop