I run monthly report based on a raw data. Due to the number of values (>100) available for the variable of interest, I would like to limit the display to the 10 most frequent values. The default HBAR graph captures the statitics (freq, cfreq, pct, cpct) I need to report. My question is how to subset it to just top 10 in descending order for the display. Is annotating data required? If not, is there any statement I can add to the procedure below?
proc gchart data=test;
HBAR variable-of-interest/
descending
autoref
clipref
raxis=axis1
coutline=black ;
run;
quit;
If you have the data set built like you have shown, you can use the following code to create the combined graph-table you want.
title 'Top N Defects';
proc sgplot data=defects;
hbar product / response=freq nostatlabel;
yaxistable product / position=left nostatlabel;
yaxistable freq cumfreq pct cumpct / position=right nostatlabel pad=5;
yaxis discreteorder=data display=(novalues noticks nolabel);
run;
I thought the AXIS statement might be able to help you but apparently not.
So you need a where statement (here we plot 5 bars):
proc sql noprint;
create table COUNTS as select count(*) as COUNT, AGE from SASHELP.CLASS group by AGE order by COUNT desc;
select AGE into :ages separated by ' ' from COUNTS(obs=5);
quit;
proc gchart data= SASHELP.CLASS;
hbar AGE/ descending
autoref
clipref
raxis =axis1
coutline=black ;
where AGE in ( &ages. );
run;
quit;
if you variable is a string you must quote it
select quote(catt(AGE)) into :ages separated by ' ' from COUNTS(obs=5);
Thanks for the response. I also attempted creating a horizontal graph with gchart using the subset data. The issue with subsetting the data for display was that the other statistics I need to display besides the simple frequency count (e.g. cumumlative frequency, percent, cumumulative percent) were lost.
I even tried to output the data using Proc Freq procedure but it (test_freq) only retained frequency and percent but not cumulative statistics.
proc freq data=test order=freq;
tables variable-of-interest/out=test_freq;
run;
That was the reason I tried to avoid subsetting the data and hoping there was a statement available to limit the graphic display instead. Please help.
You could do this using SGPLOT with SAS 9.4M3. First, I simulated your use case by creating a data set of 20 observations with random names and response values. I then sorted them by descending value, and added an "ID" variable of the sorted position. Then, I can use the VBAR (or HBAR) statement with CATEGORY=id and RESPONSE=value and set the x-axis TYPE=LINEAR. Now, I can display any set of bars I want by setting the numeric range on the x-axis. The x-axis tick values are suppressed and instead we use the xAxisTable to show the Names.
/*--Generate some data--*/
data bars;
drop i j;
length Name $6;
do i=1 to 20;
Value=2+10*ranuni(2);
do j=1 to 6;
substr(Name,j)=byte(int(65+26*ranuni(2)));
end;
output;
end;
run;
/*--Sort by descending value--*/
proc sort data=bars out=sorted;
by descending value;
run;
/*--Add sorted id number--*/
data bars2;
set sorted;
id=_n_;
run;
/*--Display first 8 bars--*/
proc sgplot data=bars2;
vbar id / response=value;
xaxis type=linear values=(1 to 20 by 1) valueshint min=1 max=8 display=(novalues nolabel);
xaxistable name / nolabel;
run;
/*--Display bars 3 - 10--*/
proc sgplot data=bars2;
vbar id / response=value;
xaxis type=linear values=(1 to 20 by 1) valueshint min=3 max=10 display=(novalues nolabel);
xaxistable name / nolabel;
run;
Could this be modified to show the statitics I need to report (cumulative frequency or percentage)? Thanks.
Yes, if you can compute the % and cumulative stats, you can plot these, showing only the top N bars as below. Not sure if this is what you want. Note, the x-axis is linear, so you can chop it off whereever you want.
Please see the horizontal graph I posted. Any program that can generate what I have described would be appreciated.
Maybe you can group the others together then, so they can be seen.
proc sql noprint;
create table COUNTS as select count(*) as COUNT, AGE from SASHELP.CLASS group by AGE order by COUNT desc;
select catt(AGE,'=',AGE) into :ages separated by ' ' from COUNTS(obs=5);
quit;
proc format ;
value topfive &ages. other ='Other';
proc gchart data= SASHELP.CLASS;
hbar AGE/ descending
autoref
clipref
raxis =axis1
coutline=black
discrete;
format AGE topfive.;
run;
quit;
I tried using the code and received an error message "The format TOPFIVE was not found or could not be loaded." Perhaps having numeric vs. categorical variable has to do with it?
I searched the SASHELP files and found "ORSALES" file. Here's the output from PROC FREQ by Product_Category:
SAS Output. I would appreciate walking me though the codes to display horizontal bar graph that displays all of the stats shown here but only displays product categories in this order: Clothes, Children Sports, Outdoors, Assorted Sports Articles, & Team Sports. Thanks!
Product_Category | Frequency | Percent | Cumulative Frequency |
Cumulative Percent |
---|---|---|---|---|
Assorted Sports Articles | 64 | 7.0 | 64 | 7.0 |
Children Sports | 176 | 19.3 | 240 | 26.3 |
Clothes | 240 | 26.3 | 480 | 52.6 |
Golf | 32 | 3.5 | 512 | 56.1 |
Indoor Sports | 48 | 5.3 | 560 | 61.4 |
Outdoors | 112 | 12.3 | 672 | 73.7 |
Racket Sports | 48 | 5.3 | 720 | 78.9 |
Running - Jogging | 32 | 3.5 | 752 | 82.5 |
Shoes | 48 | 5.3 | 800 | 87.7 |
Swim Sports | 16 | 1.8 | 816 | 89.5 |
Team Sports | 64 | 7.0 | 880 | 96.5 |
Winter Sports | 32 | 3.5 | 912 | 100.0 |
I have no idea what this graph would look like: Some numbers in front of bars and then some numbers coming out of nowhere?
I don't know what graph you want.
Maybe you need two reports: one "top 10 graph" and one "detailed numbers table" ?
The following statements produces the graph above.
axis1 label=(f="Arial/Bold" "Count") minor=(n=1);
proc gchart data=sashelp.orsales;
HBAR Product_Category/
descending
autoref
clipref
raxis=axis1
coutline=black ;
run;
quit;
The graph contains everything I need: data visualization and all the stats (frequency, cum. frequency, percent, cum. percent) in one graph. It is exactly what I need to report except my variable that is equivalent to "Product Category" here has many more values than this test data set, as I mentioned >100.
This program serves the purpose for my report except I am unable to show only the top ten from my dataset without having to subset the data which seem to lose the statistics I need. For example, in the above graph, upto first five product categories (up to Team Sports) make up 71.93% of all the data in the dataset. The viewers can see the freqency indivually and cumulatively(704) per product category. If I subset the data, I end up with cumulative percent up to 100% for only five I selected which is misleading.
Currently, I do report in tables and graphs separately. I was looking for an option in SAS to combine the tabular data and graph in one.
I still don't make sense of your reply.
So you want the whole table on the right, but not all the bars on the left?
If so why not display the bars since you have an empty space?
Currently, I generate the Proc Freq output for the entire dataset and include it in the Appendix as a reference(like the table in the previous post). I also include in the body of the report the top 10 most frequent items with a visualization using a bar graph. This makes it easier for people to quickly glance and understand the data rather than looking at every single value available in the dataset. Since I'm using the sample file for illustration purpose, I'll continue with top 5. Below you see a table and a graph I will ordinarily produce (this is from the sashelp.orsales file) and include in the body of the report:
Product_Category |
Frequency |
Percent |
Cumulative |
Cumulative |
Clothes |
240 |
26.32 |
240 |
26.32 |
Children Sports |
176 |
19.30 |
416 |
45.61 |
Outdoors |
112 |
12.28 |
528 |
57.89 |
Assorted Sports Articles |
64 |
7.02 |
592 |
64.91 |
Team Sports |
64 |
7.02 |
656 |
71.93 |
I would like to combine these two outputs into one like below (Please note I just edited the gchart graph above since I cannot produce it by programming.) Does this help?
If you have the data set built like you have shown, you can use the following code to create the combined graph-table you want.
title 'Top N Defects';
proc sgplot data=defects;
hbar product / response=freq nostatlabel;
yaxistable product / position=left nostatlabel;
yaxistable freq cumfreq pct cumpct / position=right nostatlabel pad=5;
yaxis discreteorder=data display=(novalues noticks nolabel);
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.