Data visualization with SAS programming

Limiting display of GCHART to top 10 rows in HBAR

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

Limiting display of GCHART to top 10 rows in HBAR

I run monthly report based on a raw data.  Due to the number of values (>100) available for the variable of interest, I would like to limit the display to the 10 most frequent values.  The default HBAR graph captures the statitics (freq, cfreq, pct, cpct) I need to report.  My question is how to subset it to just top 10 in descending order for the display.  Is annotating data required?   If not, is there any statement I can add to the procedure below?

 

proc gchart data=test;
    HBAR variable-of-interest/
    descending
    autoref
    clipref
    raxis=axis1     
    coutline=black ;
run;

quit;


Accepted Solutions
Solution
a week ago
SAS Super FREQ
Posts: 1,108

Re: Limiting display of GCHART to top 10 rows in HBAR

[ Edited ]

If you have the data set built like you have shown, you can use the following code to create the combined graph-table you want.

 

 

First_N.png

 

title 'Top N Defects';
proc sgplot data=defects;
  hbar product / response=freq nostatlabel;
  yaxistable product / position=left nostatlabel;
  yaxistable freq cumfreq pct cumpct / position=right nostatlabel pad=5;
  yaxis discreteorder=data display=(novalues noticks nolabel);
run;

View solution in original post


All Replies
PROC Star
Posts: 1,564

Re: Limiting display of GCHART to top 10 rows in HBAR

PROC Star
Posts: 1,564

Re: Limiting display of GCHART to top 10 rows in HBAR

I thought the AXIS statement might be able to help you but apparently not.

So you need a where statement (here we plot 5 bars):

proc sql noprint;
  create table COUNTS as select count(*) as COUNT, AGE from SASHELP.CLASS group by AGE order by COUNT desc;
  select AGE into :ages separated by ' ' from COUNTS(obs=5);
quit;
proc gchart data= SASHELP.CLASS;
  hbar AGE/ descending
            autoref
            clipref
            raxis   =axis1     
            coutline=black ;
  where AGE in ( &ages. );
  run;
quit;

if you variable is a string you must quote it

 select quote(catt(AGE)) into :ages separated by ' ' from COUNTS(obs=5);

Capture.PNG

 

Occasional Contributor
Posts: 16

Re: Limiting display of GCHART to top 10 rows in HBAR

Thanks for the response.  I also attempted creating a horizontal graph with gchart using the subset data.  The issue with subsetting the data for display was that the other statistics I need to display besides the simple frequency count (e.g. cumumlative frequency, percent, cumumulative percent) were lost.

 

I even tried to output the data using Proc Freq procedure but it (test_freq) only retained frequency and percent but not cumulative statistics.

proc freq data=test order=freq;
    tables variable-of-interest/out=test_freq;
run;

 

That was the reason I tried to avoid subsetting the data and hoping there was a statement available to limit the graphic display instead.  Please help.

SAS Super FREQ
Posts: 1,108

Re: Limiting display of GCHART to top 10 rows in HBAR

You could do this using SGPLOT with SAS 9.4M3.  First, I simulated your use case by creating a data set of 20 observations with random names and response values.  I then sorted them by descending value, and added an "ID" variable of the sorted position.  Then, I can use the VBAR (or HBAR) statement with CATEGORY=id and RESPONSE=value and set the x-axis TYPE=LINEAR.  Now, I can display any set of bars I want by setting the numeric range on the x-axis.  The x-axis tick values are suppressed and instead we use the xAxisTable to show the Names.

 

SGPlot27.png

 

/*--Generate some data--*/
data bars;
  drop i j;
  length Name $6;
  do i=1 to 20;
    Value=2+10*ranuni(2);
	do j=1 to 6;
	  substr(Name,j)=byte(int(65+26*ranuni(2)));
	end;
    output;
  end;
run;

/*--Sort by descending value--*/
proc sort data=bars out=sorted;
  by descending value;
run;

/*--Add sorted id number--*/
data bars2;
  set sorted;
  id=_n_;
run;

/*--Display first 8 bars--*/
proc sgplot data=bars2;
  vbar id / response=value;
  xaxis type=linear values=(1 to 20 by 1) valueshint min=1 max=8 display=(novalues nolabel);
  xaxistable name / nolabel;
run;

/*--Display bars 3 - 10--*/
proc sgplot data=bars2;
  vbar id / response=value;
  xaxis type=linear values=(1 to 20 by 1) valueshint min=3 max=10 display=(novalues nolabel);
  xaxistable name / nolabel;
run;

 

 

Occasional Contributor
Posts: 16

Re: Limiting display of GCHART to top 10 rows in HBAR

Could this be modified to show the statitics I need to report (cumulative frequency or percentage)?  Thanks.

SAS Super FREQ
Posts: 1,108

Re: Limiting display of GCHART to top 10 rows in HBAR

Yes, if you can compute the % and cumulative stats, you can plot these, showing only the top N bars as below.  Not sure if this is what you want.  Note, the x-axis is linear, so you can chop it off whereever you want.

 

Pareto_Label.png

 

 

 

Occasional Contributor
Posts: 16

Re: Limiting display of GCHART to top 10 rows in HBAR

Please see the horizontal graph I posted. Any program that can generate what I have described would be appreciated.

PROC Star
Posts: 1,564

Re: Limiting display of GCHART to top 10 rows in HBAR

Maybe you can group the others together then, so they can be seen.

proc sql noprint;
  create table COUNTS as select count(*) as COUNT, AGE from SASHELP.CLASS group by AGE order by COUNT desc;
  select catt(AGE,'=',AGE) into :ages separated by ' ' from COUNTS(obs=5);
quit;
proc format ;
   value topfive &ages. other ='Other';
proc gchart data= SASHELP.CLASS;
  hbar AGE/ descending
            autoref
            clipref
            raxis   =axis1     
            coutline=black 
            discrete;
  format AGE topfive.;
  run;
quit;

SASGraph.jpg

Occasional Contributor
Posts: 16

Re: Limiting display of GCHART to top 10 rows in HBAR

I tried using the code and received an error message "The format TOPFIVE was not found or could not be loaded." Perhaps having numeric vs. categorical variable has to do with it?

 

I searched the SASHELP files and found "ORSALES" file.  Here's the output from PROC FREQ by Product_Category:

SAS Output.  I would appreciate walking me though the codes to display horizontal bar graph that displays all of the stats shown here but only displays product categories in this order: Clothes, Children Sports, Outdoors, Assorted Sports Articles, & Team Sports. Thanks!

 

Product_Category Frequency Percent Cumulative
Frequency
Cumulative
Percent
Assorted Sports Articles 64 7.0 64 7.0
Children Sports 176 19.3 240 26.3
Clothes 240 26.3 480 52.6
Golf 32 3.5 512 56.1
Indoor Sports 48 5.3 560 61.4
Outdoors 112 12.3 672 73.7
Racket Sports 48 5.3 720 78.9
Running - Jogging 32 3.5 752 82.5
Shoes 48 5.3 800 87.7
Swim Sports 16 1.8 816 89.5
Team Sports 64 7.0 880 96.5
Winter Sports 32 3.5 912 100.0
         
PROC Star
Posts: 1,564

Re: Limiting display of GCHART to top 10 rows in HBAR

I have no idea what this graph would look like: Some numbers in front of bars and then some numbers coming out of nowhere?

 

I don't know what graph you want.

 

Maybe you need two reports: one "top 10 graph" and one "detailed numbers table" ?

Occasional Contributor
Posts: 16

Re: Limiting display of GCHART to top 10 rows in HBAR

test.png

 

The following statements produces the graph above. 

 

axis1 label=(f="Arial/Bold" "Count") minor=(n=1);
proc gchart data=sashelp.orsales;
    HBAR Product_Category/
    descending
    autoref
    clipref
    raxis=axis1     
    coutline=black ;
run;
quit;

 

The graph contains everything I need: data visualization and all the stats (frequency, cum. frequency, percent, cum. percent) in one graph.  It is exactly what I need to report except my variable that is equivalent to "Product Category" here has many more values than this test data set, as I mentioned >100.

 

This program serves the purpose for my report except I am unable to show only the top ten from my dataset without having to subset the data which seem to lose the statistics I need.    For example, in the above graph, upto first five product categories (up to Team Sports) make up 71.93% of all the data in the dataset. The viewers can see the freqency indivually and cumulatively(704) per product category. If I subset the data, I end up with cumulative percent up to 100% for only five I selected which is misleading.

 

Currently, I do report in tables and graphs separately.  I was looking for an option in SAS to combine the tabular data and graph in one.

PROC Star
Posts: 1,564

Re: Limiting display of GCHART to top 10 rows in HBAR

I still don't make sense of your reply.

So you want the whole table on the right, but not all the bars on the left?

If so why not display the bars since you have an empty space?

 

Capture.PNG

Occasional Contributor
Posts: 16

Re: Limiting display of GCHART to top 10 rows in HBAR

Currently, I generate the Proc Freq output for the entire dataset and include it in the Appendix as a reference(like the table in the previous post). I also include in the body of the report the top 10 most frequent items with a visualization using a bar graph.  This makes it easier for people to quickly glance and understand the data rather than looking at every single value available in the dataset.    Since I'm using the sample file for illustration purpose, I'll continue with top 5. Below you see a table and a graph I will ordinarily produce (this is from the sashelp.orsales file) and include in the body of the report:

 

Product_Category

Frequency

Percent

Cumulative
Frequency

Cumulative
Percent

Clothes

240

26.32

240

26.32

Children Sports

176

19.30

416

45.61

Outdoors

112

12.28

528

57.89

Assorted Sports Articles

64

7.02

592

64.91

Team Sports

64

7.02

656

71.93

 

test2.png

 

 

I would like to combine these two outputs into one like below (Please note I just edited the gchart graph above since I cannot produce it by programming.)  Does this help?

test3.png

 

 

 

Solution
a week ago
SAS Super FREQ
Posts: 1,108

Re: Limiting display of GCHART to top 10 rows in HBAR

[ Edited ]

If you have the data set built like you have shown, you can use the following code to create the combined graph-table you want.

 

 

First_N.png

 

title 'Top N Defects';
proc sgplot data=defects;
  hbar product / response=freq nostatlabel;
  yaxistable product / position=left nostatlabel;
  yaxistable freq cumfreq pct cumpct / position=right nostatlabel pad=5;
  yaxis discreteorder=data display=(novalues noticks nolabel);
run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 261 views
  • 0 likes
  • 3 in conversation