Keep ONLY the highest-valued groups, for Box Plots

Regular Contributor
Posts: 212

Presently I have so many "groups" when doing Box Plots that the result is SEVEN panels of box plots.

I'd like to have ONE panel, with about 20 box plots (or "groups").

So, that would require cutting out a bunch of groups.

Is there a way to automatically do this?

What I have in mind is: in a DATA step, keep only the TOP 20 groups, using each group's Q3 (third quartile) value as the criterion for keeping or removing.

Any coding assistance greatly appreciated.

Nicholas Kormanik

Super User
Posts: 10,466

Re: Keep ONLY the highest-valued groups, for Box Plots

Calculating Q3 in a DATA step is going to be a lot of work. Using PROC MEANS or PROC SUMMARY to get Q3 for each group, and then merging that result with your existing data, is probably a better bet. Why does it need to be in a DATA step?
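
For illustration, the PROC MEANS and merge route could look roughly like the sketch below. The names are assumptions, not taken from the original poster's data: Parts is a hypothetical raw data set, Batch is the group variable, and Weight is the measured variable. Treat it as a starting point rather than tested code.

```sas
/* Assumed names: data set Parts, group variable Batch, response Weight. */

/* 1. One row per group, holding that group's third quartile. */
proc means data=Parts noprint nway;
  class Batch;
  var Weight;
  output out=BatchQ3 (keep=Batch Q3) q3=Q3;
run;

/* 2. Rank the groups by Q3, highest first, and keep the first 20. */
proc sort data=BatchQ3;
  by descending Q3;
run;

data Top20;
  set BatchQ3 (obs=20);
run;

/* 3. Merge back so that only rows from the top 20 groups remain. */
proc sort data=Top20;
  by Batch;
run;

proc sort data=Parts out=PartsSorted;
  by Batch;
run;

data PartsTop20;
  merge PartsSorted Top20 (in=isTop keep=Batch);
  by Batch;
  if isTop;
run;

/* 4. Box plots for the retained groups only. */
proc boxplot data=PartsTop20;
  plot Weight*Batch;
run;
```

Because the final step still reads the raw observations, the plot is built from the original values of the retained groups. If fewer than 20 groups exist, OBS=20 simply keeps all of them.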

Regular Contributor
Posts: 212

Re: Keep ONLY the highest-valued groups, for Box Plots

I was thinking that it could be easily done in the data step, is all.

After reading up on the topic further, it now appears that perhaps the best answer is to run PROC BOXPLOT on all groups and include the OUTBOX= or OUTHISTORY= option to save the per-group summary statistics to a data set.

Then in a subsequent run, use one of these new datafiles as the new input.

Still not sure, though, of the exact coding for keeping the top 20 groups.

See:

SAS/STAT(R) 9.2 User's Guide, Second Edition

proc boxplot history=Summary;
  plot (Weight Yieldstrength) * Batch;
run;

Since this subsequent run reads a different data set (the HISTORY= one), and that data set contains a column for each group's Q3, one would tailor the history data set first: sort it on the Q3 column in descending order with PROC SORT, then read it in a DATA step with OBS=20, so that only the top 20 'groups' are kept and used. The result should be ONE panel of box plots of the top 20 groups.
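
A rough sketch of that idea follows, untested and with assumed names. Parts is a hypothetical raw data set. As I understand the documentation, the OUTHISTORY= data set stores each group's third quartile in a variable named after the process variable with the suffix 3, so for Weight the column would be Weight3. Check the actual column names with PROC CONTENTS before relying on this.

```sas
/* Assumed names: raw data set Parts, group variable Batch, response Weight. */

/* First run: plot all groups and save the per-group summary statistics. */
proc boxplot data=Parts;
  plot Weight*Batch / outhistory=Summary;
run;

/* Sort the summary rows by Q3, highest first.                  */
/* Weight3 is assumed to be the third-quartile column.          */
proc sort data=Summary out=SummaryByQ3;
  by descending Weight3;
run;

/* Keep only the 20 groups with the highest Q3. */
data Top20;
  set SummaryByQ3 (obs=20);
run;

/* Restore group order before plotting. */
proc sort data=Top20;
  by Batch;
run;

/* Second run: read the trimmed summary data set instead of the raw data. */
proc boxplot history=Top20;
  plot Weight*Batch;
run;
```

One caveat to verify: a history data set holds only summary statistics, so the second plot may not be able to show individual outliers. If those matter, filtering the OUTBOX= data set, or the raw data, by the same 20 group values may be the better route.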
