dhrumil_patel
Fluorite | Level 6

I would like to create a boxplot to compare the distributions between two values of a categorical variable. I would like to use PROC BOXPLOT to take advantage of the INSETGROUP statement and display univariate statistics for each value.

 

The code I'm using is:

 

PROC BOXPLOT DATA=&dsn;
BY &age_bucket;
PLOT mileage*fuel / MAXPANELS=100;
RUN;

 

I added the MAXPANELS= option because I initially got an error stating that 97 panels were needed and that I could raise the default with MAXPANELS=. However, even with it, I get unexpected results. The categorical variable fuel has only two values, so I expected both boxplots to appear on one panel (for each age_bucket).

 

Can someone explain why this is not happening, and why I'm instead getting dozens of boxplots across repeated values of the categorical variable fuel?

 

Thanks in advance for any assistance. 

 

Dhrumil

6 REPLIES
ballardw
Super User

Show some example data.

And some of the output.

And what does the macro variable &age_bucket resolve to? I would normally expect a single variable holding the age or the age group, and I do not see why a macro variable is needed. If your "age_bucket" resolves to multiple variables, that would go a long way toward explaining the many plots.

dhrumil_patel
Fluorite | Level 6

Yes. Thanks for the clarification.

 

Example data and output are attached. 

 

Sorry, I misstated that: age_bucket is not a macro variable, just a variable:

 


proc boxplot data=&dsn;
by age_bucket;
plot kms*fuel_vims / maxpanels=100;
run;


example_output.PNG
ballardw
Super User

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

Note that the picture of your output really isn't possible with the code you've shown. At a minimum, there may be a TITLE statement missing.

But with your "title" showing "FUEL=Diesel" and all of the horizontal axis values of "gasolina" I suspect there is something else you haven't shown.

Reeza
Super User

Run a proc freq on your FUEL variable with NO formats. I suspect you have an underlying format that may be causing the issue. 

 

proc freq data=have;
table fuel;
format fuel;
run;
ap15
Calcite | Level 5

I ran into the same issue. You have to sort the data first.
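
A minimal sketch of that fix, not tested against the poster's data. It assumes the input data set is named have (as in the PROC FREQ example above), and have_sorted is a hypothetical output name. The variable names mileage, fuel and age_bucket come from the original post. PROC BOXPLOT reads the group variable in data order and starts a new box each time its value changes, so unsorted data can produce many repeated boxes for the same fuel value. Sorting by the BY variable and then the group variable keeps each fuel value contiguous within each age_bucket.

```sas
/* Sort by the BY variable first, then by the group variable, so that */
/* all rows for one fuel value are adjacent within each age_bucket.   */
proc sort data=have out=have_sorted;   /* have_sorted: hypothetical name */
   by age_bucket fuel;
run;

/* With sorted input, each age_bucket should yield one box per fuel   */
/* value, so MAXPANELS= is likely no longer needed.                   */
proc boxplot data=have_sorted;
   by age_bucket;
   plot mileage*fuel;
   insetgroup n mean min max;          /* per-group summary statistics */
run;
```

If the repeated boxes persist after sorting, the unformatted PROC FREQ check suggested above is still worth running to rule out a format on fuel.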
