Keegan
Obsidian | Level 7

I'm having trouble creating an 'Other' category for the smaller groups in my data for a stacked bar chart.

 

My data is grouped by a year/month variable then by a product variable:

Keegan_0-1635347731284.png

 

I've tried this code to group:

 %let topN=5;

data other;
	set term_sum;
	label topcat='Product Group';
	topcat=prod_group;
	if _n_ > &topn then topcat='Other'; 
run;

However, this code only takes into account the prod_group variable, and not the yy_mm variable. And the grouping looks like this:

  

Keegan_1-1635347931642.png

 

It's only looking at the prod_group and the top 5 and ignoring the parent group of yy_mm. 

Something like this doesn't work:

data other;
	set term_out;
	label topcat='Product Group';
	topcat=yy_mm*prod_group;
	if _n_ > &topn then topcat='Other'; 
run;

Are there any suggestions for building the 'Other' group within each year/month group?

 

Then I could use the sgplot to plot the results:

proc sgplot data=term_out;
  title 'Test Chart';
  vbar yy_mm / response=count group=prod_group dataskin=gloss;
  xaxis display=(nolabel);
  yaxis grid;
  run;

It needs to be dynamic, since this will be automated and the groups that are the top 5 could change. 

Also, you'll notice there is a grouping of 'Other' already, but that just happened to be the name of the group. 

 

 

ballardw
Super User

Please provide a concise definition of what "Other" should be. If the prod_group VALUES that are to be included change from month to month, this is going to be a bit more of an exercise; for example, you would have to do the analysis by the "dates".

 

Please provide example data in the form of data step code that includes at least two "months". BTW, it appears from the picture that your YY_mm variable is character, which is very often sub-optimal because there are many tools for working with actual date values that make things much more flexible (or dynamic).
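As a rough illustration of the date point, here is a minimal sketch of turning a character year/month into a real SAS date. The layout of yy_mm is only visible in the picture, so the text format used here ('21_10', two-digit year, underscore, month) is an assumption, as are the names term_dates and month_date.

```sas
data term_dates;
	set term_sum;
	/* ASSUMPTION: yy_mm holds text such as '21_10' (two-digit year, underscore, month) */
	yr=input(scan(yy_mm, 1, '_'), 2.);
	mo=input(scan(yy_mm, 2, '_'), 2.);
	/* first day of the month; assumes all years fall in 2000-2099 */
	month_date=mdy(mo, 1, 2000 + yr);
	format month_date yymmd7.;   /* displays as 2021-10 */
	drop yr mo;
run;
```

With a numeric date, the values sort chronologically and date formats or interval functions can be applied without re-parsing text.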

 

This code sets topcat to 'Other' for every observation after the fifth. That is because the automatic variable _n_ counts iterations of the data step, which in many data steps corresponds to the row number of the input data; it does not restart for each yy_mm value. If you meant to use the value of Count that you show in the PICTURE then perhaps this would work.

data other;
	set term_sum;
	label topcat='Product Group';
	topcat=prod_group;
	if count > &topn then topcat='Other'; 
run;
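To see the _n_ behaviour on its own, here is a small self-contained sketch. The data set demo and its values are made up for illustration and are not from the original post; topN is set to 2 only to keep the example short.

```sas
/* Hypothetical demo data: two months, three product groups each */
data demo;
	input yy_mm $ prod_group $ count;
	datalines;
21_09 A 40
21_09 B 25
21_09 C 5
21_10 A 30
21_10 B 10
21_10 C 35
;
run;

%let topN=2;

data demo_n;
	set demo;
	length topcat $8;
	row=_n_;                              /* _n_ = data step iteration number */
	topcat=prod_group;
	if _n_ > &topN then topcat='Other';   /* true for every row after the first &topN */
run;

proc print data=demo_n noobs;
run;
```

Only the first two rows of the whole data set keep their prod_group; every later row, including all of month 21_10, becomes 'Other', because _n_ keeps counting across months.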

Of course the second attempt, quoted below, doesn't work: you are attempting to multiply two CHARACTER values. You can COMBINE two character values with concatenation. Plus you are still using the _n_ wh

It's only looking at the prod_group and the top 5 and ignoring the parent group of yy_mm. 

Something like this doesn't work:

data other;
	set term_out;
	label topcat='Product Group';
	topcat=yy_mm*prod_group;
	if _n_ > &topn then topcat='Other'; 
run;
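For the concatenation mentioned in this reply, a minimal sketch using the CATX function follows. It assumes yy_mm and prod_group are both character; the names combined and ym_prod, the '|' delimiter, and the length of 60 are illustrative choices, not anything from the original post.

```sas
data combined;
	set term_out;                          /* data set name taken from the original post */
	length ym_prod $60;                    /* assumed width; adjust to the real variable lengths */
	/* CATX strips leading/trailing blanks and joins the values with the delimiter */
	ym_prod=catx('|', yy_mm, prod_group);
run;
```

A combined key like this identifies each month/product pair, but on its own it does not pick the top N within a month; that still needs some form of ranking within yy_mm.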

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

 

Keegan
Obsidian | Level 7

Thanks for responding!

 

Though I just solved the issue by simply using a proc rank and redefining the variables based on that. It adds another step, but it's easy to understand for me. 
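The exact code was not posted, so the following is only a sketch of what a PROC RANK approach might look like, not the actual solution. It assumes term_sum has one row per yy_mm and prod_group with a numeric count; the names term_sorted, term_ranked, term_top, and count_rank, and the length of 40, are hypothetical.

```sas
%let topN=5;

/* PROC RANK with a BY statement needs the input sorted by the BY variable */
proc sort data=term_sum out=term_sorted;
	by yy_mm;
run;

/* Rank products within each month; the largest count gets rank 1 */
proc rank data=term_sorted out=term_ranked descending ties=low;
	by yy_mm;
	var count;
	ranks count_rank;
run;

data term_top;
	set term_ranked;
	length topcat $40;                     /* assumed width */
	label topcat='Product Group';
	topcat=prod_group;
	if count_rank > &topN then topcat='Other';
run;

/* VBAR sums the response per category/group, so the 'Other' rows stack as one segment */
proc sgplot data=term_top;
	vbar yy_mm / response=count group=topcat;
	xaxis display=(nolabel);
	yaxis grid;
run;
```

Because the ranking is recomputed per yy_mm on every run, the top groups can change from month to month. With ties=low, tied counts share the same rank, so a month can keep more than &topN groups. Any product group that is already named 'Other' in the data would be merged with the small groups; a different label could be used if they need to stay separate.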

 

Thanks!
