BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
asasha
Obsidian | Level 7

I have a table that has columns with states and measurements. I need to get the top 10 states by the measurement and combine the rest into "Other" category and have 11 levels in total. The code below gives me the top 10 states, and one way is to hard code with if/else to set the rest to "Other":

proc freq data=dsin ORDER=FREQ;
  tables State / maxlevels=10;
  weight Measurement;
run;

However, I need to automate this process in case the top 10 states change. Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You still haven't shown what you want the results to look like (original dataset with a new variable named top10?  The frequency output with 11 rows (the top10 plus pooled other?   something else?).

 

So here's a way to add a top10 variables to the content of sashelp.shoes, where the classification var is subsidiary (n=53) and the weight variable is sales).

 

proc freq data=sashelp.shoes order=freq noprint;
  tables subsidiary / out=sales_by_sub ;
  weight sales;
run;

proc sql noprint;
  select distinct quote(trim(subsidiary)) into :toptenlist separated by ','
    from sales_by_sub (obs=10);
quit;
%put &=toptenlist;

data want;
  set sashelp.shoes;
  if subsidiary in ( &toptenlist ) then top10=subsidiary;
  else top10='Other';
run;

This uses the ability of proc sql to take data from a data set and generate a macrovar  (named toptenlist above) from data in that dataset.  It also uses the dataset name parameter obs= to extact just the first 10 subsidiaries from the output of proc freq, making the proc sql job simple.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
ballardw
Super User

Define a bit more clearly how to determine "top 10 states".

 

Example data.

 

Expected output.

 

One way would to do what I think you are requesting is to create a data set from the proc freq output and then use that data set to create  custom format that displays "other" for all the not-top-10. But how you expect to see the output would help.

asasha
Obsidian | Level 7

The top states are determined using the freq procedure above based on the highest measurement. I would just need to attach a new column top10 that is equal to "Other" if not one of the 10 states, otherwise the state:

if States not in ('CA' 'NY' ...) then top10 = 'Other';
else top10 = States ;

I just have trouble extracting the values in the brackets without listing them manually for each new data.

mkeintz
PROC Star

You still haven't shown what you want the results to look like (original dataset with a new variable named top10?  The frequency output with 11 rows (the top10 plus pooled other?   something else?).

 

So here's a way to add a top10 variables to the content of sashelp.shoes, where the classification var is subsidiary (n=53) and the weight variable is sales).

 

proc freq data=sashelp.shoes order=freq noprint;
  tables subsidiary / out=sales_by_sub ;
  weight sales;
run;

proc sql noprint;
  select distinct quote(trim(subsidiary)) into :toptenlist separated by ','
    from sales_by_sub (obs=10);
quit;
%put &=toptenlist;

data want;
  set sashelp.shoes;
  if subsidiary in ( &toptenlist ) then top10=subsidiary;
  else top10='Other';
run;

This uses the ability of proc sql to take data from a data set and generate a macrovar  (named toptenlist above) from data in that dataset.  It also uses the dataset name parameter obs= to extact just the first 10 subsidiaries from the output of proc freq, making the proc sql job simple.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 696 views
  • 0 likes
  • 3 in conversation