I have a table that has columns with states and measurements. I need to get the top 10 states by the measurement and combine the rest into "Other" category and have 11 levels in total. The code below gives me the top 10 states, and one way is to hard code with if/else to set the rest to "Other":
proc freq data=dsin ORDER=FREQ; tables State / maxlevels=10; weight Measurement; run;
However, I need to automate this process in case the top 10 states change. Thank you.
You still haven't shown what you want the results to look like (original dataset with a new variable named top10? The frequency output with 11 rows (the top10 plus pooled other? something else?).
So here's a way to add a top10 variables to the content of sashelp.shoes, where the classification var is subsidiary (n=53) and the weight variable is sales).
proc freq data=sashelp.shoes order=freq noprint;
tables subsidiary / out=sales_by_sub ;
weight sales;
run;
proc sql noprint;
select distinct quote(trim(subsidiary)) into :toptenlist separated by ','
from sales_by_sub (obs=10);
quit;
%put &=toptenlist;
data want;
set sashelp.shoes;
if subsidiary in ( &toptenlist ) then top10=subsidiary;
else top10='Other';
run;
This uses the ability of proc sql to take data from a data set and generate a macrovar (named toptenlist above) from data in that dataset. It also uses the dataset name parameter obs= to extact just the first 10 subsidiaries from the output of proc freq, making the proc sql job simple.
Define a bit more clearly how to determine "top 10 states".
Example data.
Expected output.
One way would to do what I think you are requesting is to create a data set from the proc freq output and then use that data set to create custom format that displays "other" for all the not-top-10. But how you expect to see the output would help.
The top states are determined using the freq procedure above based on the highest measurement. I would just need to attach a new column top10 that is equal to "Other" if not one of the 10 states, otherwise the state:
if States not in ('CA' 'NY' ...) then top10 = 'Other'; else top10 = States ;
I just have trouble extracting the values in the brackets without listing them manually for each new data.
You still haven't shown what you want the results to look like (original dataset with a new variable named top10? The frequency output with 11 rows (the top10 plus pooled other? something else?).
So here's a way to add a top10 variables to the content of sashelp.shoes, where the classification var is subsidiary (n=53) and the weight variable is sales).
proc freq data=sashelp.shoes order=freq noprint;
tables subsidiary / out=sales_by_sub ;
weight sales;
run;
proc sql noprint;
select distinct quote(trim(subsidiary)) into :toptenlist separated by ','
from sales_by_sub (obs=10);
quit;
%put &=toptenlist;
data want;
set sashelp.shoes;
if subsidiary in ( &toptenlist ) then top10=subsidiary;
else top10='Other';
run;
This uses the ability of proc sql to take data from a data set and generate a macrovar (named toptenlist above) from data in that dataset. It also uses the dataset name parameter obs= to extact just the first 10 subsidiaries from the output of proc freq, making the proc sql job simple.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: