This is the description of the variable names in the data :
CRIM: per capita crime rate by town
. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
. INDUS: proportion of non-retail business acres per town
. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
. NOX: nitric oxides concentration (parts per 10 million)
. RM: average number of rooms per dwelling
. AGE: proportion of owner-occupied units built prior to 1940
. DIS: weighted distances to five Boston employment centers
. RAD: index of accessibility to radial highways
. TAX: full-value property-tax rate per $10,000
. PTRATIO: pupil-teacher ratio by town
. LSTAT: % lower status of the population
. MEDV: Median value of owner-occupied homes in $1000’s
I need to compute the mean, median, standard deviation, minimum, maximum values of medv for each combination of chas and rad (e.g. mean of chas=0 and rad=1). Using proc
format and format (inside the procedure for obtaining the summary statistics), print
out strings for chas (‘Bounds river’ and ‘Otherwise’ rather than 1 and 0). Which
combination of two group variables has the highest mean?
Hello,
Is this a homework assignment?
You can do it with PROC FORMAT indeed.
If NOT PROC FORMAT required , then use below technique (dara-driven code generation):
proc freq data=sashelp.class;
tables sex / missing list out=work.count_sex;
run;
proc freq data=sashelp.class;
tables age / missing list out=work.count_age;
run;
PROC SQL noprint;
create table work.all_pairwise_combinations as
select sex , age
from work.count_sex(drop=count percent)
, work.count_age(drop=count percent);
QUIT;
filename tt CATALOG 'work.t.t.source';
data _NULL_;
set work.all_pairwise_combinations;
file tt;
SEXinsert=strip(sex);
ageinsert=put(age,2.);
PUT "title 'sex=" SEXinsert +(-1) " and age=" ageinsert +(-1) "';";
PUT "proc means data=sashelp.class MEAN nway;";
PUT "where sex='" SEXinsert +(-1) "' and age=" ageinsert +(-1) ";";
PUT " VAR HEIGHT;";
PUT "run;";
PUT "title; footnote;";
run;
%INCLUDE tt / source2;
filename tt CLEAR;
/* end of program */
Good luck,
Koen
Thank you Koen, but I am required to do it with proc format and format also while printing.
Hello,
Then you need PROC FORMAT and CLASS statement(s) in PROC MEANS or PROC SUMMARY.
Base SAS Procedures Guide
MEANS Procedure
Example 5: Using Multilabel Value Formats with Class Variables
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n13dhme6o4ut3en1u8e58r16b3zf.htm
Cheers,
Koen
One generic approach would be Proc Summary will all of the "group variables" as class variables and your analysis variables as var.
Generic code as I don't feel like writing a data step:
Proc summary data=youdatasetname; class <put the names of the group variables here>; var <put names of analysis varibles here>; output out=somedatasetname mean= median= std= min= max= /autoname; run;
Proc summary will do an overall summary of all the data, by each value of each variable on the class statement, each value of each pair of
variables , each value of combinations of 3 or more variables and then each value of each of the variables. The output, using the Autoname option means that the output statistics are named with the statistic as a suffix to the VAR variable names.
There will be a variable in the output data set named _type_ that indicates exactly which combination of the Class variables is used for that summary. _type_= 0 will be the overall data, the largest value of _type_ will have ALL of the class variables represented.
Select the _type_ (s) you want and sort the mean variables in descending order.
Formats applied to CLASS variables will create groups and used and display the values of the format if added to the Proc Summary code.
"Highest mean" is not clearly defined as I can't tell which variables you intend to summarize but from the raw numbers shown the "largest mean" is likely going to be Tax. Unless you meant the "largest mean of each of the analysis variables", which is calculated with proc summary. You then select the result you want.
IMHO, the answer from @ballardw about using PROC SUMMARY hits all the necessary points, and PROC SUMMARY is what I would advise as well.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.