BookmarkSubscribeRSS Feed
amissana21
Calcite | Level 5
Spoiler
Spoiler
 

This is the description of the variable names in the data : 

CRIM: per capita crime rate by town

. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.

. INDUS: proportion of non-retail business acres per town

. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

. NOX: nitric oxides concentration (parts per 10 million)

. RM: average number of rooms per dwelling

. AGE: proportion of owner-occupied units built prior to 1940

. DIS: weighted distances to five Boston employment centers

. RAD: index of accessibility to radial highways

. TAX: full-value property-tax rate per $10,000

. PTRATIO: pupil-teacher ratio by town

. LSTAT: % lower status of the population

. MEDV: Median value of owner-occupied homes in $1000’s

 

I need to compute the  mean, median, standard deviation, minimum, maximum values of medv for each combination of chas and rad (e.g. mean of chas=0 and rad=1). Using proc

format and format (inside the procedure for obtaining the summary statistics), print

out strings for chas (‘Bounds river’ and ‘Otherwise’ rather than 1 and 0). Which

combination of two group variables has the highest mean?

5 REPLIES 5
sbxkoenk
SAS Super FREQ

Hello,

 

Is this a homework assignment?

You can do it with PROC FORMAT indeed.

 

If NOT PROC FORMAT required , then use below technique (dara-driven code generation):

proc freq data=sashelp.class;
 tables sex / missing list out=work.count_sex;
run;

proc freq data=sashelp.class;
 tables age / missing list out=work.count_age;
run;

PROC SQL noprint;
 create table work.all_pairwise_combinations as
 select sex , age
 from   work.count_sex(drop=count percent)
      , work.count_age(drop=count percent);
QUIT;

filename tt CATALOG 'work.t.t.source';

data _NULL_;
 set work.all_pairwise_combinations;
 file tt;
 SEXinsert=strip(sex);
 ageinsert=put(age,2.);
PUT "title 'sex=" SEXinsert +(-1) "  and age=" ageinsert +(-1) "';";
PUT "proc means data=sashelp.class MEAN nway;";
PUT "where sex='" SEXinsert +(-1) "' and age=" ageinsert +(-1)  ";";
PUT " VAR HEIGHT;";
PUT "run;";
PUT "title; footnote;";
run;

%INCLUDE tt / source2;
filename tt CLEAR;
/* end of program */

Good luck,

Koen

 
amissana21
Calcite | Level 5

Thank you Koen, but I am required to do it with proc format and format also while printing.

sbxkoenk
SAS Super FREQ

Hello,

 

Then you need PROC FORMAT and CLASS statement(s) in PROC MEANS or PROC SUMMARY.

 

Base SAS Procedures Guide
MEANS Procedure
Example 5: Using Multilabel Value Formats with Class Variables

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n13dhme6o4ut3en1u8e58r16b3zf.htm

 

Cheers,

Koen

ballardw
Super User

One generic approach would be Proc Summary will all of the "group variables" as class variables and your analysis variables as var.

Generic code as I don't feel like writing a data step:

 

Proc summary data=youdatasetname;
   class <put the names of the group variables here>;
   var <put names of analysis varibles here>;
   output out=somedatasetname   mean= median= std= min= max= /autoname;
run;

Proc summary will do an overall summary of all the data, by each value of each variable on the class statement, each value of each pair of

variables , each value of combinations of 3 or more variables and then each value of each of the variables. The output, using the Autoname option means that the output statistics are named with the statistic as a suffix to the VAR variable names.

 

There will be a variable in the output data set named _type_ that indicates exactly which combination of the Class variables is used for that summary. _type_= 0 will be the overall data, the largest value of _type_ will have ALL of the class variables represented.

 

Select the _type_ (s) you want and sort the mean variables in descending order.

 

Formats applied to CLASS variables will create groups and used and display the values of the format if added to the Proc Summary code.

 

"Highest mean" is not clearly defined as I can't tell which variables you intend to summarize but from the raw numbers shown the "largest mean" is likely going to be Tax. Unless you meant the "largest mean of each of the analysis variables", which is calculated with proc summary. You then select the result you want.

PaigeMiller
Diamond | Level 26

IMHO, the answer from @ballardw about using PROC SUMMARY hits all the necessary points, and PROC SUMMARY is what I would advise as well.

--
Paige Miller

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 878 views
  • 2 likes
  • 4 in conversation