I want to get percentage of values above a threshold out of PROC MEANS. For example, get proportion of elderly from the Age variable.
In PROC FREQ, I can use variable formatting to accomplish the task:
proc format;
value Elderly
0-65 = 0
65 - high = 1;
run;
proc freq data=SASHelp.Heart;
table Age:;
format Age: Elderly.;
run;
But PROC MEANS does not respect the FORMAT or INFORMAT and performs calculations on unformatted data:
proc means data=SASHelp.Heart Mean;
format AgeAtStart AgeCHDdiag AgeAtDeath Elderly.;
informat AgeAtStart AgeCHDdiag AgeAtDeath Elderly.;
var Age: ;
output out=ODSet mean= / autoname;
run;
Is there an option in PROC MEANS or a trick to make it use formatted values of a variable like so many other SAS PROCs can do? I know I can accomplish the goal in PROC SQL by using mean(Age>65) but I would like to get the result out of PROC MEANS. Possible?
If you insist, the only way I see is:
data heart / view=heart;
set sashelp.heart;
age65AtStart = ageAtStart > 65;
age65CHDdiag = ageCHDdiag > 65;
age65AtDeath = ageAtDeath > 65;
run;
proc means data=Heart Mean;
var Age65: ; output out=ODSet mean= / autoname;run;
run;
@Haris wrote:
Now, can you do this in SAS without the interim DATA step?
Probably not if PROC MEANS is required.
If I understand your problem correctly, you are wanting to use the same variable to both categorise and analyse at the same time. The MEANS procedure requires that a variable can be in a CLASS statement or a VAR statement but not both at the same time.
This limitation is not present in PROC REPORT as you can have DEFINE statements to use the same variable as both a GROUP and as an ANALYSIS type.
No, not exactly, SASKiwi. What I want is for PROC MEANS to average the discrete values of Age 0 and 1 rather than actual age values. Say there are only two ages 60 and 70. PROC MEANS would return a mean of 65 for these two. What I want it to do is apply the format to these values, turn them into 0 and 1 (because 60 is less than 65 and 70 is more) and return the average of 0.50. If there are four ages: 40, 50, 60, and 70, they should be formatted as 0, 0, 0, and 1 and the mean I want to get is 0.25 rather than 55.
PROC SQL does what I need if I code it as:
proc sql;
select MEAN(AgeCHDdiag>65)
from SASHelp.Heart;
quit;
@Haris wrote:
Bummer Why would PROC MEANS be different from PROC FREQ? I wonder if PROC REPORT of PROC TABULATE would offer an option to use the formatted values?
Because MEANS and FREQ are different procedures, they are designed to do different things, so they operate differently (and in my opinion, these are very sensible reasonable and defendable differences, I think SAS got it right). Specifically, PROC MEANS works on continuous or integer-valued variables and performs arithmetic to compute statistics, while PROC FREQ works on variables that are categories (even continuous or integer-valued variables are assumed to contain categories in PROC FREQ) and does nothing more than counting and computing percents in a category. And so, for your problem, you ought to use the procedure that performs the task you want, instead of forcing it into a procedure that doesn't do what you want.
By the way, You can use formats in PROC MEANS, if the format is applied to the CLASS variables. You can't use formats in PROC MEANS on the VAR variable, except to change its appearance, instead of the default output of the variable in VAR statements. For example, you can have the VAR variable formatted as comma10.2, and this will work. But you can't format a VAR variable to achieve arithmetic manipulation via PROC MEANS.
PROC REPORT allows formatting of variables that are defined as GROUP or ACROSS variables. If you format a DISPLAY variable, it works the same as PROC MEANS, it does not allow arithmetic, it only changes the appearance. Also, in PROC REPORT, if you really want to do this in one step, I believe you can create the 0/1 variables in a COMPUTE block.
But really, what is the fuss about anyway? In one case you have to create FORMATS to do the arithmetic, in another case you have to do the arithmetic in a DATA step, why would one be allowed and the other disallowed? It's approximately the same amount of work, and gets to the same answers.
You are not atypical with this type of data summary. You should be able to create formats and then do the calculations in PROC FREQ without adding to the size of the data set.
However, as discussed, you can't do this in PROC MEANS. If you really need percentages above a cutoff, then PROC FREQ is the way to go. PROC FREQ (and PROC MEANS too) allow you to perform the operations in the data base, so you don't have to do an extract. https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_fr...
This has the advantage of faster processing time, and less data being transferred from data base to your servers.
The simple answer is NO.
PROC MEANS is for getting statistics on numeric variables. Formatted values are by definition character strings. PROC FREQ is for counting numbers within categories and whether the categories are defined by numeric values or character values doesn't matter.
You could use your format to create a new numeric variable.
If you are worried about processing time try creating a view instead of a physical copy of the data.
data for_analysis / view=for_analysis;
set sashelp.heart;
array age age: ;
do over age;
age=input(put(age,elderly.),32.);
end;
run;
proc means data=for_analysis;
var age:;
run;
The MEANS Procedure Variable Label N Mean Std Dev Minimum Maximum ----------------------------------------------------------------------------- AgeCHDdiag Age CHD Diagnosed 1449 0.3968254 0.4894081 0 1.0000000 AgeAtStart Age at Start 5209 0 0 0 0 AgeAtDeath Age at Death 1991 0.6865897 0.4639962 0 1.0000000 -----------------------------------------------------------------------------
@Haris wrote:
I want to get percentage of values above a threshold out of PROC MEANS. For example, get proportion of elderly from the Age variable.
In PROC FREQ, I can use variable formatting to accomplish the task:
proc format;
value Elderly
0-65 = 0
65 - high = 1;
run; proc freq data=SASHelp.Heart; table Age:; format Age: Elderly.; run;
But PROC MEANS does not respect the FORMAT or INFORMAT and performs calculations on unformatted data:
proc means data=SASHelp.Heart Mean; format AgeAtStart AgeCHDdiag AgeAtDeath Elderly.; informat AgeAtStart AgeCHDdiag AgeAtDeath Elderly.; var Age: ;
output out=ODSet mean= / autoname;
run;
Is there an option in PROC MEANS or a trick to make it use formatted values of a variable like so many other SAS PROCs can do? I know I can accomplish the goal in PROC SQL by using mean(Age>65) but I would like to get the result out of PROC MEANS. Possible?
Perhaps this will demonstrate that Proc Means does use formats exactly the same way that Proc Freq does: creates groups based on the formatted value of a variable.
What the procedures do with those groups is different because of the purposes of the procedures.
proc means data=SASHelp.Heart; class AgeCHDdiag; format AgeCHDdiag Elderly.; var height weight; run;
It may help to share what you would expect for the output. Possibly work through some steps.
You may need to create new variables. Or take two steps such as run the data through proc freq creating output and then proc means on the percent with the count used as a freq variable? I am not quite sure exactly what you want for output so stepping through things may clear it up.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.