BookmarkSubscribeRSS Feed
Haris
Lapis Lazuli | Level 10

I want to get percentage of values above a threshold out of PROC MEANS.  For example, get proportion of elderly from the Age variable.

 

In PROC FREQ, I can use variable formatting to accomplish the task:

 

proc format; 
value Elderly
0-65 = 0
65 - high = 1;
run; proc freq data=SASHelp.Heart; table Age:; format Age: Elderly.; run;

 

 

But PROC MEANS does not respect the FORMAT or INFORMAT and performs calculations on unformatted data:

 

proc means data=SASHelp.Heart Mean;
  format   AgeAtStart AgeCHDdiag AgeAtDeath Elderly.;
  informat AgeAtStart AgeCHDdiag AgeAtDeath Elderly.;
  var Age: ;
output out=ODSet mean= / autoname;
run;

 

 

Is there an option in PROC MEANS or a trick to make it use formatted values of a variable like so many other SAS PROCs can do?  I know I can accomplish the goal in PROC SQL by using mean(Age>65) but I would like to get the result out of PROC MEANS.  Possible?

15 REPLIES 15
PGStats
Opal | Level 21

If you insist, the only way I see is:

 

data heart / view=heart;
set sashelp.heart;
age65AtStart = ageAtStart > 65;
age65CHDdiag = ageCHDdiag > 65;
age65AtDeath = ageAtDeath > 65;
run;

proc means data=Heart Mean;
  var Age65: ;  output out=ODSet mean= / autoname;run;
run;
PG
Haris
Lapis Lazuli | Level 10
Thanks for an additional way to accomplish the task. Now, can you do this in SAS without the interim DATA step?
PaigeMiller
Diamond | Level 26

@Haris wrote:
Now, can you do this in SAS without the interim DATA step?

Probably not if PROC MEANS is required.

 

 

--
Paige Miller
Haris
Lapis Lazuli | Level 10
Bummer 😞 Why would PROC MEANS be different from PROC FREQ? I wonder if PROC REPORT of PROC TABULATE would offer an option to use the formatted values?
SASKiwi
PROC Star

If I understand your problem correctly, you are wanting to use the same variable to both categorise and analyse at the same time. The MEANS procedure requires that a variable can be in a CLASS statement or a VAR statement but not both at the same time.

 

This limitation is not present in PROC REPORT as you can have DEFINE statements to use the same variable as both a GROUP and as an ANALYSIS type.

Haris
Lapis Lazuli | Level 10

No, not exactly, SASKiwi. What I want is for PROC MEANS to average the discrete values of Age 0 and 1 rather than actual age values. Say there are only two ages 60 and 70. PROC MEANS would return a mean of 65 for these two. What I want it to do is apply the format to these values, turn them into 0 and 1 (because 60 is less than 65 and 70 is more) and return the average of 0.50.  If there are four ages: 40, 50, 60, and 70, they should be formatted as 0, 0, 0, and 1 and the mean I want to get is 0.25 rather than 55.

PROC SQL does what I need if I code it as:

proc sql;
  select MEAN(AgeCHDdiag>65)
  from SASHelp.Heart;
quit;
PaigeMiller
Diamond | Level 26

@Haris wrote:
Bummer  Why would PROC MEANS be different from PROC FREQ? I wonder if PROC REPORT of PROC TABULATE would offer an option to use the formatted values?

Because MEANS and FREQ are different procedures, they are designed to do different things, so they operate differently (and in my opinion, these are very sensible reasonable and defendable differences, I think SAS got it right). Specifically, PROC MEANS works on continuous or integer-valued variables and performs arithmetic to compute statistics, while PROC FREQ works on variables that are categories (even continuous or integer-valued variables are assumed to contain categories in PROC FREQ) and does nothing more than counting and computing percents in a category. And so, for your problem, you ought to use the procedure that performs the task you want, instead of forcing it into a procedure that doesn't do what you want.

 

By the way, You can use formats in PROC MEANS, if the format is applied to the CLASS variables. You can't use formats in PROC MEANS on the VAR variable, except to change its appearance, instead of the default output of the variable in VAR statements. For example, you can have the VAR variable formatted as comma10.2, and this will work. But you can't format a VAR variable to achieve arithmetic manipulation via PROC MEANS.

 

PROC REPORT allows formatting of variables that are defined as GROUP or ACROSS variables. If you format a DISPLAY variable, it works the same as PROC MEANS, it does not allow arithmetic, it only changes the appearance. Also, in PROC REPORT, if you really want to do this in one step, I believe you can create the 0/1 variables in a COMPUTE block.

 

But really, what is the fuss about anyway? In one case you have to create FORMATS to do the arithmetic, in another case you have to do the arithmetic in a DATA step, why would one be allowed and the other disallowed? It's approximately the same amount of work, and gets to the same answers.

--
Paige Miller
Haris
Lapis Lazuli | Level 10
Thanks Paige. There are two concerns that brought me to seek formats in PROC MEANS. The datasets I am working with are very large. Adding computed variables substantially increases the dataset size. Second, the data resides on a server to which we have relatively slow access. Moving data from the source to the locally installed SAS can take time. I suppose there is also a third consideration. Our workflow is largely based on using PROC MEANS to create summaries by hospital. I can get N, nMiss, Mean, Median, Variance, etc. but not proportion of values above a threshold. Maybe I am atypical in the kind of data summary I seek, but it seems to me that proportion of values in a variable is a common descriptive statistic people seek. I think it is reasonable to desire that a function to extract a proportion of value/values was available in PROC MEANS. It would be nice to get all the summary statistics from one PROC (incidentally, the predecessor of PROC MEANS was PROC SUMMARY). Just makes things more compact, standard, and clean.
PaigeMiller
Diamond | Level 26

You are not atypical with this type of data summary. You should be able to create formats and then do the calculations in PROC FREQ without adding to the size of the data set. 

 

However, as discussed, you can't do this in PROC MEANS. If you really need percentages above a cutoff, then PROC FREQ is the way to go. PROC FREQ (and PROC MEANS too) allow you to perform the operations in the data base, so you don't have to do an extract. https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=statug&docsetTarget=statug_fr...

 

This has the advantage of faster processing time, and less data being transferred from data base to your servers.

--
Paige Miller
Haris
Lapis Lazuli | Level 10
Right, I know I can make the server do the calculations. Unfortunately, I cannot modify the datasets stored on the server. Views take an unacceptably long time to create for practical use. Adding variables such as Xgt1 = X>1 to a dataset for the purposes of calculating percentages means I need to create new interim temporary datasets ☹. Furthermore, PROC FREQ is not very friendly when it comes to creating OUTPUT datasets. My workflow relies on saving formatted output in standard tables. PROC MEANS creates beautiful OUTPUT datasets. PROC FREQ does not ☹.
I hope you can appreciate the want…
Tom
Super User Tom
Super User

The simple answer is NO.

PROC MEANS is for getting statistics on numeric variables. Formatted values are by definition character strings. PROC FREQ is for counting numbers within categories and whether the categories are defined by numeric values or character values doesn't matter.

 

You could use your format to create a new numeric variable. 

If you are worried about processing time try creating a view instead of a physical copy of the data.

 

data for_analysis / view=for_analysis;
  set sashelp.heart;
  array age age: ;
  do over age;
     age=input(put(age,elderly.),32.);
  end;
run;

proc means data=for_analysis;
  var age:;
run;
The MEANS Procedure

Variable     Label                 N      Mean    Std Dev  Minimum    Maximum
-----------------------------------------------------------------------------
AgeCHDdiag   Age CHD Diagnosed  1449 0.3968254  0.4894081        0  1.0000000
AgeAtStart   Age at Start       5209         0          0        0          0
AgeAtDeath   Age at Death       1991 0.6865897  0.4639962        0  1.0000000
-----------------------------------------------------------------------------

 

ballardw
Super User

@Haris wrote:

I want to get percentage of values above a threshold out of PROC MEANS.  For example, get proportion of elderly from the Age variable.

 

In PROC FREQ, I can use variable formatting to accomplish the task:

 

proc format; 
value Elderly
0-65 = 0
65 - high = 1;
run; proc freq data=SASHelp.Heart; table Age:; format Age: Elderly.; run;

 

 

But PROC MEANS does not respect the FORMAT or INFORMAT and performs calculations on unformatted data:

 

proc means data=SASHelp.Heart Mean;
  format   AgeAtStart AgeCHDdiag AgeAtDeath Elderly.;
  informat AgeAtStart AgeCHDdiag AgeAtDeath Elderly.;
  var Age: ;
output out=ODSet mean= / autoname;
run;

 

 

Is there an option in PROC MEANS or a trick to make it use formatted values of a variable like so many other SAS PROCs can do?  I know I can accomplish the goal in PROC SQL by using mean(Age>65) but I would like to get the result out of PROC MEANS.  Possible?


Perhaps this will demonstrate that Proc Means does use formats exactly the same way that Proc Freq does: creates groups based on the formatted value of a variable.

What the procedures do with those groups is different because of the purposes of the procedures.

proc means data=SASHelp.Heart;
  class AgeCHDdiag;
  format AgeCHDdiag Elderly.;
  var height weight;
run;
Haris
Lapis Lazuli | Level 10
Yes, I suppose you can look at it this way--both procs use CLASS consistently. It does help my cause, though, does it 😞
ballardw
Super User

It may help to share what you would expect for the output. Possibly work through some steps.

You may need to create new variables. Or take two steps such as run the data through proc freq creating output and then proc means on the percent with the count used as a freq variable? I am not quite sure exactly what you want for output so stepping through things may clear it up.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 11478 views
  • 2 likes
  • 6 in conversation