Not applicable
Posts: 0

# Calculate mean and median depending on no. of observations

Hi,

I am pretty new in using SAS and could use some help with the following problem.

I have data similar to this:

Code | value
1111 | 1
1111 | 2
1111 | 3
1111 | 4
1111 | 5
1112 | 1
1113 | 1
1113 | 2
1113 | 3
1113 | 4
1121 | 1
1121 | 2
1121 | 3
1121 | 4
1121 | 5

What do I want to do?

I want to calculate the mean and median of "value", iff "code" occurs 5 or more times in the table.
So, for example, I want SAS to check if the code "1111" exists 5 or more times. If yes, SAS should create new variables "median" and "mean" giving me the median and mean of the values of "1111" in a new file. Accordingly, "mean" and "median" in this case should be equal to 3:

Code | Median | Mean
1111 | 3 | 3

Now, SAS should do the same for "1112" and realize that there is only 1 observation. In this case, SAS should return "." and instead use all observations with the first three digits to calculate mean and median, so for all observations of code starting with "111" (1111, 1112, ..., 1118, 1119).

Code | Median | Mean
1111 | 3 | 3
1112 | . | .
111 | 2.5 | 2.6

And so forth...

Can anybody help me? I am pretty clueless at the moment...

Thanks a lot!
Tucker
Not applicable
Posts: 0

## Re: Calculate mean and median depending on no. of observations

have a look at PROC MEANS

Let it create an output dataset

If the _FREQ_ of the CLASS code is less than 5 set the results to missing.
Something like[pre]PROC MEANS DATA= your.data1 NOPRINT ;
CLASS Code ;
VAR value ;
OUTPUT OUT= means_out MEAN= mean MEDIAN= median ;
RUN ;
* now apply your rule for low-frequency cases ;
DATA final_results ;
SET means_out ;
IF _FREQ_ < 5 THEN DO ;
mean = . ;
median = . ;
END ;
RUN ;

try that
PeterC
Not applicable
Posts: 0

## Re: Calculate mean and median depending on no. of observations

Hi Peter!

That works great, thank you!

May I ask two more questions?

1) How can I also calculate the geometric mean?

I tried:

DATA= your.data1 NOPRINT ;
CLASS Code ;
VAR value ;
OUTPUT OUT= means_out MEAN= mean MEDIAN= median GEOMEAN= geomean;
RUN ;

but SAS doesn't recognize GEOMEAN as an operator/function

2) What if I have Value1, Value2, Value3 and Value4? How can I make SAS to calculate also mean, median and geomean of Value 2 to 4?

I tried:

DATA= your.data1 NOPRINT ;
CLASS Code ;
VAR value1 value2 value3 value4;
OUTPUT OUT= means_out MEAN= mean MEDIAN= median ;
RUN ;

but work.means_out only contains two columns stating mean and median of value1 instead of 8 columns (2*4, which I would have expected).

Does anybody know an answer to these questions?

Thanks a lot!
Tucker Message was edited by: sas_rookie
Not applicable
Posts: 0

## Re: Calculate mean and median depending on no. of observations

Hi again!

In order to let ME also contribute something to the discussion I wanna answer some of my own questions:

@ Question 1):

As far as I am concerned SAS supports GEOMEAN as a function (http://support.sas.com/documentation/whatsnew/91x/lrdictwhatsnew900.htm) but apparently not as a part of PROC MEANS (which is disappointing).

As a solution I programmed the following workaround:

PROC MEANS DATA= your.data00 NOPRINT ;
CLASS Code ;
VAR Value;
OUTPUT OUT= means_out_median MEAN= mean MEDIAN= median ;
RUN ;

*now the workaround for the geomean;

PROC MEANS DATA= your.data01 NOPRINT maxdec=2 ;
CLASS Code ;
VAR logvalue;
OUTPUT OUT= means_out_log mean = MeanLog;
RUN ;
data means_out_log ;
set means_out_log ;
geomean=exp(MeanLog) ;
drop Meanlog ;
RUN;

* merge files
proc sort
data = means_out_median;
by Code;
Run;

proc sort
data = means_out_log;
by Code;
Run;

data means_out_all;
merge means_out_log means_out_median;
by Code;
RUN;

* filter the output which is based upon less than 5 observations ;

DATA your.data02 ;
SET means_out_all ;
IF _FREQ_ < 5 THEN DO ;
mean = . ;
median = . ;
geomean= . ;
END ;
Run ;

@ question 2):

No, you cannot do that for more than one variable (again, I was a bit disappointed). If you state more than one variable, SAS only calculates the mean for the first. So I repeated the same procedure for each variable - but maybe there's a more sophisticated solution!?!

Cheers,
Tucker
Not applicable
Posts: 0

## Re: Calculate mean and median depending on no. of observations

For #2 you goofed a little.

To get a default list of statistics you need to NOT specify the output name for the discriptive stats.
[pre]
var field1 field2 field3 field4;
output out=outdata min= max= mean= ;
[/pre]
to have mean values for only fields 2 and 4
[pre]
output out=outdata min= max= mean(field2 field4)= ;
[/pre]
There is also a "TIP:" in the documentation about the AUTONAME option.

Since you used "mean=mean" it only provided the "Mean" for the first field.
If you had used "mean=mean1 mean2 mean3 mean4" you would have gotten the mean values for field1 field2 field3.
This is also clearly stated in the SAS documentation.
Not applicable
Posts: 0