turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Calculate mean and median depending on no. of obse...

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

05-22-2008 04:12 PM

Hi,

I am pretty new in using SAS and could use some help with the following problem.

I have data similar to this:

Code | value

1111 | 1

1111 | 2

1111 | 3

1111 | 4

1111 | 5

1112 | 1

1113 | 1

1113 | 2

1113 | 3

1113 | 4

1121 | 1

1121 | 2

1121 | 3

1121 | 4

1121 | 5

What do I want to do?

I want to calculate the mean and median of "value", iff "code" occurs 5 or more times in the table.

So, for example, I want SAS to check if the code "1111" exists 5 or more times. If yes, SAS should create new variables "median" and "mean" giving me the median and mean of the values of "1111" in a new file. Accordingly, "mean" and "median" in this case should be equal to 3:

Code | Median | Mean

1111 | 3 | 3

Now, SAS should do the same for "1112" and realize that there is only 1 observation. In this case, SAS should return "." and instead use all observations with the first three digits to calculate mean and median, so for all observations of code starting with "111" (1111, 1112, ..., 1118, 1119).

Code | Median | Mean

1111 | 3 | 3

1112 | . | .

111 | 2.5 | 2.6

And so forth...

Can anybody help me? I am pretty clueless at the moment...

Thanks a lot!

Tucker

I am pretty new in using SAS and could use some help with the following problem.

I have data similar to this:

Code | value

1111 | 1

1111 | 2

1111 | 3

1111 | 4

1111 | 5

1112 | 1

1113 | 1

1113 | 2

1113 | 3

1113 | 4

1121 | 1

1121 | 2

1121 | 3

1121 | 4

1121 | 5

What do I want to do?

I want to calculate the mean and median of "value", iff "code" occurs 5 or more times in the table.

So, for example, I want SAS to check if the code "1111" exists 5 or more times. If yes, SAS should create new variables "median" and "mean" giving me the median and mean of the values of "1111" in a new file. Accordingly, "mean" and "median" in this case should be equal to 3:

Code | Median | Mean

1111 | 3 | 3

Now, SAS should do the same for "1112" and realize that there is only 1 observation. In this case, SAS should return "." and instead use all observations with the first three digits to calculate mean and median, so for all observations of code starting with "111" (1111, 1112, ..., 1118, 1119).

Code | Median | Mean

1111 | 3 | 3

1112 | . | .

111 | 2.5 | 2.6

And so forth...

Can anybody help me? I am pretty clueless at the moment...

Thanks a lot!

Tucker

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

05-23-2008 07:38 AM

have a look at PROC MEANS

Let it create an output dataset

If the _FREQ_ of the CLASS code is less than 5 set the results to missing.

Something like[pre]PROC MEANS DATA= your.data1 NOPRINT ;

CLASS Code ;

VAR value ;

OUTPUT OUT= means_out MEAN= mean MEDIAN= median ;

RUN ;

* now apply your rule for low-frequency cases ;

DATA final_results ;

SET means_out ;

IF _FREQ_ < 5 THEN DO ;

mean = . ;

median = . ;

END ;

RUN ;

try that

PeterC

Let it create an output dataset

If the _FREQ_ of the CLASS code is less than 5 set the results to missing.

Something like[pre]PROC MEANS DATA= your.data1 NOPRINT ;

CLASS Code ;

VAR value ;

OUTPUT OUT= means_out MEAN= mean MEDIAN= median ;

RUN ;

* now apply your rule for low-frequency cases ;

DATA final_results ;

SET means_out ;

IF _FREQ_ < 5 THEN DO ;

mean = . ;

median = . ;

END ;

RUN ;

try that

PeterC

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

05-24-2008 12:36 PM

Hi Peter!

That works great, thank you!

May I ask two more questions?

1) How can I also calculate the geometric mean?

I tried:

DATA= your.data1 NOPRINT ;

CLASS Code ;

VAR value ;

OUTPUT OUT= means_out MEAN= mean MEDIAN= median GEOMEAN= geomean;

RUN ;

but SAS doesn't recognize GEOMEAN as an operator/function

2) What if I have Value1, Value2, Value3 and Value4? How can I make SAS to calculate also mean, median and geomean of Value 2 to 4?

I tried:

DATA= your.data1 NOPRINT ;

CLASS Code ;

VAR value1 value2 value3 value4;

OUTPUT OUT= means_out MEAN= mean MEDIAN= median ;

RUN ;

but work.means_out only contains two columns stating mean and median of value1 instead of 8 columns (2*4, which I would have expected).

Does anybody know an answer to these questions?

Thanks a lot!

Tucker Message was edited by: sas_rookie

That works great, thank you!

May I ask two more questions?

1) How can I also calculate the geometric mean?

I tried:

DATA= your.data1 NOPRINT ;

CLASS Code ;

VAR value ;

OUTPUT OUT= means_out MEAN= mean MEDIAN= median GEOMEAN= geomean;

RUN ;

but SAS doesn't recognize GEOMEAN as an operator/function

2) What if I have Value1, Value2, Value3 and Value4? How can I make SAS to calculate also mean, median and geomean of Value 2 to 4?

I tried:

DATA= your.data1 NOPRINT ;

CLASS Code ;

VAR value1 value2 value3 value4;

OUTPUT OUT= means_out MEAN= mean MEDIAN= median ;

RUN ;

but work.means_out only contains two columns stating mean and median of value1 instead of 8 columns (2*4, which I would have expected).

Does anybody know an answer to these questions?

Thanks a lot!

Tucker Message was edited by: sas_rookie

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

05-25-2008 02:07 PM

Hi again!

In order to let ME also contribute something to the discussion I wanna answer some of my own questions:

@ Question 1):

As far as I am concerned SAS supports GEOMEAN as a function (http://support.sas.com/documentation/whatsnew/91x/lrdictwhatsnew900.htm) but apparently not as a part of PROC MEANS (which is disappointing).

As a solution I programmed the following workaround:

PROC MEANS DATA= your.data00 NOPRINT ;

CLASS Code ;

VAR Value;

OUTPUT OUT= means_out_median MEAN= mean MEDIAN= median ;

RUN ;

*now the workaround for the geomean;

PROC MEANS DATA= your.data01 NOPRINT maxdec=2 ;

CLASS Code ;

VAR logvalue;

OUTPUT OUT= means_out_log mean = MeanLog;

RUN ;

data means_out_log ;

set means_out_log ;

geomean=exp(MeanLog) ;

drop Meanlog ;

RUN;

* merge files

proc sort

data = means_out_median;

by Code;

Run;

proc sort

data = means_out_log;

by Code;

Run;

data means_out_all;

merge means_out_log means_out_median;

by Code;

RUN;

* filter the output which is based upon less than 5 observations ;

DATA your.data02 ;

SET means_out_all ;

IF _FREQ_ < 5 THEN DO ;

mean = . ;

median = . ;

geomean= . ;

END ;

Run ;

@ question 2):

No, you cannot do that for more than one variable (again, I was a bit disappointed). If you state more than one variable, SAS only calculates the mean for the first. So I repeated the same procedure for each variable - but maybe there's a more sophisticated solution!?!

Cheers,

Tucker

In order to let ME also contribute something to the discussion I wanna answer some of my own questions:

@ Question 1):

As far as I am concerned SAS supports GEOMEAN as a function (http://support.sas.com/documentation/whatsnew/91x/lrdictwhatsnew900.htm) but apparently not as a part of PROC MEANS (which is disappointing).

As a solution I programmed the following workaround:

PROC MEANS DATA= your.data00 NOPRINT ;

CLASS Code ;

VAR Value;

OUTPUT OUT= means_out_median MEAN= mean MEDIAN= median ;

RUN ;

*now the workaround for the geomean;

PROC MEANS DATA= your.data01 NOPRINT maxdec=2 ;

CLASS Code ;

VAR logvalue;

OUTPUT OUT= means_out_log mean = MeanLog;

RUN ;

data means_out_log ;

set means_out_log ;

geomean=exp(MeanLog) ;

drop Meanlog ;

RUN;

* merge files

proc sort

data = means_out_median;

by Code;

Run;

proc sort

data = means_out_log;

by Code;

Run;

data means_out_all;

merge means_out_log means_out_median;

by Code;

RUN;

* filter the output which is based upon less than 5 observations ;

DATA your.data02 ;

SET means_out_all ;

IF _FREQ_ < 5 THEN DO ;

mean = . ;

median = . ;

geomean= . ;

END ;

Run ;

@ question 2):

No, you cannot do that for more than one variable (again, I was a bit disappointed). If you state more than one variable, SAS only calculates the mean for the first. So I repeated the same procedure for each variable - but maybe there's a more sophisticated solution!?!

Cheers,

Tucker

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

05-26-2008 12:14 PM

For #2 you goofed a little.

To get a default list of statistics you need to NOT specify the output name for the discriptive stats.

[pre]

var field1 field2 field3 field4;

output out=outdata min= max= mean= ;

[/pre]

to have mean values for only fields 2 and 4

[pre]

output out=outdata min= max= mean(field2 field4)= ;

[/pre]

There is also a "TIP:" in the documentation about the AUTONAME option.

Since you used "mean=mean" it only provided the "Mean" for the first field.

If you had used "mean=mean1 mean2 mean3 mean4" you would have gotten the mean values for field1 field2 field3.

This is also clearly stated in the SAS documentation.

To get a default list of statistics you need to NOT specify the output name for the discriptive stats.

[pre]

var field1 field2 field3 field4;

output out=outdata min= max= mean= ;

[/pre]

to have mean values for only fields 2 and 4

[pre]

output out=outdata min= max= mean(field2 field4)= ;

[/pre]

There is also a "TIP:" in the documentation about the AUTONAME option.

Since you used "mean=mean" it only provided the "Mean" for the first field.

If you had used "mean=mean1 mean2 mean3 mean4" you would have gotten the mean values for field1 field2 field3.

This is also clearly stated in the SAS documentation.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

05-27-2008 03:23 AM

for a general build of PROC MEANS output, you could do no better than have a look at the paper presented by Myra Olstik and someone-else at "the last SUGI" entitled "A Better Means — The ODS Data Trap. Myra A. Oltsik" www2.sas.com/proceedings/sugi31/059-31.pdf .

of course as it is designed to deliver any/all statistics implemented by PROC MEANS, it doesn't cope with GEOMEAN.... yet...

PeterC

of course as it is designed to deliver any/all statistics implemented by PROC MEANS, it doesn't cope with GEOMEAN.... yet...

PeterC