BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi,

I am pretty new in using SAS and could use some help with the following problem.

I have data similar to this:

Code | value
1111 | 1
1111 | 2
1111 | 3
1111 | 4
1111 | 5
1112 | 1
1113 | 1
1113 | 2
1113 | 3
1113 | 4
1121 | 1
1121 | 2
1121 | 3
1121 | 4
1121 | 5

What do I want to do?

I want to calculate the mean and median of "value", iff "code" occurs 5 or more times in the table.
So, for example, I want SAS to check if the code "1111" exists 5 or more times. If yes, SAS should create new variables "median" and "mean" giving me the median and mean of the values of "1111" in a new file. Accordingly, "mean" and "median" in this case should be equal to 3:

Code | Median | Mean
1111 | 3 | 3

Now, SAS should do the same for "1112" and realize that there is only 1 observation. In this case, SAS should return "." and instead use all observations with the first three digits to calculate mean and median, so for all observations of code starting with "111" (1111, 1112, ..., 1118, 1119).

Code | Median | Mean
1111 | 3 | 3
1112 | . | .
111 | 2.5 | 2.6

And so forth...

Can anybody help me? I am pretty clueless at the moment...

Thanks a lot!
Tucker
5 REPLIES 5
deleted_user
Not applicable
have a look at PROC MEANS

Let it create an output dataset

If the _FREQ_ of the CLASS code is less than 5 set the results to missing.
Something like[pre]PROC MEANS DATA= your.data1 NOPRINT ;
CLASS Code ;
VAR value ;
OUTPUT OUT= means_out MEAN= mean MEDIAN= median ;
RUN ;
* now apply your rule for low-frequency cases ;
DATA final_results ;
SET means_out ;
IF _FREQ_ < 5 THEN DO ;
mean = . ;
median = . ;
END ;
RUN ;

try that
PeterC
deleted_user
Not applicable
Hi Peter!

That works great, thank you!

May I ask two more questions?

1) How can I also calculate the geometric mean?

I tried:

DATA= your.data1 NOPRINT ;
CLASS Code ;
VAR value ;
OUTPUT OUT= means_out MEAN= mean MEDIAN= median GEOMEAN= geomean;
RUN ;

but SAS doesn't recognize GEOMEAN as an operator/function

2) What if I have Value1, Value2, Value3 and Value4? How can I make SAS to calculate also mean, median and geomean of Value 2 to 4?

I tried:

DATA= your.data1 NOPRINT ;
CLASS Code ;
VAR value1 value2 value3 value4;
OUTPUT OUT= means_out MEAN= mean MEDIAN= median ;
RUN ;

but work.means_out only contains two columns stating mean and median of value1 instead of 8 columns (2*4, which I would have expected).

Does anybody know an answer to these questions?

Thanks a lot!
Tucker Message was edited by: sas_rookie
deleted_user
Not applicable
Hi again!

In order to let ME also contribute something to the discussion I wanna answer some of my own questions:

@ Question 1):

As far as I am concerned SAS supports GEOMEAN as a function (http://support.sas.com/documentation/whatsnew/91x/lrdictwhatsnew900.htm) but apparently not as a part of PROC MEANS (which is disappointing).

As a solution I programmed the following workaround:

PROC MEANS DATA= your.data00 NOPRINT ;
CLASS Code ;
VAR Value;
OUTPUT OUT= means_out_median MEAN= mean MEDIAN= median ;
RUN ;

*now the workaround for the geomean;

PROC MEANS DATA= your.data01 NOPRINT maxdec=2 ;
CLASS Code ;
VAR logvalue;
OUTPUT OUT= means_out_log mean = MeanLog;
RUN ;
data means_out_log ;
set means_out_log ;
geomean=exp(MeanLog) ;
drop Meanlog ;
RUN;

* merge files
proc sort
data = means_out_median;
by Code;
Run;

proc sort
data = means_out_log;
by Code;
Run;

data means_out_all;
merge means_out_log means_out_median;
by Code;
RUN;

* filter the output which is based upon less than 5 observations ;

DATA your.data02 ;
SET means_out_all ;
IF _FREQ_ < 5 THEN DO ;
mean = . ;
median = . ;
geomean= . ;
END ;
Run ;

@ question 2):

No, you cannot do that for more than one variable (again, I was a bit disappointed). If you state more than one variable, SAS only calculates the mean for the first. So I repeated the same procedure for each variable - but maybe there's a more sophisticated solution!?!

Cheers,
Tucker
deleted_user
Not applicable
For #2 you goofed a little.

To get a default list of statistics you need to NOT specify the output name for the discriptive stats.
[pre]
var field1 field2 field3 field4;
output out=outdata min= max= mean= ;
[/pre]
to have mean values for only fields 2 and 4
[pre]
output out=outdata min= max= mean(field2 field4)= ;
[/pre]
There is also a "TIP:" in the documentation about the AUTONAME option.

Since you used "mean=mean" it only provided the "Mean" for the first field.
If you had used "mean=mean1 mean2 mean3 mean4" you would have gotten the mean values for field1 field2 field3.
This is also clearly stated in the SAS documentation.
deleted_user
Not applicable
for a general build of PROC MEANS output, you could do no better than have a look at the paper presented by Myra Olstik and someone-else at "the last SUGI" entitled "A Better Means — The ODS Data Trap. Myra A. Oltsik" www2.sas.com/proceedings/sugi31/059-31.pdf .

of course as it is designed to deliver any/all statistics implemented by PROC MEANS, it doesn't cope with GEOMEAN.... yet...

PeterC

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 737 views
  • 0 likes
  • 1 in conversation