DATA Step, Macro, Functions and more

calculate mean according to condition

Reply
Contributor
Posts: 35

calculate mean according to condition

Dear all,

 

I have one dataset as below.

 

Subj score

a      1

a       

a      3

a      4

b      

b      4

b      

b      5

 

my requirement is that i want to calculate average of score only for subject where number of non missing values equal to 3 or greatee than .If number of non missing value is less  than 3 then   will set the mean as . and add the same as observation after last record for particular subj.

 

Below is output

a      1

a       

a      3

a      4

average of a  "value"

b      

b      4

b      

b      5

average of b .

Super User
Posts: 5,438

Re: calculate mean according to condition

The calculation can be done in SQL. But you also show an example of a report layout, which is probably better to do separately, like in PROC REPORT (and there are others that can you guide better than me).

 

For SQL, use agregate filtering with HAVING

 

i.e.

group by id

having sum(case when missing(score) then 0 else 1 end) ge 3

Data never sleeps
SAS Employee
Posts: 51

Re: calculate mean according to condition

Hello,

 

This is producing what you want. But I completely agree with LinusH, ... you should create a report instead of 'enriching' your dataset with an extra line below each subject containing his/her/its average.

 

So I don't recommend using the last data step. You don't want to spoil a dataset by adding extra rows containing an average while the other rows contain raw scores.

 

data work.ab;
infile datalines truncover;
input Subj $ score;
datalines;
a      1
a       
a      3
a      4
b      
b      4
b      
b      5
;
run;

proc means data=work.ab nway noprint;
 class Subj;
 var score;
 output out=work.absummary mean= n= nmiss= / autoname;
run;

PROC SQL noprint;
 create table work.abmerge as
 select a.* , b.*
 from   work.ab         a
      , work.absummary  b
 where a.Subj = b.Subj
 order by a.Subj , a.Score;
quit;

data work.abmerge;
 length Subj $ 25;
 set work.abmerge;
 by Subj;
 if last.Subj then do; 
  output; 
  if score_N >=3 then score=Score_Mean; 
  else score=.; 
  Subj="Avg. of above Subject"; 
  output;
 end;
if index(Subj,'Avg')=0 then output; 
run;
/* end of program */

 

 

Cheers,

Koen

Respected Advisor
Posts: 3,156

Re: calculate mean according to condition

Echo with @LinusH and @sbxkoenk, this should be a report request. Here is a one-pass data step alternative:

data work.ab;
	infile datalines truncover;
	length Subj $ 50;
	input Subj score;
	datalines;
a      1
a       
a      3
a      4
b      
b      4
b      
b      5
;
run;

data want;
	do until (last.subj);
		set ab;
		by subj notsorted;

		if score>=3 then
			do;
				_ct+1;
				_tot+score;
			end;

		output;
	end;

	subj='Avg. of above Subject';
	score=_tot/_ct;
	output;
	call missing(of _:);
	drop _:;
run;
Super User
Posts: 10,046

Re: calculate mean according to condition

Want make a REPORT ?

 

 

data have;
input Subj $ score;
cards;
a      1
a       .
a      3
a      4
b      .
b      4
b      .
b      5
;
run;
proc report data=have nowd ;
column Subj score ;
define Subj/order;
define score/analysis mean;
compute score;
 if not missing(score.mean) then n+1; 
 if not missing(_BREAK_) then do;
  if n lt 4 then call missing(score.mean);
  n=0;
 end;
endcomp;
break after Subj/summarize;
run;
Ask a Question
Discussion stats
  • 4 replies
  • 277 views
  • 1 like
  • 5 in conversation