Hi,
I have a question about calculating mean of score within each school with additional condition. The variable for mean of score within each school [mean_score] has been created. I need create two more variables now and am not sure how to program. Below is a fake small dataset for program develop. Real data contains about 100,000 observations (students) belonging to about 100 schools.
First, I need create a variable for mean of score within each school but without the score of the same row. For example, for student_ID=123, it should be (95+89+95)/(4-1), which is without 98 (his own’s score) in numerator and minus 1 (observation) in denominator. For student_ID=124, it should be (98+89+95)/3, and so on.
Second, I need create a variable for mean of score within each school including additional number, which is 25th percentile of score for ALL student (in the entire dataset, not within each school). For example, for student_ID=123, it should be (98+95+89+95+25th percentile of score in the entire dataset)/(4+1). So this variable will be the same value for students within each school.
Many thanks in advance for your answer!!!
student_ID school_ID score mean_score
123 1 98 94.25
124 1 95 94.25
125 1 89 94.25
126 1 95 94.25
127 2 96 80.2
128 2 78 80.2
129 2 85 80.2
130 2 74 80.2
131 2 68 80.2
132 3 98 87
133 3 84 87
134 3 85 87
135 3 75 87
136 3 99 87
137 3 81 87
How do you like this?
/* Compute overall 25th percentile of score */
proc summary data=have;
var score;
output out=perc25 q1=q;
quit;
/* Write it into macro variable Q */
proc sql noprint;
select q
into :q
from perc25;
quit;
%put Overall 25th percentile of score: &q;
/* Create dataset with number of students per school */
data numstud;
do nst=1 by 1 until(last.school_ID);
set have(keep=school_ID);
by school_ID;
end;
run;
/* Add the desired new variables */
proc sql;
create table want(drop=nst) as
select *, (mean_score*nst-score)/(nst-1) as mrest, (mean_score*nst+&q)/(nst+1) as mwithq
from have natural join numstud
order by school_ID, student_ID;
quit;
Edit: Added optional ORDER BY clause.
In retrospect, your first question would have been easy to answer if you had prepared the data slightly differently. Instead of adding MEAN_SCORE to each observation, it likely would have been possible to add SUM_SCORE and N_SCORE. That would have made the calculations easy. If that is still a possibility, I would suggest it. If not, we can always use the existing data to generate those numbers. The program wouldn't be simple, but it would be relatively short.
Regarding your second question, it looks like two preparatory steps are needed. First, calculate the 25th percentile and store it in a SAS data set. Then append it to each observation. Are you comfortable with the first part of that, or do you need help with both parts?
Thanks Astounding so much for your answer! I think I can get the first part done now. I need help for the second part. Thanks!
OK, assuming you can create a SAS data set (I'll call it CUTOFF) with a single variable representing the 25th percentile (I'll call it percentile_25), here's how to add one observation to every observation in your existing data set:
data want;
if _n_=1 then set cutoff;
set have;
*** add calculations here as needed;
run;
If you actually have SUM_SCORE and N_SCORE available, the calculations should be relatively easy. For some of the calculations, you will be dividing by (N_SCORE - 1). In that case, be sure to check:
if N_SCORE > 1 then ...
Good luck.
How do you like this?
/* Compute overall 25th percentile of score */
proc summary data=have;
var score;
output out=perc25 q1=q;
quit;
/* Write it into macro variable Q */
proc sql noprint;
select q
into :q
from perc25;
quit;
%put Overall 25th percentile of score: &q;
/* Create dataset with number of students per school */
data numstud;
do nst=1 by 1 until(last.school_ID);
set have(keep=school_ID);
by school_ID;
end;
run;
/* Add the desired new variables */
proc sql;
create table want(drop=nst) as
select *, (mean_score*nst-score)/(nst-1) as mrest, (mean_score*nst+&q)/(nst+1) as mwithq
from have natural join numstud
order by school_ID, student_ID;
quit;
Edit: Added optional ORDER BY clause.
/* Compute overall 1st, 25th, 75th and 99th percentile of score */
proc summary data=have;
var score;
output out=perc p1=p1 p25=p25 p75=p75 p99=p99;
quit;
/* Write them into macro variables */
proc sql noprint;
select p1, p25, p75, p99
into :p1, :p25, :p75, :p99
from perc;
quit;
%put Overall percentiles of score:;
%put P1=&p1;
%put P25=&p25;
%put P75=&p75;
%put P99=&p99;
/* Create dataset with number of students per school */
data numstud;
do nst=1 by 1 until(last.school_ID);
set have(keep=school_ID);
by school_ID;
end;
run;
/* Add the desired new variables */
proc sql;
create table want(drop=nst) as
select *, (mean_score*nst-score)/(nst-1) as mrest,
(mean_score*nst+&p1) /(nst+1) as mwithp1,
(mean_score*nst+&p25)/(nst+1) as mwithp25,
(mean_score*nst+&p75)/(nst+1) as mwithp75,
(mean_score*nst+&p99)/(nst+1) as mwithp99
from have natural join numstud
order by school_ID, student_ID;
quit;
Hi FreelanceReinhard, Thanks so much for your updated programming and explanation! It is really helpful!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.