BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michellel
Calcite | Level 5

Hi,

 

I have a question about calculating mean of score within each school with additional condition. The variable for mean of score within each school [mean_score] has been created. I need create two more variables now and am not sure how to program. Below is a fake small dataset for program develop. Real data contains about 100,000 observations (students) belonging to about 100 schools.

 

First, I need create a variable for mean of score within each school but without the score of the same row. For example, for student_ID=123, it should be (95+89+95)/(4-1), which is without 98 (his own’s score) in numerator and minus 1 (observation) in denominator. For student_ID=124, it should be (98+89+95)/3, and so on.

 

Second, I need create a variable for mean of score within each school including additional number, which is 25th percentile of score for ALL student (in the entire dataset, not within each school). For example, for student_ID=123, it should be (98+95+89+95+25th percentile of score in the entire dataset)/(4+1). So this variable will be the same value for students within each school.

 

Many thanks in advance for your answer!!!

 

student_ID         school_ID            score     mean_score

123                         1                              98           94.25

124                         1                              95           94.25

125                         1                              89           94.25

126                         1                              95           94.25

127                         2                              96           80.2

128                         2                              78           80.2

129                         2                              85           80.2

130                         2                              74           80.2

131                         2                              68           80.2

132                         3                              98           87

133                         3                              84           87

134                         3                              85           87

135                         3                              75           87

136                         3                              99           87

137                         3                              81           87

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

How do you like this?

/* Compute overall 25th percentile of score */

proc summary data=have;
var score;
output out=perc25 q1=q;
quit;


/* Write it into macro variable Q */

proc sql noprint;
select q
into :q
from perc25;
quit;

%put Overall 25th percentile of score: &q;

/* Create dataset with number of students per school */

data numstud;
do nst=1 by 1 until(last.school_ID);
  set have(keep=school_ID);
  by school_ID;
end;
run;


/* Add the desired new variables */

proc sql;
create table want(drop=nst) as
select *, (mean_score*nst-score)/(nst-1) as mrest, (mean_score*nst+&q)/(nst+1) as mwithq
from have natural join numstud
order by school_ID, student_ID;
quit;

Edit: Added optional ORDER BY clause.

View solution in original post

9 REPLIES 9
Astounding
PROC Star

In retrospect, your first question would have been easy to answer if you had prepared the data slightly differently.  Instead of adding MEAN_SCORE to each observation, it likely would have been possible to add SUM_SCORE and N_SCORE.  That would have made the calculations easy.  If that is still a possibility, I would suggest it.  If not, we can always use the existing data to generate those numbers.  The program wouldn't be simple, but it would be relatively short.

 

Regarding your second question, it looks like two preparatory steps are needed.  First, calculate the 25th percentile and store it in a SAS data set.  Then append it to each observation.  Are you comfortable with the first part of that, or do you need help with both parts?

michellel
Calcite | Level 5

Thanks Astounding so much for your answer! I think I can get the first part done now. I need help for the second part. Thanks!

Astounding
PROC Star

OK, assuming you can create a SAS data set (I'll call it CUTOFF) with a single variable representing the 25th percentile (I'll call it percentile_25), here's how to add one observation to every observation in your existing data set:

 

data want;

if _n_=1 then set cutoff;

set have;

*** add calculations here as needed;

run;

 

If you actually have SUM_SCORE and N_SCORE available, the calculations should be relatively easy.  For some of the calculations, you will be dividing by (N_SCORE - 1).  In that case, be sure to check:

 

if N_SCORE > 1 then ...

 

Good luck.

michellel
Calcite | Level 5
Thanks so much, Astounding! So what you did is append the dataset cutoff to each first observation of each school? I have one more question. I need create 3 more variables, similar to the second variable [mwithq], using 1st percentile, 75th percentile, 99th percentile of score, instead of 25th percentile of score. How can I create these variable at the same time?

Thanks much for the check reminder. It is very useful.

Thanks again for your answer!
FreelanceReinh
Jade | Level 19

How do you like this?

/* Compute overall 25th percentile of score */

proc summary data=have;
var score;
output out=perc25 q1=q;
quit;


/* Write it into macro variable Q */

proc sql noprint;
select q
into :q
from perc25;
quit;

%put Overall 25th percentile of score: &q;

/* Create dataset with number of students per school */

data numstud;
do nst=1 by 1 until(last.school_ID);
  set have(keep=school_ID);
  by school_ID;
end;
run;


/* Add the desired new variables */

proc sql;
create table want(drop=nst) as
select *, (mean_score*nst-score)/(nst-1) as mrest, (mean_score*nst+&q)/(nst+1) as mwithq
from have natural join numstud
order by school_ID, student_ID;
quit;

Edit: Added optional ORDER BY clause.

michellel
Calcite | Level 5
Thanks so much, FreelanceReinhard! Your code works! I have one more question. I need create 3 more variables, similar to the second variable [mwithq], using 1st percentile, 75th percentile, 99th percentile of score, instead of 25th percentile of score. How can I create these variable at the same time? In addition, I do not know how to refer to 1% percentile like the q1 for 25th percentile.

Thanks so much for your answer!
FreelanceReinh
Jade | Level 19
/* Compute overall 1st, 25th, 75th and 99th percentile of score */

proc summary data=have;
var score;
output out=perc p1=p1 p25=p25 p75=p75 p99=p99;
quit;


/* Write them into macro variables */

proc sql noprint;
select p1, p25, p75, p99
into :p1, :p25, :p75, :p99
from perc;
quit;

%put Overall percentiles of score:;
%put P1=&p1;
%put P25=&p25;
%put P75=&p75;
%put P99=&p99;


/* Create dataset with number of students per school */

data numstud;
do nst=1 by 1 until(last.school_ID);
  set have(keep=school_ID);
  by school_ID;
end;
run;


/* Add the desired new variables */

proc sql;
create table want(drop=nst) as
select *, (mean_score*nst-score)/(nst-1) as mrest,
          (mean_score*nst+&p1) /(nst+1) as mwithp1,
          (mean_score*nst+&p25)/(nst+1) as mwithp25,
          (mean_score*nst+&p75)/(nst+1) as mwithp75,
          (mean_score*nst+&p99)/(nst+1) as mwithp99
from have natural join numstud
order by school_ID, student_ID;
quit;
FreelanceReinh
Jade | Level 19
Btw, PROC SQL is quite tolerant with regard to divisions by zero. Of course, you should be aware of the (remote?) possibility that there is a school with only one student, for which then MREST is literally not defined. But unlike the DATA step, PROC SQL does not complain when it divides by NST-1=0 in this situation. It simply assigns a missing value to MREST, which I think is fair.
michellel
Calcite | Level 5

Hi FreelanceReinhard, Thanks so much for your updated programming and explanation! It is really helpful!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2699 views
  • 1 like
  • 3 in conversation