BookmarkSubscribeRSS Feed
lnicholl9
Calcite | Level 5

Hi,

 

I have some data in which students are tracked between schools and years--observations with the same student ID and same school ID are assumed to be the same person (i.e. person A in school 65 in year 2007 = person A in school 65 in year 2008). But person A in school 65 is not the same as person A in school 64, regardless of year. I need to calculate the average score over years. Some students appear in multiple years, and some only appear in one (in which case their average score would just be their score from last year). Additionally, students can appear 

 

Here is what my data currently looks like:

data test;

   infile datalines;

   input studentID $ schoolID year score;

datalines;

a 123 2000 78

a 123 2000 78

a 123 2001 80

a 123 2001 80

a 123 2001 80

b 123 2002 95

b 456 2003 91

c 456 2003 90

c 456 2004 98

c 456 2006 92

;

run;

 

Basically i want another column with average score, that should be 79, 79, 79, 79, 79, 95, 91, 93.3, 93.3, 93.3

 

I feel like I might need a combination of sql and by group processing but I'm getting stuck on calculating the average score vertically.

 

Thanks!

8 REPLIES 8
Rick_SAS
SAS Super FREQ

Sort by studentID and schoolID, then use BY-group processing to compute the means for each student-school pair. Lastly, merge the results:

 


proc sort data=test;
by studentID schoolID;;
run;

proc means data=test noprint;
by studentID schoolID;
var score;
output out=means(drop=_type_ _freq_) mean=mean;
run;

data All;
merge test means;
by studentID schoolID;
run;
Reeza
Super User

It should be 79.2?

 


proc sql;
create table want as
select *, mean(score) as avg_score
from test
group by schoolid, studentid;
quit;
lnicholl9
Calcite | Level 5

Thanks for the quick responses! I should have been more clear about how volume is calculated

Take this example:

studentID schoolID year score

a                123        2000   78

a                 123       2001   80

a                 123       2001    80

 

The average score should be 79.0, not 79.3. This is because the second two observations are essentially duplicates. There are other covariates which are different but these 4 columns are the same. So the mean should not be calculated like these 3 are independent. It should be calculated as sum(of unique scores)/number of unique years.

 

Thanks!

Reeza
Super User

And you want to keep the duplicates in your output for some reason?

Reeza
Super User

Here's one way - first identify the duplicates and then average. If all you're doing is an average, it may be worth just adding that to the data step as well. 

 

data test1;
set test;
by studentID schoolID year;
if first.year then duplicate=0;
else duplicate=1;
run;

proc sql;
create table want as
select *, mean(case when duplicate=0 then score else . end) as avg_score
from test1
group by schoolid, studentid;
quit;
lnicholl9
Calcite | Level 5

This worked perfectly, thanks!!

PGStats
Opal | Level 21

Try this:

 

proc sql;
create table test_avg as
select 
    studentID, 
    schoolID, 
    year,
    sum(score/n)/sum(1/n) as avg_score
from (
    select 
        *, 
        count(score) as n
    from test
    group by studentID, schoolID, year )
group by studentID, schoolID;
select * from test_avg;
quit;
PG
Ksharp
Super User

data test;
   infile datalines;
   input studentID $ schoolID year score;
datalines;
a 123 2000 78
a 123 2000 78
a 123 2001 80
a 123 2001 80
a 123 2001 80
b 123 2002 95
b 456 2003 91
c 456 2003 90
c 456 2004 98
c 456 2006 92
;
run;

proc sql;
select *,(select mean(distinct score) from test 
 where studentID=a.studentID and schoolID=a.schoolID) as mean
 from test as a;
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 4577 views
  • 5 likes
  • 5 in conversation