BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi all,

 

I'm trying to do an analysis of student results. For this I'm measuring performance of a student in a particular module vs average performance in all other modules. Can anyone advise how I could go about calculating this?

 

My data is very straightforward, 3 columns:

Stud_id

module_id

result

 

Thank you

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

OK, with one observation per student per module, the programming isn't too difficult:

 

proc sort data=have;

   by stud_id;

run;

 

proc summary data=have;

   by stud_id;

   var result;

   output out=summary_stats (keep=n_modules total_result) sum=total_result n=n_modules;

run;

 

data want;

   merge have summary_stats;

   by stud_id;

   other_modules_mean = (total_result - result) / (n_modules - 1);

run;

View solution in original post

6 REPLIES 6
Astounding
PROC Star

Is each student limited to a single observation per module?

 

If not, you might have to provide an example of the results you are looking for (just for one student, but with different numbers of observations per module).

itchyeyeballs
Pyrite | Level 9

Hi,

 

Yes, each student has a single result per module.

 

Thank you

Astounding
PROC Star

OK, with one observation per student per module, the programming isn't too difficult:

 

proc sort data=have;

   by stud_id;

run;

 

proc summary data=have;

   by stud_id;

   var result;

   output out=summary_stats (keep=n_modules total_result) sum=total_result n=n_modules;

run;

 

data want;

   merge have summary_stats;

   by stud_id;

   other_modules_mean = (total_result - result) / (n_modules - 1);

run;

itchyeyeballs
Pyrite | Level 9

Thank you, very much appreciated!

 

Can I push my luck and ask you to annotate what the code is doing? I'm coming from an MS world so this is very different to me

 

Astounding
PROC Star

I can give you the big picture ... but there's no substitute for learning SAS.

 

In SAS programs, each step runs in order, and independently of the other steps.  So PROC SORT completes, then PROC SUMMARY begins.  PROC SUMMARY completes, then the DATA step runs.

 

PROC SORT puts the observations in order.  That is necessary to permit processing BY STUD_ID in later steps.

 

PROC SUMMARY creates a summary data set named SUMMARY_STATS.  It contains one observation per STUD_ID, with two variables.  TOTAL_RESULT is the sum of all the RESULT values for that STUD_ID, and N_MODULES is the number of observations for that STUD_ID.

 

The DATA step adds those two variables to every original observation, matching by STUD_ID.  That match gives you all the information needed for calculations.

itchyeyeballs
Pyrite | Level 9

Thank you so much,

 

embarrassingly it was the basic logic that was eluding me, I was thinking about how to dynamically exclude one row, it didnt ocur to me to just add everything up then subbract, need to remeber to take a step back sometimes!

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1030 views
  • 2 likes
  • 2 in conversation