Solved: Re: How do I figure out the best way to pair up between students?

XabiAlonso · Posted 06-30-2018 03:25 PM

Hi SAS Experts,

Need some help! I'm trying to start with a given "student list", to create a "synergy list" that looks like below. I've tried many different ways, still no success yet.

/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;

/* Synergy List */
data synergy;
infile datalines dlm=',';
input STUDENT_ID $3 STUDENT1-STUDENT4;
datalines;
001,3,2,1,2
002,2,3,2,1
003,1,2,3,1
004,2,1,1,3
;

The idea is this. With a list of students and the subjects they are interested in (value "1" equals interested), I would like to find out for each student which of their friends have the best synergy with them, ie. someone who share the most subjects they both like.

The plan is for each common interested subject between two students, I will assign a "synergy score" of 1. Eg. if student 001 and 002 both like Biology and English, they would be assigned a synergy score of "2" between them.

In another words, I would like to figure a way to get a "synergy list" from a "student list". The synergy list is a N*N table with the synergy score between each student pair combination. N is total number of students.

For example, from the Synergy list above, I can tell Student 001 has the highest synergy score with 002 and 003 (both has synergy score of "2" with 001), which is potentially good for pair up.

Any tips or help is appreciated!!

PGStats · Posted 06-30-2018 04:38 PM

The "natural" data format in SAS is the long form. Move your data from wide to long, do the tally, then move back to wide form:

/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
array s subject: ;
do subj = 1 to dim(s);
    interest = s{subj};
    output;
    end;
drop subject: ;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;

proc sql;
create table syn as
select
    a.student_id as student_id,    
    b.student_id as stud2,
    sum(a.interest and b.interest) as common_interest
from
    student as a inner join
    student as b on a.subj=b.subj
group by a.student_id, b.student_Id;
quit;

proc transpose data=syn out=synergy(drop=_name_) prefix=student;
by student_id;
id stud2;
var common_interest;
run;

proc print data=synergy; run;

PG

View solution in original post

PGStats · Posted 06-30-2018 04:38 PM

The "natural" data format in SAS is the long form. Move your data from wide to long, do the tally, then move back to wide form:

/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
array s subject: ;
do subj = 1 to dim(s);
    interest = s{subj};
    output;
    end;
drop subject: ;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;

proc sql;
create table syn as
select
    a.student_id as student_id,    
    b.student_id as stud2,
    sum(a.interest and b.interest) as common_interest
from
    student as a inner join
    student as b on a.subj=b.subj
group by a.student_id, b.student_Id;
quit;

proc transpose data=syn out=synergy(drop=_name_) prefix=student;
by student_id;
id stud2;
var common_interest;
run;

proc print data=synergy; run;

PG

pau13rown · Posted 06-30-2018 04:42 PM

sounds like matching used in observational studies may be analogous: http://www2.sas.com/proceedings/sugi29/173-29.pdf

Reeza · Posted 06-30-2018 07:37 PM

@pau13rown wrote:

sounds like matching used in observational studies may be analogous: http://www2.sas.com/proceedings/sugi29/173-29.pdf

Echoing @pau13rown it sounds like case control matching and if you’re using the latest version of SAS STAT there’s a PROC PSMATCH that can help with that.

novinosrin · Posted 06-30-2018 06:24 PM

data student;
infile datalines dlm=',';
input STUDENT_ID $ SUBJECT1-SUBJECT5;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;

data w;
set student;
_Student_id=student_id;
do n=1 to nobs;
grp+1;
set student point=_n_;
output;
set student nobs=nobs point=n;
output;
end;
run;

data w1;
set w;
by grp;
array t(*) subject:;
do i=1 to dim(t);
if sum(t(i), lag(t(i)))=2 then count+1;
end;
if not last.grp then count=0;
if count ne 0;
keep _student_id grp count;
run;

proc transpose data=w1 out=want(drop=_name_)  prefix=Student_;
by _Student_id;
var count;
run;

novinosrin · Posted 06-30-2018 10:08 PM

@XabiAlonso Your first post(Welcome bienvenido) to the SAS forum and that too with an interesting question. I'd go with PG's answer for real and have fun with mine. If you are experienced user, I'm sure you'd comprehend both the solutions.

I can of course eliminate data w1 step and merge that into one with the previous and make in work in 2 steps. Have a great weekend

PGStats · Posted 06-30-2018 10:41 PM

Here is another version of my code which is more appropriate if your data is already in a wide format dataset.

/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;

/* Convert wide to long format */
proc transpose data=student out=studLong prefix=interest name=subj;
by student_id;
var subject: ;
run;

/* Count common interests */
proc sql;
create table syn as
select
    a.student_id as student_id,    
    b.student_id as stud2,
    sum(a.interest1 and b.interest1) as common_interest
from
    studlong as a inner join
    studlong as b on a.subj=b.subj
group by a.student_id, b.student_Id;
quit;

/* Convert long to wide format */
proc transpose data=syn out=synergy(drop=_name_) prefix=student;
by student_id;
id stud2;
var common_interest;
run;

proc print data=synergy; run;

PG

XabiAlonso · Posted 07-17-2018 09:48 PM

Thanks so much everyone on the input. Especially to PG and Novinosrin, you took the time to write the code, in SQL and PROC. Now I have some solid examples to follow to solve similar problems in the future. So appreciative on this!!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away