Hi SAS Experts,
Need some help! I'm trying to start with a given "student list", to create a "synergy list" that looks like below. I've tried many different ways, still no success yet.
/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;
/* Synergy List */
data synergy;
infile datalines dlm=',';
input STUDENT_ID $3 STUDENT1-STUDENT4;
datalines;
001,3,2,1,2
002,2,3,2,1
003,1,2,3,1
004,2,1,1,3
;
The idea is this. With a list of students and the subjects they are interested in (value "1" equals interested), I would like to find out for each student which of their friends have the best synergy with them, ie. someone who share the most subjects they both like.
The plan is for each common interested subject between two students, I will assign a "synergy score" of 1. Eg. if student 001 and 002 both like Biology and English, they would be assigned a synergy score of "2" between them.
In another words, I would like to figure a way to get a "synergy list" from a "student list". The synergy list is a N*N table with the synergy score between each student pair combination. N is total number of students.
For example, from the Synergy list above, I can tell Student 001 has the highest synergy score with 002 and 003 (both has synergy score of "2" with 001), which is potentially good for pair up.
Any tips or help is appreciated!!
The "natural" data format in SAS is the long form. Move your data from wide to long, do the tally, then move back to wide form:
/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
array s subject: ;
do subj = 1 to dim(s);
interest = s{subj};
output;
end;
drop subject: ;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;
proc sql;
create table syn as
select
a.student_id as student_id,
b.student_id as stud2,
sum(a.interest and b.interest) as common_interest
from
student as a inner join
student as b on a.subj=b.subj
group by a.student_id, b.student_Id;
quit;
proc transpose data=syn out=synergy(drop=_name_) prefix=student;
by student_id;
id stud2;
var common_interest;
run;
proc print data=synergy; run;
The "natural" data format in SAS is the long form. Move your data from wide to long, do the tally, then move back to wide form:
/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
array s subject: ;
do subj = 1 to dim(s);
interest = s{subj};
output;
end;
drop subject: ;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;
proc sql;
create table syn as
select
a.student_id as student_id,
b.student_id as stud2,
sum(a.interest and b.interest) as common_interest
from
student as a inner join
student as b on a.subj=b.subj
group by a.student_id, b.student_Id;
quit;
proc transpose data=syn out=synergy(drop=_name_) prefix=student;
by student_id;
id stud2;
var common_interest;
run;
proc print data=synergy; run;
sounds like matching used in observational studies may be analogous: http://www2.sas.com/proceedings/sugi29/173-29.pdf
@pau13rown wrote:
sounds like matching used in observational studies may be analogous: http://www2.sas.com/proceedings/sugi29/173-29.pdf
Echoing @pau13rown it sounds like case control matching and if you’re using the latest version of SAS STAT there’s a PROC PSMATCH that can help with that.
data student;
infile datalines dlm=',';
input STUDENT_ID $ SUBJECT1-SUBJECT5;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;
data w;
set student;
_Student_id=student_id;
do n=1 to nobs;
grp+1;
set student point=_n_;
output;
set student nobs=nobs point=n;
output;
end;
run;
data w1;
set w;
by grp;
array t(*) subject:;
do i=1 to dim(t);
if sum(t(i), lag(t(i)))=2 then count+1;
end;
if not last.grp then count=0;
if count ne 0;
keep _student_id grp count;
run;
proc transpose data=w1 out=want(drop=_name_) prefix=Student_;
by _Student_id;
var count;
run;
@XabiAlonso Your first post(Welcome bienvenido) to the SAS forum and that too with an interesting question. I'd go with PG's answer for real and have fun with mine. If you are experienced user, I'm sure you'd comprehend both the solutions.
I can of course eliminate data w1 step and merge that into one with the previous and make in work in 2 steps. Have a great weekend
Here is another version of my code which is more appropriate if your data is already in a wide format dataset.
/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;
/* Convert wide to long format */
proc transpose data=student out=studLong prefix=interest name=subj;
by student_id;
var subject: ;
run;
/* Count common interests */
proc sql;
create table syn as
select
a.student_id as student_id,
b.student_id as stud2,
sum(a.interest1 and b.interest1) as common_interest
from
studlong as a inner join
studlong as b on a.subj=b.subj
group by a.student_id, b.student_Id;
quit;
/* Convert long to wide format */
proc transpose data=syn out=synergy(drop=_name_) prefix=student;
by student_id;
id stud2;
var common_interest;
run;
proc print data=synergy; run;
Thanks so much everyone on the input. Especially to PG and Novinosrin, you took the time to write the code, in SQL and PROC. Now I have some solid examples to follow to solve similar problems in the future. So appreciative on this!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.