BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
XabiAlonso
Fluorite | Level 6

Hi SAS Experts,

 

Need some help! I'm trying to start with a given "student list", to create a "synergy list" that looks like below. I've tried many different ways, still no success yet.

 

/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;

/* Synergy List */
data synergy;
infile datalines dlm=',';
input STUDENT_ID $3 STUDENT1-STUDENT4;
datalines;
001,3,2,1,2
002,2,3,2,1
003,1,2,3,1
004,2,1,1,3
;

 

The idea is this. With a list of students and the subjects they are interested in (value "1" equals interested), I would like to find out for each student which of their friends have the best synergy with them, ie. someone who share the most subjects they both like.

 

The plan is for each common interested subject between two students, I will assign a "synergy score" of 1. Eg. if student 001 and 002 both like Biology and English, they would be assigned a synergy score of "2" between them.

 

In another words, I would like to figure a way to get a "synergy list" from a "student list". The synergy list is a N*N table with the synergy score between each student pair combination. N is total number of students.

 

For example, from the Synergy list above, I can tell Student 001 has the highest synergy score with 002 and 003 (both has synergy score of "2" with 001), which is potentially good for pair up.

 

Any tips or help is appreciated!!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

The "natural" data format in SAS is the long form. Move your data from wide to long, do the tally, then move back to wide form:

 

/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
array s subject: ;
do subj = 1 to dim(s);
    interest = s{subj};
    output;
    end;
drop subject: ;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;

proc sql;
create table syn as
select
    a.student_id as student_id,    
    b.student_id as stud2,
    sum(a.interest and b.interest) as common_interest
from
    student as a inner join
    student as b on a.subj=b.subj
group by a.student_id, b.student_Id;
quit;

proc transpose data=syn out=synergy(drop=_name_) prefix=student;
by student_id;
id stud2;
var common_interest;
run;

proc print data=synergy; run;
PG

View solution in original post

7 REPLIES 7
PGStats
Opal | Level 21

The "natural" data format in SAS is the long form. Move your data from wide to long, do the tally, then move back to wide form:

 

/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
array s subject: ;
do subj = 1 to dim(s);
    interest = s{subj};
    output;
    end;
drop subject: ;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;

proc sql;
create table syn as
select
    a.student_id as student_id,    
    b.student_id as stud2,
    sum(a.interest and b.interest) as common_interest
from
    student as a inner join
    student as b on a.subj=b.subj
group by a.student_id, b.student_Id;
quit;

proc transpose data=syn out=synergy(drop=_name_) prefix=student;
by student_id;
id stud2;
var common_interest;
run;

proc print data=synergy; run;
PG
pau13rown
Lapis Lazuli | Level 10

sounds like matching used in observational studies may be analogous: http://www2.sas.com/proceedings/sugi29/173-29.pdf

Reeza
Super User

@pau13rown wrote:

sounds like matching used in observational studies may be analogous: http://www2.sas.com/proceedings/sugi29/173-29.pdf



Echoing @pau13rown it sounds like case control matching and if you’re using the latest version of SAS STAT there’s a PROC PSMATCH that can help with that. 

 

 

novinosrin
Tourmaline | Level 20
data student;
infile datalines dlm=',';
input STUDENT_ID $ SUBJECT1-SUBJECT5;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;

data w;
set student;
_Student_id=student_id;
do n=1 to nobs;
grp+1;
set student point=_n_;
output;
set student nobs=nobs point=n;
output;
end;
run;

data w1;
set w;
by grp;
array t(*) subject:;
do i=1 to dim(t);
if sum(t(i), lag(t(i)))=2 then count+1;
end;
if not last.grp then count=0;
if count ne 0;
keep _student_id grp count;
run;

proc transpose data=w1 out=want(drop=_name_)  prefix=Student_;
by _Student_id;
var count;
run;
novinosrin
Tourmaline | Level 20

@XabiAlonso Your first post(Welcome bienvenido) to the SAS forum and that too with an interesting question. I'd go with PG's answer  for real and have fun with mine. If you are experienced user, I'm sure you'd comprehend both the solutions. 

I can of course eliminate data w1 step and merge that into one with the previous and make in work in 2 steps. Have a great weekend

PGStats
Opal | Level 21

Here is another version of my code which is more appropriate if your data is already in a wide format dataset.

 

/* Student List */
data student;
infile datalines dlm=',';
input STUDENT_ID $3 SUBJECT1-SUBJECT5;
datalines;
001,1,0,0,1,1
002,0,0,1,1,1
003,0,1,1,0,1
004,1,1,0,1,0
;

/* Convert wide to long format */
proc transpose data=student out=studLong prefix=interest name=subj;
by student_id;
var subject: ;
run;

/* Count common interests */
proc sql;
create table syn as
select
    a.student_id as student_id,    
    b.student_id as stud2,
    sum(a.interest1 and b.interest1) as common_interest
from
    studlong as a inner join
    studlong as b on a.subj=b.subj
group by a.student_id, b.student_Id;
quit;

/* Convert long to wide format */
proc transpose data=syn out=synergy(drop=_name_) prefix=student;
by student_id;
id stud2;
var common_interest;
run;

proc print data=synergy; run;
PG
XabiAlonso
Fluorite | Level 6

Thanks so much everyone on the input. Especially to PG and Novinosrin, you took the time to write the code, in SQL and PROC. Now I have some solid examples to follow to solve similar problems in the future. So appreciative on this!!

Smiley Very HappySmiley Very Happy

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 540 views
  • 2 likes
  • 5 in conversation