BookmarkSubscribeRSS Feed
nash_zheng
Calcite | Level 5

Hi SAS experts,

 

I am trying to know, for a given year&province&school, what’s the most common other school that people enrol, when they enrolled in multiple school in the same time.

 

A quick example:

 

What I have:

id

prov

year

school

1

29

2008

101

1

29

2008

102

2

35

2009

103

2

35

2009

104

2

35

2009

105

 

What I want:

id

prov

year

school

school2

1

29

2008

101

102

1

29

2008

102

101

2

35

2009

103

104

2

35

2009

103

105

2

35

2009

104

103

2

35

2009

104

105

2

35

2009

105

103

2

35

2009

105

104

 

In the table “want”, there is an additional column “school2” to record the other school that this person enrolled in a given year.

 

Greatly appreciate for any hints!

 
3 REPLIES 3
ballardw
Super User

Please define "most common" and provide an example that actually uses a "most common". What your current example shows for output is "any other".

 

Please provide data in the form of a data step if you actually want working code. The "example" data does not lend itself to pasting into an editor and attempting to write code to read it.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

Reeza
Super User
What if there's a tie in the most common, ie I took three different extra courses at three different institutions? It almost looks like you want the cross product instead.

Something like this may work. Do you need the years to match as well? If so, you would add a condition that t1.year=t2.year as well to the query.

proc sql;
create table want as
select t1.id, t1.prov, t1.year, t1.school, t2.school as school2
from t1, t2
where t1.id=t2.id
and t1.school > t2.school
and t1.year=t2.year
order by id, prov, year, school;
quit;
PhilC
Rhodochrosite | Level 12

This computes an adjacency matrix where the vertex on the graph of schools is the count of how many times a student enrolled into school X and school Y in the same year. From this you want to get a table listing vertices that have the largest "popularity-value" from each node/"school" in the graph.  

 

proc sql;
  create table want as
    select Distinct 
           t1.school as school1, 
           t2.school as school2, 
           count(*) as count
      from have t1, have t2
      where t1.id=t2.id
        and t1.year=t2.year
        and t1.school > t2.school
      group by t1.school,t2.school
;
quit;

data want;
  set want want (rename=(school1=school2 school2=school1));
proc  sort;
  by school1 descending count;
proc  sort nodupkey;
  by school1 ;
run;

I once took this free EdX internet course on Excel and data analysis with Felienne Hermans from the Delft University of Technology, Netherlands.  She taught her students, using data yanked from IMDB.com, to calculate an actor's Bacon number using similar math to this.  It was fun.  

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 483 views
  • 0 likes
  • 4 in conversation