The following is a simplified example of the structure of my data: STUDENT_ID SCHOOL_ID_2009 SCHOOL_ID_2010 001 002 002 002 002 002 003 002 002 004 002 004 005 003 005 006 003 005 007 003 005 008 003 006 In my real data, I have about 8 million students and about 30 thousand schools. I want to identify students trajectories across time. However, there are some problems with school id codes. Some schools change their ID for administrative reasons (e.g. merges and acquisitions, changes of name, owner, etc.). In the data example I provided above, I can find the school that has the 2009 ID "002" in the 2010 dataset, although one of its students moved to school "004". However, the school with the 2009 ID "003" is not found in the 2010 dataset. In my example, the most likely new ID for this school is "005", since 3/4 of the students of school "003" moved to school "005". Basically, my problem is: how to identify what are the most likely new IDs for schools that have disappeared from one year to another? I thought that I could calculate what is the percentage of students from each one of the 2009's school that appears in each one of 2010's schools. Do anyone have an idea of how to do this? I need to identify which students are really changing schools and which students are kept in the same school, but the school changed its ID. In cases that are likely that the school had changed its ID, I will assign the same school ID in both years. If anyone have a solution for this problem, specially in PROC SQL, I will be really thankful. I am using SAS Enterprise Guide version 4.2 Thank you everyone, Kind regards, Luiz
... View more