Desktop productivity for business analysts and programmers

Using multiple tables to match and group data

Reply
Occasional Contributor
Posts: 9

Using multiple tables to match and group data

[ Edited ]

I have 3 tables, all with bits of into that I need to come together into one dataset. The variable "employeeID" links to a variable called "calcID", the calcID variable then links to a variable called "student_ID". Basically what this does reveals which students were included in a teachers evaluation measures (so teachers can have multiple calc id's, based on the grade_subject_id variable, but only one employee id).

 

How do I get SAS to look at each calcID from multiple tables to find a match (not necessarily on the same row) and then if there is a match, the coordinating employeeID and grade_subject_ID should be input for each student_ID?

 

If this isn't possible in SAS, is there a way to make this happen using SQL (either PROC SQL or SQL Server)?

 

Thank you in advance for your help!

 

PROC IMPORT DATAFILE= "C:\Users\egagne\Desktop\Spillover\Y16_GROW_Teacher BN, Employeeid, GRADE_SUBJECT_ID, linkB, subgroup_id, MGP, student_n, calcID, MGP, avg_prescore, target_population, HEDI, HEDI_points.csv"
OUT = Y16_GROW_Teacher replace; GETNAMES=Yes;
run;

PROC IMPORT DATAFILE="C:\Users\egagne\Desktop\Spillover\Y16_CALC_Student student id sgp calcid.csv"
out=Y16_CALC_Stud dbms=csv replace; GETNAMES=Yes;
run;

DATA Y16_CALC_Stud (RENAME=(StudCalcID = calcID));
SET Y16_CALC_Stud;
run;

PROC IMPORT DATAFILE="C:\Users\egagne\Desktop\Spillover\Y16_LINK_GM school_DBN empID grade_subject_id STUDENT_ID score.csv"
DBMS=csv OUT= Y16_LINK_GM replace; GETNAMES=NO;
run;

OPTIONS VALIDVARNAME=ANY OPTIONS SPOOL;
DATA Y16_LINK_GM (RENAME=(VAR1=school_DBN VAR2=employeeID VAR3=grade_subject_ID VAR4=student_ID VAR5=score));
SET Y16_LINK_GM;
run;

PROC SORT data=Y16_LINK_GM OUT=Y16_LINK_GM_SORT; BY DESCENDING employeeID student_ID;
run;

PROC SORT DATA=Y16_GROW_Teacher OUT=Y16_GROW_Teacher_SORT;
BY DESCENDING employeeID calcID;
run;

PROC SORT DATA=Y16_CALC_Stud OUT=Y16_CALC_Student_SORT;
BY DESCENDING calcID;
run;

DATA Y16_GROW_LINK_MERGE;
MERGE Y16_LINK_GM (in=x) Y16_GROW_TEACHER_SORT(in=y);
BY DESCENDING employeeID calcID;
run;

PROC SORT DATA=Y16_GROW_LINK_MERGE OUT=Y16_GROW_LINK_MERGE_SORT;
BY DESCENDING calcID;
run;

DATA Y16_CALC_GROW_LINK_MERGE;
MERGE Y16_CALC_Student_SORT (in=x) Y16_GROW_LINK_MERGE_SORT(in=y);
by descending calcID;
if x and y;
run;

PROC SORT DATA= Y16_CALC_GROW_LINK_MERGE OUT=SpilloverFinal;
BY DESCENDING student_ID;
run;

 

This is what the tables look like:

 

Y16_LINK_GM

DBN     employee_ID gradeu_subject_ID  student_ID   Score
00
M123 987654 ES_SCI4 123456789 45 00M123 654987 ES_MAT2 987456789 68 00X123 123456 ES_SCI4 567845895 87

Y16_GROW_Teacher

BN  employeeID  GRADE_SUBJECT   linkB   subgroup_ID MGP          student_n  calcID      avg_prescore    target_population   HEDI    HEDI_points
X123 123456     ES_SCI4          0       0          10.7000456    24        1191298.7   -15.12398714    Individual          H       88
K123 9852178 ES_MAT6 0 0 15.0569811 19 5698985.3 -23.74125892 Individual H 72
M123 9874562 ES_MAT2 0 0 56.0265487 16 8527412.3 -87.65478928 Individual H 94

Y16_CALC_Stud

Student_ID  StudSGP StudCalcID
369123661 15 6659.2 985432125 54 8527412.3 567845895 -2 1191298.7

 What I'd like to end up with (more or less):

student_ID Score StudSGP grade_subject_ID employee_ID HEDI HEDI_points DBN     avg_prescore
567845895   25   -2      ES_SCI4          123456      H    94          00X123  -15.12398714

 

Esteemed Advisor
Posts: 5,198

Re: Using multiple tables to match and group data

A few more lines of example data, and the desired result , would help to better describe your requirement.

SQL is usually the better tool when linking multiple tabels with different keys.

Data never sleeps
Occasional Contributor
Posts: 9

Re: Using multiple tables to match and group data

Thanks for the feedback.  I added a few more lines for each table and also an example of what I would like to end up with.

Grand Advisor
Posts: 10,229

Re: Using multiple tables to match and group data

Example data does not have LINK_GM information for student 567845895 and none of the employee_id values appear in both data sets. So it may be a tad difficult to test code against.

 

 

Occasional Contributor
Posts: 9

Re: Using multiple tables to match and group data

Fixed.

Grand Advisor
Posts: 10,229

Re: Using multiple tables to match and group data

I think the following code does what you are requesting. I had to make my own data sets so if you have any place where a variable that appears numeric but is character in one set and numeric in the other you may have to do your own conversion.

I don't use EG very much so I have no idea if the point and click will match things this way.

 

data work.Y16_LINK_GM;
   informat dbn $6. employee_ID grade_subject_ID student_ID $10.;
   input DBN     employee_ID grade_subject_ID  student_ID   Score;
datalines;
00M123  987654      ES_SCI4            123456789    45
00M123  654987      ES_MAT2            987456789    68 
00X123  123456      ES_SCI4            567845895    87
;
run;

data work.Y16_GROW_Teacher;
   informat  bn $6.   employeeID GRADE_SUBJECT $10.   linkB   subgroup_ID MGP          student_n  calcID      avg_prescore best16.      target_population $10. HEDI $1.  ;
   input BN  employeeID  GRADE_SUBJECT   linkB   subgroup_ID MGP          student_n  calcID      avg_prescore    target_population   HEDI    HEDI_points;
datalines;
X123 123456     ES_SCI4          0       0          10.7000456    24        1191298.7   -15.12398714    Individual          H       88
K123 9852178    ES_MAT6          0       0          15.0569811    19        5698985.3   -23.74125892    Individual          H       72
M123 9874562    ES_MAT2          0       0          56.0265487    16        8527412.3   -87.65478928    Individual          H       94
;
run;

data work.Y16_CALC_Stud;
   informat Student_ID $10.;
   input Student_ID  StudSGP StudCalcID;
datalines;
369123661   15      6659.2
985432125   54      8527412.3
567845895   -2      1191298.7
;
run;


proc sql;
   create table work.temp as 
   select c.Student_ID,c.score,c.StudSGP,c.grade_subject_ID,c.employee_ID,d.HEDI,d.HEDI_points,d.avg_prescore
   from (
         select a.Student_ID,a.StudSGP, StudCalcID, b.DBN, b.Score ,b.employee_ID, b.grade_subject_ID
         from  work.Y16_CALC_Stud as a left join work.Y16_LINK_GM as b
            on a.Student_ID = b.Student_ID
        ) as c   left join  work.Y16_GROW_Teacher as d on
        c.StudCalcID = d.calcID and c.employee_ID=d.employeeID and c.grade_subject_ID=d.GRADE_SUBJECT 
   ;
quit;

Occasional Contributor
Posts: 9

Re: Using multiple tables to match and group data

Thanks so much @ballardw.  I tried your code and am still getting a lot of errors.  Mostly 

ERROR: Unresolved reference to table/correlation name

I tried using the full table names instead of a., b., and so on but that doesn't make a difference.  Any ideas? 

Grand Advisor
Posts: 10,229

Re: Using multiple tables to match and group data

Post the log of the exact code run.

 

Note that with the nested select you really may want to use at least one set of aliases as that nest is not the same as the original data.

Ask a Question
Discussion stats
  • 7 replies
  • 525 views
  • 3 likes
  • 3 in conversation