09-22-2016 12:00 PM - edited 09-23-2016 12:49 PM
I have 3 tables, all with bits of into that I need to come together into one dataset. The variable "employeeID" links to a variable called "calcID", the calcID variable then links to a variable called "student_ID". Basically what this does reveals which students were included in a teachers evaluation measures (so teachers can have multiple calc id's, based on the grade_subject_id variable, but only one employee id).
How do I get SAS to look at each calcID from multiple tables to find a match (not necessarily on the same row) and then if there is a match, the coordinating employeeID and grade_subject_ID should be input for each student_ID?
If this isn't possible in SAS, is there a way to make this happen using SQL (either PROC SQL or SQL Server)?
Thank you in advance for your help!
PROC IMPORT DATAFILE= "C:\Users\egagne\Desktop\Spillover\Y16_GROW_Teacher BN, Employeeid, GRADE_SUBJECT_ID, linkB, subgroup_id, MGP, student_n, calcID, MGP, avg_prescore, target_population, HEDI, HEDI_points.csv"
OUT = Y16_GROW_Teacher replace; GETNAMES=Yes;
PROC IMPORT DATAFILE="C:\Users\egagne\Desktop\Spillover\Y16_CALC_Student student id sgp calcid.csv"
out=Y16_CALC_Stud dbms=csv replace; GETNAMES=Yes;
DATA Y16_CALC_Stud (RENAME=(StudCalcID = calcID));
PROC IMPORT DATAFILE="C:\Users\egagne\Desktop\Spillover\Y16_LINK_GM school_DBN empID grade_subject_id STUDENT_ID score.csv"
DBMS=csv OUT= Y16_LINK_GM replace; GETNAMES=NO;
OPTIONS VALIDVARNAME=ANY OPTIONS SPOOL;
DATA Y16_LINK_GM (RENAME=(VAR1=school_DBN VAR2=employeeID VAR3=grade_subject_ID VAR4=student_ID VAR5=score));
PROC SORT data=Y16_LINK_GM OUT=Y16_LINK_GM_SORT; BY DESCENDING employeeID student_ID;
PROC SORT DATA=Y16_GROW_Teacher OUT=Y16_GROW_Teacher_SORT;
BY DESCENDING employeeID calcID;
PROC SORT DATA=Y16_CALC_Stud OUT=Y16_CALC_Student_SORT;
BY DESCENDING calcID;
MERGE Y16_LINK_GM (in=x) Y16_GROW_TEACHER_SORT(in=y);
BY DESCENDING employeeID calcID;
PROC SORT DATA=Y16_GROW_LINK_MERGE OUT=Y16_GROW_LINK_MERGE_SORT;
BY DESCENDING calcID;
MERGE Y16_CALC_Student_SORT (in=x) Y16_GROW_LINK_MERGE_SORT(in=y);
by descending calcID;
if x and y;
PROC SORT DATA= Y16_CALC_GROW_LINK_MERGE OUT=SpilloverFinal;
BY DESCENDING student_ID;
This is what the tables look like:
DBN employee_ID gradeu_subject_ID student_ID Score
00M123 987654 ES_SCI4 123456789 45 00M123 654987 ES_MAT2 987456789 68 00X123 123456 ES_SCI4 567845895 87
BN employeeID GRADE_SUBJECT linkB subgroup_ID MGP student_n calcID avg_prescore target_population HEDI HEDI_points X123 123456 ES_SCI4 0 0 10.7000456 24 1191298.7 -15.12398714 Individual H 88
K123 9852178 ES_MAT6 0 0 15.0569811 19 5698985.3 -23.74125892 Individual H 72
M123 9874562 ES_MAT2 0 0 56.0265487 16 8527412.3 -87.65478928 Individual H 94
Student_ID StudSGP StudCalcID
369123661 15 6659.2 985432125 54 8527412.3 567845895 -2 1191298.7
What I'd like to end up with (more or less):
student_ID Score StudSGP grade_subject_ID employee_ID HEDI HEDI_points DBN avg_prescore 567845895 25 -2 ES_SCI4 123456 H 94 00X123 -15.12398714
09-22-2016 11:50 PM
A few more lines of example data, and the desired result , would help to better describe your requirement.
SQL is usually the better tool when linking multiple tabels with different keys.
09-23-2016 11:57 AM
Example data does not have LINK_GM information for student 567845895 and none of the employee_id values appear in both data sets. So it may be a tad difficult to test code against.
09-23-2016 04:54 PM
I think the following code does what you are requesting. I had to make my own data sets so if you have any place where a variable that appears numeric but is character in one set and numeric in the other you may have to do your own conversion.
I don't use EG very much so I have no idea if the point and click will match things this way.
data work.Y16_LINK_GM; informat dbn $6. employee_ID grade_subject_ID student_ID $10.; input DBN employee_ID grade_subject_ID student_ID Score; datalines; 00M123 987654 ES_SCI4 123456789 45 00M123 654987 ES_MAT2 987456789 68 00X123 123456 ES_SCI4 567845895 87 ; run; data work.Y16_GROW_Teacher; informat bn $6. employeeID GRADE_SUBJECT $10. linkB subgroup_ID MGP student_n calcID avg_prescore best16. target_population $10. HEDI $1. ; input BN employeeID GRADE_SUBJECT linkB subgroup_ID MGP student_n calcID avg_prescore target_population HEDI HEDI_points; datalines; X123 123456 ES_SCI4 0 0 10.7000456 24 1191298.7 -15.12398714 Individual H 88 K123 9852178 ES_MAT6 0 0 15.0569811 19 5698985.3 -23.74125892 Individual H 72 M123 9874562 ES_MAT2 0 0 56.0265487 16 8527412.3 -87.65478928 Individual H 94 ; run; data work.Y16_CALC_Stud; informat Student_ID $10.; input Student_ID StudSGP StudCalcID; datalines; 369123661 15 6659.2 985432125 54 8527412.3 567845895 -2 1191298.7 ; run; proc sql; create table work.temp as select c.Student_ID,c.score,c.StudSGP,c.grade_subject_ID,c.employee_ID,d.HEDI,d.HEDI_points,d.avg_prescore from ( select a.Student_ID,a.StudSGP, StudCalcID, b.DBN, b.Score ,b.employee_ID, b.grade_subject_ID from work.Y16_CALC_Stud as a left join work.Y16_LINK_GM as b on a.Student_ID = b.Student_ID ) as c left join work.Y16_GROW_Teacher as d on c.StudCalcID = d.calcID and c.employee_ID=d.employeeID and c.grade_subject_ID=d.GRADE_SUBJECT ; quit;
09-26-2016 12:16 PM
Thanks so much @ballardw. I tried your code and am still getting a lot of errors. Mostly
ERROR: Unresolved reference to table/correlation name
I tried using the full table names instead of a., b., and so on but that doesn't make a difference. Any ideas?
09-26-2016 12:24 PM
Post the log of the exact code run.
Note that with the nested select you really may want to use at least one set of aliases as that nest is not the same as the original data.