Frankly, this problem is beyond my skill set, so I'm hoping for your help (and patience) as I struggle through this.
My PROC SQL step that joins these 2 tables:
proc sql;
create table join1 as select
a.*,t.absence_date, t.abs_count, t.abs_school, t.employee_name, t.employee_ssn from allsets as a
left join tchabs2 as t
on a.SSN=t.employee_SSN and a.schoolid=t.abs_school
order by SSN;
quit;
My objective: compare a teacher's absences (Absence_Date in TCHABS2) to a given interval (dateenrolled, dateleft in ALLSETS) and if the absence occurs within the interval, then count the absence. Otherwise, ignore it. However, when I join the tables I quickly see how the number of obs proliferates. If a teacher has 7 absences and her class roster has 50 students, then one teacher generates 350 obs.
Question: before I begin matching the two tables, should I consider using an array so that there is one record per teacher per school? I know enough about arrays to be dangerous.
I'm picturing the TCHABS2 table being transformed into something like so:
school_Id | Employee_SSN | Employee_Name | Job_Description | Absence_Date1 | Absence_Type1 | Absence_Date2 | Absence_Type2 | etc.. | Absence_Date7 | Absence_Type7 |
30 | 777 | Joni | Teacher Math | 7-Oct-14 | Full Day | 8-Oct-14 | Full Day | 9-Apr-15 | Half Day AM | |
26 | 777 | Joni | Teacher Math | 7-Oct-14 | Full Day | 8-Oct-14 | Full Day | 9-Apr-15 | Half Day AM |
TCHABS2
abs_school | Employee_SSN | Employee_Name | Job_Description | Absence_Date | Absence_Type |
30 | 777 | Joni | Teacher Math | 7-Oct-14 | Full Day |
30 | 777 | Joni | Teacher Math | 8-Oct-14 | Full Day |
30 | 777 | Joni | Teacher Math | 30-Oct-14 | Full Day |
30 | 777 | Joni | Teacher Math | 10-Feb-15 | Full Day |
30 | 777 | Joni | Teacher Math | 11-Feb-15 | Full Day |
30 | 777 | Joni | Teacher Math | 4-Mar-15 | Full Day |
30 | 777 | Joni | Teacher Math | 9-Apr-15 | Half Day AM |
26 | 777 | Joni | Teacher Math | 7-Oct-14 | Full Day |
26 | 777 | Joni | Teacher Math | 8-Oct-14 | Full Day |
26 | 777 | Joni | Teacher Math | 30-Oct-14 | Full Day |
26 | 777 | Joni | Teacher Math | 10-Feb-15 | Full Day |
26 | 777 | Joni | Teacher Math | 11-Feb-15 | Full Day |
26 | 777 | Joni | Teacher Math | 4-Mar-15 | Full Day |
26 | 777 | Joni | Teacher Math | 9-Apr-15 | Half Day AM |
ALLSETS
STUDENT | Dateenrolled | DateLeft | Schoolid | Course_Name | ssn |
able | 8/18/2014 | 10/14/2014 | 30 | Mathematics 7 | 777 |
baker | 8/18/2014 | 6/5/2015 | 30 | Mathematics 7 | 777 |
charlie | 8/18/2014 | 6/5/2015 | 30 | Mathematics 7 | 777 |
delta | 8/18/2014 | 6/5/2015 | 30 | Mathematics 7 | 777 |
echo | 8/18/2014 | 6/5/2015 | 30 | Mathematics 7 | 777 |
You probably do NOT want to transpose the data so that there are multiple date variables as any comparison logic gets very complicated very quickly.
If the problem is comparing the teacher with students and generating more records than desired you may want to summarize the STUDENT data down to a CLASS level before joining. If you need to know how many students are involved then you have what you need.
What is final report supposed to look like?
The final report:
I will sort it by school, then teacher, then class.
STUDENT_NAME SCHOOLID DATEENROLLED DATELEFT DAYS_ENROLLED NUMBER_OF_TEACHER_ABSENCES NET_DAYS_ENROLLED (by subtracting teacher absences from days enrolled, only counting those that fell between dateenrolled and dateleft) PERCENT_OF_YEAR_ENROLLED (by dividing the net days enrolled by days enrolled)
Essentially, I'm wanting to show each teacher what percent of the school year a student was in her class. One of the business rules says that you can subtract the number of days that a teacher was absent from the number of days a student was enrolled to get the percent of the year enrolled. A school year = 180 days. Say that a student was enrolled for 100 days. Further, during those 100 days the teacher was absent 10 of those days. The net_days_enrolled is 100 - 10. The percent_of_year_enrolled would be (100 - 10)/180 = 50%. The teacher has to log into a portal and populate this percentage for each student she teaches in each of her classes. One data set I have contains all student info, including class information. The other data set contains teacher absence dates. The SSN and schoolid appear in both sets. This is what I'm matching on.
Since you are reporting on student/class combinations then you don't want to try to reduce the number of results as you need each one.
If you haven't already, look into Proc Report as it will allow you to make these calculations and generate report tables at the same time. That assumes your dates are SAS date values, then number of days enrolled is date_left - dateenrolled. These should be doeable in Compute blocks.
If I understand what you're saying, I'm really looking at student/teacher combinations. I will look into PROC REPORT. I have seen code involving a compute block but have never actually tried to use it.
Could be a case for a hash iterator, where you walk through the teacher absences and output if absence_date is between date_enrolled and date_left ..
I'll have to research how to use a hash iterator. I've never used it before. Thanks.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.