BookmarkSubscribeRSS Feed
GreggB
Pyrite | Level 9

Frankly, this problem is beyond my skill set, so I'm hoping for your help (and patience) as I struggle through this.

My PROC SQL step that joins these 2 tables:

proc sql;

create table join1 as select

a.*,t.absence_date, t.abs_count, t.abs_school, t.employee_name, t.employee_ssn from allsets as a

left join tchabs2 as t

on a.SSN=t.employee_SSN and a.schoolid=t.abs_school

order by SSN;

quit;

My objective: compare a teacher's absences (Absence_Date in TCHABS2) to a given interval (dateenrolled, dateleft in ALLSETS) and if the absence occurs within the interval, then count the absence.  Otherwise, ignore it.  However, when I join the tables I quickly see how the number of obs proliferates. If a teacher has 7 absences and her class roster has 50 students, then one teacher generates 350 obs.

Question: before I begin matching the two tables, should I consider using an array so that there is one record per teacher per school?  I know enough about arrays to be dangerous.

I'm picturing the TCHABS2 table being transformed into something like so:

school_IdEmployee_SSNEmployee_NameJob_DescriptionAbsence_Date1Absence_Type1Absence_Date2Absence_Type2etc..Absence_Date7Absence_Type7
30777 JoniTeacher Math7-Oct-14Full Day8-Oct-14Full Day9-Apr-15Half Day AM
26777JoniTeacher Math7-Oct-14Full Day8-Oct-14Full Day9-Apr-15Half Day AM

TCHABS2

abs_schoolEmployee_SSNEmployee_NameJob_DescriptionAbsence_DateAbsence_Type
30777 JoniTeacher Math7-Oct-14Full Day
30777 JoniTeacher Math8-Oct-14Full Day
30777 JoniTeacher Math30-Oct-14Full Day
30777 JoniTeacher Math10-Feb-15Full Day
30777 JoniTeacher Math11-Feb-15Full Day
30777 JoniTeacher Math4-Mar-15Full Day
30777 JoniTeacher Math9-Apr-15Half Day AM
26777 JoniTeacher Math7-Oct-14Full Day
26777 JoniTeacher Math8-Oct-14Full Day
26777 JoniTeacher Math30-Oct-14Full Day
26777 JoniTeacher Math10-Feb-15Full Day
26777 JoniTeacher Math11-Feb-15Full Day
26777 JoniTeacher Math4-Mar-15Full Day
26777 JoniTeacher Math9-Apr-15Half Day AM

ALLSETS

STUDENTDateenrolledDateLeftSchoolidCourse_Namessn
able8/18/201410/14/201430Mathematics 7777
baker8/18/20146/5/201530Mathematics 7777
charlie8/18/20146/5/201530Mathematics 7777
delta8/18/20146/5/201530Mathematics 7777
echo8/18/20146/5/201530Mathematics 7777
6 REPLIES 6
ballardw
Super User

You probably do NOT want to transpose the data so that there are multiple date variables as any comparison logic gets very complicated very quickly.

If the problem is comparing the teacher with students and generating more records than desired you may want to summarize the STUDENT data down to a CLASS level before joining. If you need to know how many students are involved then you have what you need.

What is final report supposed to look like?

GreggB
Pyrite | Level 9

The final report:

I will sort it by school, then teacher, then class.

STUDENT_NAME        SCHOOLID   DATEENROLLED      DATELEFT     DAYS_ENROLLED     NUMBER_OF_TEACHER_ABSENCES     NET_DAYS_ENROLLED (by subtracting teacher absences from days enrolled, only counting those that fell between dateenrolled and dateleft)   PERCENT_OF_YEAR_ENROLLED (by dividing the net days enrolled by days enrolled)

Essentially, I'm wanting to show each teacher what percent of the school year a student was in her class.  One of the business rules says that you can subtract the number of days that a teacher was absent from the number of days a student was enrolled to get the percent of the year enrolled.  A school year = 180 days.  Say that a student was enrolled for 100 days.  Further, during those 100 days the teacher was absent 10 of those days.  The net_days_enrolled is 100 - 10.  The percent_of_year_enrolled would be (100 - 10)/180 = 50%.  The teacher has to log into a portal and populate this percentage for each student she teaches in each of her classes.  One data set I have contains all student info, including class information.  The other data set contains teacher absence dates.  The SSN and schoolid appear in both sets.  This is what I'm matching on.

ballardw
Super User

Since you are reporting on student/class combinations then you don't want to try to reduce the number of results as you need each one.

If you haven't already, look into Proc Report as it will allow you to make these calculations and generate report tables at the same time. That assumes your dates are SAS date values, then number of days enrolled is date_left - dateenrolled. These should be doeable in Compute blocks.

GreggB
Pyrite | Level 9

If I understand what you're saying, I'm really looking at student/teacher combinations.  I will look into PROC REPORT.  I have seen code involving a compute block but have never actually tried to use it.

user24feb
Barite | Level 11

Could be a case for a hash iterator, where you walk through the teacher absences and output if absence_date is between date_enrolled and date_left ..

GreggB
Pyrite | Level 9

I'll have to research how to use a hash iterator.  I've never used it before.  Thanks.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 885 views
  • 0 likes
  • 3 in conversation