Hi, I have a table with a list of records that are linked to a group ID. Each record has a start date and an end date. Each record is active during this date range: Group ID Days Within Range for Group ID Record Start Date End Date A 5 1 1 5 A 5 2 1 5 B 10 3 1 5 B 10 4 6 10 I am trying to calculate the average number of records that are active for a group ID on a given date e.g for group ID A this would be 2 (as both records are active on every day within the group ID range). For B this would be 1 as all days within the range have an active record, and neither one of the records overlap. Here is the approach that I have taken: 1 - Create a table listing all the dates within the total range of all group IDs data alldates ; do Date = '03jan2010'd to '03jan2015'd ; output ; end ; format Date date9. ; run ; 2 - Joining this with the record table. I matched on the group ID and where the date is within the Start Date and End date for each record. This then created a table listing all of the active days for each record i.e. a record active for 5 days will have 5 rows in this table proc sql; create table record_active_day_summary as select * from alldates as a inner join record_info as b on a.group=b.group where a.Date between b.Start_Date and b.End_Date quit; 3 - Summarize this table by the average number of times each day appears for each group ID. This gives the final output that I'm looking for. proc sql; create table group_summary as Select distinct Group count(Date)/count(distinct Date) as Avg_Records_Per_Day from record_active_day_summary quit; Does anybody have an recommendations on how I can speed this up? Ideally I would like to figure out a way to cut out Step 2 - this is an intermediary creating a huge data set for me (around 95 million rows). Thanks in advance!
... View more