10-19-2017 05:16 PM
I've come across this task using two dataset, I've simplified the issues in the two input tables:
What I want to figure out is:
a. # of pregnancy episodes of women whose pregnancy start and end date both in 2015 and who also has no more than 45 days continuous enrollment gap in 2015(one single gap > 45 days counts as a gap, not accumulated gap)
b. # of pregnancy episodes of women whose has insurance from the 30 days prior to their pregnancy end towards their pregnancy end, and whose pregnancy end date is in year 2015, same 45 day gap rule.
First table has pregnancy start and end date (pregnancy_ID = 2 means this is this women's 2nd pregnancy episode)
second table has insurance start and end date
For task a, I can easily identify the first part, whose pregnancy start and end both in 2015, but the second part gets tricky. As shown in the second table, a women may have multiple insurance enrollment entries (they enrolled and dropped and re-enrolled), and merging both tables create a multiple to multiple merge, kind complex the problem....
For task b, it's easy to identify if they have insurance at the exact day of the 30 days prior, but I can't find a way to ensure the entire period is continuous enrolled.....
10-20-2017 10:05 AM
thank you for asking @ChrisNZ, the desired output is just two data point
- # of women in task a.
- # of women in task b.
The reason why I didn't list a desired table is that I am not sure what is the best way to count it, to merge two datasets and then do it or satisfy the conditions separately in each dataset and then merge or never merge. The pregnancy dataset has more than 5000 observations and the enrollment dataset has something like 6000 (because one person can have multiple rows/multiple enrollment and drop enrollment), thus I feel a many to many merge will complex the problem.