I've come across this task using two dataset, I've simplified the issues in the two input tables:
What I want to figure out is:
a. # of pregnancy episodes of women whose pregnancy start and end date both in 2015 and who also has no more than 45 days continuous enrollment gap in 2015(one single gap > 45 days counts as a gap, not accumulated gap)
b. # of pregnancy episodes of women whose has insurance from the 30 days prior to their pregnancy end towards their pregnancy end, and whose pregnancy end date is in year 2015, same 45 day gap rule.
First table has pregnancy start and end date (pregnancy_ID = 2 means this is this women's 2nd pregnancy episode)
PersonID | Pregnancy_start | Pregnancy_end | Pregnancy_ID |
1 | 4/16/2014 | 1/31/2015 | 1 |
1 | 6/1/2015 | 12/21/2015 | 2 |
2 | 2/12/2015 | 11/31/2015 | 1 |
3 | 1/1/2015 | 8/16/2015 | 1 |
4 | 3/1/2015 | 11/23/2015 | 1 |
5 | 2/3/2015 | 6/7/2015 | 1 |
5 | 10/12/2015 | 7/31/2016 | 2 |
6 | 4/1/2015 | 12/19/2015 | 1 |
second table has insurance start and end date
PersonID | Insurance_start | Insurance_end |
1 | 1/1/2014 | 1/6/2014 |
1 | 9/6/2014 | 5/3/2015 |
1 | 5/3/2015 | 1/1/2017 |
2 | 1/1/2014 | 10/1/2015 |
3 | 1/1/2014 | 12/31/2016 |
4 | 5/1/2014 | 3/6/2015 |
4 | 5/1/2015 | 12/31/2016 |
5 | 1/1/2014 | 1/1/2017 |
6 | 5/1/2014 | 11/31/2015 |
For task a, I can easily identify the first part, whose pregnancy start and end both in 2015, but the second part gets tricky. As shown in the second table, a women may have multiple insurance enrollment entries (they enrolled and dropped and re-enrolled), and merging both tables create a multiple to multiple merge, kind complex the problem....
For task b, it's easy to identify if they have insurance at the exact day of the 30 days prior, but I can't find a way to ensure the entire period is continuous enrolled.....
Thank you!
Please provide the expected output (and double check it is correct).
thank you for asking @ChrisNZ, the desired output is just two data point
- # of women in task a.
- # of women in task b.
The reason why I didn't list a desired table is that I am not sure what is the best way to count it, to merge two datasets and then do it or satisfy the conditions separately in each dataset and then merge or never merge. The pregnancy dataset has more than 5000 observations and the enrollment dataset has something like 6000 (because one person can have multiple rows/multiple enrollment and drop enrollment), thus I feel a many to many merge will complex the problem.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.