I have two groups of patients that I am matching. One of the matching criteria is medication history range (Rx start and end dates) within 5 years. I'm at a loss on how to code this because of all the possible combinations. I want to assign each pair a score so that the closer the ranges, the lower the score, and the farther apart the ranges, the higher the score. Alternatively, I could assign 1 if the duration is within 5 years and 0 otherwise. However, I prefer the first method.
data groupA;
  input pat_id $ rx_start_date:MMDDYY10. rx_end_date:MMDDYY10.;
  format rx_start_date MMDDYY10. rx_end_date MMDDYY10.;
  datalines;
A1 1/17/2018 12/25/2018
A2 5/31/2018 11/6/2018
A3 3/7/2018 8/27/2018
A4 2/26/2014 11/22/2014
A5 5/4/2019 7/29/2020
A6 2/7/2016 5/4/2018
A7 4/26/2019 5/27/2019
A8 4/19/2007 5/11/2017
A9 10/8/2008 3/14/2009
A10 9/11/2007 1/26/2015
;

data groupB;
  input pat_id $ rx_start_date:MMDDYY10. rx_end_date:MMDDYY10.;
  format rx_start_date MMDDYY10. rx_end_date MMDDYY10.;
  datalines;
B1 4/18/2018 6/19/2021
B2 2/15/2008 8/2/2019
B3 2/7/2020 7/12/2020
B4 10/24/2004 5/7/2010
B5 11/20/2003 6/5/2008
B6 4/6/2016 5/17/2016
B7 11/10/2015 2/10/2021
B8 1/9/2015 8/22/2016
B9 8/1/2007 9/23/2009
B10 6/18/2006 8/29/2017
B11 8/28/2002 5/15/2021
B12 7/2/2009 8/6/2009
B13 8/23/2015 12/7/2019
B14 12/12/2011 7/8/2013
B15 6/11/2013 5/14/2016
B16 6/2/2018 7/13/2019
B17 6/6/2013 10/24/2016
B18 9/22/2004 6/16/2020
B19 1/9/2007 8/23/2018
B20 4/23/2013 6/29/2018
;
You haven't explained your matching criteria fully. Is it group A matching with group B, is it the reverse (B matching with A), or is it both?
Are you after a one-to-one match, maybe the closest, or are one-to-many matches allowed?
One method that would be useful in deciding the closest match would be the number of days the date ranges actually overlap, then if they don't overlap how close the non-overlapping dates are.
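To make the overlap idea concrete, here is a minimal sketch that pairs every group A patient with every group B patient and computes both measures. The names pairs, a_id, b_id, overlap_days and gap_days are made up for illustration, and counting the start and end dates inclusively is an assumption you may want to change.

```sas
/* Sketch only: all A x B pairs with overlap and gap in days.       */
/* Table and column names here are illustrative, not from the post. */
proc sql;
  create table pairs as
  select a.pat_id as a_id,
         b.pat_id as b_id,
         /* days the two ranges share, counted inclusively; 0 if none */
         max(0, min(a.rx_end_date, b.rx_end_date)
                - max(a.rx_start_date, b.rx_start_date) + 1) as overlap_days,
         /* full days between the ranges when they do not overlap; 0 otherwise */
         max(0, max(a.rx_start_date, b.rx_start_date)
                - min(a.rx_end_date, b.rx_end_date) - 1) as gap_days
  from groupA as a, groupB as b;
quit;
```

With two or more arguments, MIN and MAX in PROC SQL act as ordinary row-level SAS functions rather than summary functions, which is what this relies on. SAS will write a note to the log about the Cartesian product; that is expected here because every pair is wanted.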
Thank you for the reply. Group B is the control and the larger group, so I believe that would be matching group A with group B. I am doing a 1:5 match. I am matching on sex, ethnicity, and age within 3 years in addition to Rx duration. However, I can figure out the code for the other 3. It's the date ranges that I haven't gotten yet. I do like your idea about number of days that overlap.
It sounds like you are looking to quantify the "closeness" of the start and end dates of a given record in one set to all the records in the other, correct? If so, you could create a score using the DATDIF function to get the difference, and use the absolute value to determine how close they are. The sum of the absolute value of the differences could be the overall score for closeness.
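As a rough sketch of that score, the code below sums the absolute differences of the start dates and of the end dates for every A/B pair, then keeps the five lowest-scoring controls per group A patient for the 1:5 match. It uses plain subtraction of SAS date values, which gives the difference in actual days. The names scored, top5, score and match_rank are made up for illustration, and this version assumes the same group B patient may be selected for more than one group A patient. The other criteria (sex, ethnicity, age) would still need to be added to the join.

```sas
/* Sketch only: closeness score = |start diff| + |end diff| in days. */
/* Lower score means the two Rx date ranges are closer.              */
proc sql;
  create table scored as
  select a.pat_id as a_id,
         b.pat_id as b_id,
         abs(a.rx_start_date - b.rx_start_date)
           + abs(a.rx_end_date - b.rx_end_date) as score
  from groupA as a, groupB as b
  order by a_id, score;
quit;

/* Keep the 5 closest group B candidates for each group A patient.   */
/* Assumption: controls are reused across cases (with replacement).  */
data top5;
  set scored;
  by a_id;
  if first.a_id then match_rank = 0;
  match_rank + 1;            /* sum statement: value is retained across rows */
  if match_rank <= 5;
run;
```

If each control may only be used once, the selection step would need to track which group B patients have already been assigned.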