Hi,
I need to do risk set matching on my dataset.
The cases are women who become pregnant at, e.g., day X after diagnosis. The controls are drawn from the entire population: a woman can serve as a control if she is not yet pregnant at day X.
Does anyone know where I can download syntax for a macro, or anything else that could be of use? Besides the time variable, I have an already calculated propensity score (PS) to match on. I would prefer to match with replacement.
You might look into PROC PSMATCH to see if it can do what you need.
Hello,
Have you tried to program it yourself with a data step (or inside a macro)?
If yes, you can post the code in this thread (even if it's entirely wrong). Community members will see what you are trying to do and can correct your statements quickly and efficiently.
What you can also do: post a little program that creates an extract of the cases data set and an extract of the controls data set (use a data step with datalines). Indicate what a possible output data set would look like, and community members will help you get from the data set(s) you have to the data set you want.
You can search the web (or this communities site) for SAS papers on risk set matching, but that means a lot of reading, and nothing you find will be exactly what you want.
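As an illustration of the kind of small example meant here, the following is a minimal sketch in Python (not SAS) of a toy extract and the risk sets one would expect from it. All names and values (`women`, `preg_day`, `ps`, `risk_set`) are made up for illustration, and it assumes that a missing pregnancy day means the woman never became pregnant.

```python
# Hypothetical toy extract: one row per woman.
# preg_day = days from diagnosis to pregnancy, None if never pregnant (assumption).
# ps       = the already calculated propensity score.
women = [
    {"id": 1, "preg_day": 120,  "ps": 0.31},
    {"id": 2, "preg_day": None, "ps": 0.29},
    {"id": 3, "preg_day": 400,  "ps": 0.25},
    {"id": 4, "preg_day": 90,   "ps": 0.60},
    {"id": 5, "preg_day": None, "ps": 0.58},
]


def risk_set(case, population):
    """Return the ids of women who are not yet pregnant on the case's day X."""
    x = case["preg_day"]
    return [
        w["id"]
        for w in population
        if w["id"] != case["id"]
        and (w["preg_day"] is None or w["preg_day"] > x)
    ]


# Every woman who becomes pregnant is a case on her own day X.
cases = [w for w in women if w["preg_day"] is not None]
risk_sets = {c["id"]: risk_set(c, women) for c in cases}

for case_id, control_ids in risk_sets.items():
    print(case_id, control_ids)
```

With this toy data, woman 1 (pregnant at day 120) is still an eligible control for woman 4 (pregnant at day 90), which is the defining feature of risk set sampling: future cases can act as controls until they become cases themselves.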
Kind regards,
Koen
Hello,
I haven't looked at the paper you have provided (yet), but here's another paper that might help:
Paper 152-30 (SAS Users Group International 30 in Philadelphia)
SAS® Programs to Select Controls for Matched Case-Control Studies
Robert Matthews and Ilene Brill
University of Alabama at Birmingham, Birmingham, AL
https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/152-30.pdf
However, if you are not comfortable with coding, it can be hard to interpret and grasp all the code provided there.
Also, the code is suboptimal, especially for big data sets. I suggested a quicker solution using hash tables (it was accepted as the solution) in this communities entry:
You will find even more communities entries on this topic if you search on the keywords 'risk set'.
I agree that 'risk set matching' / 'risk set sampling' is a common problem that needs to be solved very often, but a macro covering all possible use cases and flavors would have to be very generic and therefore very lengthy. The data step is so flexible, and so easily tweaked to your own requirements, that your code can be very brief.
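To show how little logic the core of the problem needs, here is a minimal sketch in Python (not SAS, and not the code from the paper or the hash-table solution mentioned above). It picks, for each case, the woman in her risk set with the closest propensity score, with replacement. The names `women`, `preg_day`, `ps`, `match_with_replacement` and the `caliper` parameter are all hypothetical, and a missing pregnancy day is assumed to mean "never pregnant".

```python
# Hypothetical toy data: preg_day is days from diagnosis to pregnancy
# (None = never pregnant, an assumption); ps is the propensity score.
women = [
    {"id": 1, "preg_day": 120,  "ps": 0.31},
    {"id": 2, "preg_day": None, "ps": 0.29},
    {"id": 3, "preg_day": 400,  "ps": 0.25},
    {"id": 4, "preg_day": 90,   "ps": 0.60},
    {"id": 5, "preg_day": None, "ps": 0.58},
]


def match_with_replacement(population, caliper=0.05):
    """Map each case id to the id of its nearest-PS control, or None.

    A control must not yet be pregnant on the case's day X. Controls are
    never removed from the pool, so one woman can be matched repeatedly.
    """
    matches = {}
    cases = sorted(
        (w for w in population if w["preg_day"] is not None),
        key=lambda w: w["preg_day"],
    )
    for case in cases:
        x = case["preg_day"]
        candidates = [
            w for w in population
            if w["id"] != case["id"]
            and (w["preg_day"] is None or w["preg_day"] > x)
        ]
        if not candidates:
            matches[case["id"]] = None
            continue
        best = min(candidates, key=lambda w: abs(w["ps"] - case["ps"]))
        within = abs(best["ps"] - case["ps"]) <= caliper
        matches[case["id"]] = best["id"] if within else None
    return matches


matches = match_with_replacement(women)
print(matches)
```

In this toy run, woman 2 is selected as the control for two different cases, which is what matching with replacement allows. Matching without replacement, several controls per case, or censoring at end of follow-up would each need only a small change to the candidate filter or the selection step.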
Good luck,
Koen