Good afternoon, I have a large dataset of 6 million observations (containing both cases and controls), structured as follows:
- uniqueid : the unique code of each patient;
- gender
- age
- casecontrol (1 for cases, 0 for controls).
There are about 400,000 cases, and I want to create a 1:3 matching with the controls, based on gender (same gender) and age (within plus / minus 3 years).
At the end I want to obtain a dataset like this:
- id_control
- gender_control
- age_control
- id_case
- gender_case
- age_case
I found a macro online, but it gives me no way to obtain id_case. If someone can provide a solution, it would be extremely helpful.
Thank you.
1:3 matching with the controls, based on gender (same gender) and age (range of plus / minus 3 years).
So for each case record, there are 7 acceptable control ages (the case's age plus or minus 3 years). Conversely, each control can be matched to case records at any of 7 ages (ignoring for the moment the records near the upper and lower ends of the age range).
The problem here (assuming you are doing matching without replacement) is that there can be age distributions that satisfy random case-control matching some, perhaps most, of the time, but not all of the time. That is, some random draws of "matching" control ages for a given case age could leave too few "matching" controls for some other case age, while other random draws from the same data would satisfy your objective of 3 controls per case.
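To make the 7-age window concrete, here is a minimal sketch in Python (not SAS; the same lookup logic could be ported to a DATA step). The function names and the toy records are hypothetical, chosen only for illustration; the only things taken from the question are the same-gender rule and the plus / minus 3 year tolerance.

```python
from collections import defaultdict

AGE_TOLERANCE = 3  # plus / minus 3 years, as stated in the question


def acceptable_ages(case_age, tolerance=AGE_TOLERANCE):
    """Return the control ages that are acceptable for a case of this age."""
    return list(range(case_age - tolerance, case_age + tolerance + 1))


def index_controls(controls):
    """Group control ids by (gender, age) so candidates can be looked up directly."""
    pool = defaultdict(list)
    for uniqueid, gender, age in controls:
        pool[(gender, age)].append(uniqueid)
    return pool


def candidates_for(case, pool):
    """List the ids of every control with the same gender and an age in the window."""
    _, gender, age = case
    found = []
    for a in acceptable_ages(age):
        found.extend(pool.get((gender, a), []))
    return found


# Hypothetical toy records: (uniqueid, gender, age)
controls = [(101, "F", 18), (102, "F", 24), (103, "M", 21), (104, "F", 25)]
case = (1, "F", 21)

pool = index_controls(controls)
print(acceptable_ages(21))         # the 7 ages from 18 through 24
print(candidates_for(case, pool))  # controls 101 and 102 qualify
```

Indexing the controls by (gender, age) means each case only needs 7 lookups, which matters more than the language when there are millions of records.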
For instance, you can have
You might randomly assign control records with ages 19, 20 and 23 to case age 21. That would leave only two control records (ages 24 and 25) as matchable against case age 22. Yet there are a number of other random draws from this data that would provide 1:3 matches.
Now, perhaps your data is not so pathologically distributed that random draws of matches have more than an infinitesimal chance of generating incomplete case-control matches.
But it's possible. If it happens, I guess you could rerun your random assignment with a different random-number seed.
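The rerun-with-another-seed idea could be sketched roughly as follows, again in Python rather than SAS. Everything here is an assumption made for illustration: the function names `match_once` and `match_with_retries`, the tuple layout, the retry limit, and the toy data (two cases aged 21 and 22, with six same-gender controls chosen so that some draws fail and others succeed). The output rows follow the layout requested in the question, so id_case is carried along with each matched control.

```python
import random
from collections import defaultdict


def match_once(cases, controls, ratio=3, tolerance=3, seed=0):
    """One random draw of 1:ratio matching without replacement.

    cases and controls are lists of (uniqueid, gender, age) tuples.
    Returns (rows, short): rows are tuples of
    (id_control, gender_control, age_control, id_case, gender_case, age_case),
    and short lists the ids of cases that received fewer than `ratio` controls.
    """
    rng = random.Random(seed)

    # Pool of still-available control ids, keyed by (gender, age).
    pool = defaultdict(list)
    for cid, gender, age in controls:
        pool[(gender, age)].append(cid)

    # Process the cases in random order.
    order = list(cases)
    rng.shuffle(order)

    rows, short = [], []
    for case_id, gender, age in order:
        # Every unused control of the same gender within the age window.
        eligible = [(cid, a)
                    for a in range(age - tolerance, age + tolerance + 1)
                    for cid in pool.get((gender, a), [])]
        picked = rng.sample(eligible, min(ratio, len(eligible)))
        for cid, a in picked:
            pool[(gender, a)].remove(cid)  # without replacement
            rows.append((cid, gender, a, case_id, gender, age))
        if len(picked) < ratio:
            short.append(case_id)
    return rows, short


def match_with_retries(cases, controls, max_tries=50, **kwargs):
    """Try seeds 0, 1, 2, ... until every case has its full set of controls."""
    for seed in range(max_tries):
        rows, short = match_once(cases, controls, seed=seed, **kwargs)
        if not short:
            return rows, seed
    return None, None  # no complete matching found within max_tries


# Hypothetical toy data, all one gender: (uniqueid, gender, age)
controls = [(101, "F", 18), (102, "F", 19), (103, "F", 20),
            (104, "F", 23), (105, "F", 24), (106, "F", 25)]
cases = [(1, "F", 21), (2, "F", 22)]

rows, seed = match_with_retries(cases, controls)
if rows is None:
    print("No complete 1:3 matching found in the tries allowed.")
else:
    for row in sorted(rows, key=lambda r: (r[3], r[0])):
        print(row)
```

Retrying only helps when a complete matching exists at all; if the controls in some age band are simply too few, every seed will come up short, and the `short` list from `match_once` shows which cases are affected.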