BookmarkSubscribeRSS Feed
Simone98
Calcite | Level 5

Good afternoon, I have a large dataset of 6 million observations (containing both cases and controls), made in this way:

- uniqueid : the unique code of each patient;

- gender

- age

- casecontrol (1 for cases, 0 for controls).

The cases are about 400,000 and i wanted to create a 1:3 matching with the controls, based on gender (same gender) and age (range of plus / minus 3 years).

At the end i want to obtain a dataset like this:

- id_control

- gender_control

- age_control

- id_case

- gender_case

- age_case

 

I found a macro online, but no way to obtain the id_case... if someone can provide me a solution, it will be extremely helpful.

Thank you.

 

4 REPLIES 4
HB
Barite | Level 11 HB
Barite | Level 11
Not sure what you are really trying to accomplish.

Do you have 400,000 cases and 5,600,000 controls?

Are you wanting to create a dataset of about 1.6 million observations in which 25 percent of the records (400,000) are cases and 75 percent of the records (1.2 million) are controls selected to match the cases?

Simone98
Calcite | Level 5
Yes, i have a dataset with 5,600,000 controls and 400,000 cases, and I want to create a 1:3 matching in order to obtain a dataset (with 1,2 million observations) , where each row is made like this:
- id_control
- age_control
- sex_control
- id_case (the id of the case to which the control is matched)
- age_case
- sex_age

Thank you for your help!
Reeza
Super User
Have you looked at PROC PSMATCH?
mkeintz
PROC Star

1:3 matching with the controls, based on gender (same gender) and age (range of plus / minus 3 years).

So for each case record, there is a range of 7 ages that can be acceptable matches.  And vice-versa, each of those control matches can be matched to some case record having 7 possible ages (forgetting for the moment the matches near the upper and lower ages).

 

The problem here is, (assuming you are doing matching without replacement) that there can be distributions of ages that would only satisfy random case-control matching some (perhaps most) of the time, but not all of the time.  That is, some random draws of "matching" control ages for a given case age, could leave an insufficient sample of "matching" control ages for some other case age.   Yet some other random draws from the same data might satisfy your objective of 3 controls per case.

 

For instance, you can have

  1. Two case records with ages 21 (n=1) and 22 (n=1)
  2. Eight control records with ages   18 (n=1), 19 (n=1), 20 (n=1), 23 (n=1), 24 (n=1), and 25 (n=1).  I.e. no 21's or 22's.

You might randomly assign control records with ages 19, 20 and 23 for case age 21.  That would leave only two controls records (ages 24 and 25) as matchable against case age 22.  Yet there are a number of other random draws from this data that would provide 1:3 matches.

 

Now perhaps your data is not so pathologically distributed as to make random draws of matches more than infinitesimally likely to generate incomplete case-control matches

 

But it's possible.  If it happens, I guess you could rerun your random assignment with a different random-number seed.

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1044 views
  • 4 likes
  • 4 in conversation