Matching many to one based on scores

elbarto · Posted 03-11-2021 05:45 PM

I have the following dataset:

DATA have;
  input period key_id treat ps_score;
DATALINES;
59 1004 0 0.0701784726
59 1078 0 0.1496074832
59 1209 0 0.1325333248
59 1300 0 0.1555762808
59 1327 0 0.0469939511
59 1523 1 0.0531098455
59 1854 1 0.0411252176
61 1004 0 0.1085249132
61 1008 0 0.1531924709
61 1078 0 0.0963164678
61 1102 0 0.0962037147
61 1300 0 0.0684734650
61 2402 0 0.1030826279
61 3023 1 0.0242288199
61 3044 1 0.0848487298
61 4033 1 0.0024468050
;
RUN;

First, I want to do one-to-one matching as follows: match each key_id that has treat = 1 to the key_id with treat = 0 in the same period whose ps_score has the smallest absolute difference from its own. A control (treat = 0) can be matched to more than one treated key_id. This produces the following dataset:

DATA want1;
  input period key_id matched_id;
DATALINES;
59 1523 1327
59 1854 1327
61 3023 1300
61 3044 1102
61 4033 1300
;
RUN;

For example, in period=59, for key_id=1523 (which has treat=1), the absolute difference in ps_score with key_id=1327 (which has treat=0 and is in the same period) is 0.006115894, which is the smallest. So in the want1 dataset, the matched_id for key_id=1523 is 1327. Other entries follow the same rule.
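One way to express this 1:1 rule, as a minimal sketch rather than a definitive implementation: join every treated row to every control row in the same period, then keep the pair with the smallest absolute difference per treated key_id. The aliases t and c and the output name want1_check are illustrative assumptions. The HAVING clause relies on PROC SQL remerging the group minimum back onto the detail rows, which SAS reports with a note in the log.

```sas
/* Sketch: nearest control for each treated key_id within the same period. */
/* Controls are reused (matching with replacement), as in want1.           */
proc sql;
  create table want1_check as
  select t.period,
         t.key_id,
         c.key_id as matched_id
  from have as t
       inner join have as c
       on t.period = c.period
  where t.treat = 1 and c.treat = 0
  group by t.period, t.key_id
  /* keep only the pair(s) whose distance equals the group minimum */
  having abs(t.ps_score - c.ps_score) = min(abs(t.ps_score - c.ps_score))
  order by t.period, t.key_id;
quit;
```

On the sample data this should give the same five pairs as want1. If two controls are exactly equally close to a treated key_id, both rows are kept, so a tie-break rule would be needed in that case.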

What I want to do next is 2:1 matching, so that each key_id with treat=1 is matched to the two key_id with treat=0 in the same period that have the closest ps_score. The resulting dataset is as follows:

DATA want2;
  input period key_id matched_id;
DATALINES;
59 1523 1327
59 1523 1004
59 1854 1327
59 1854 1004
61 3023 1300
61 3023 1102
61 3044 1102
61 3044 1078
61 4033 1300
61 4033 1102
;
RUN;

For example, for key_id=1523 in period=59, after it is matched to 1327, we look for the key_id with treat=0 that has the next smallest absolute difference in ps_score. This is key_id=1004 (the absolute difference in ps_score is 0.017068627). So, in this 2:1 match, 1523 is matched with two observations.

Is it possible to write general code for this, so that I can choose any N for an N:1 match following the same rules? The cases N=1 and N=2 are illustrated above.

Reeza · Posted 03-11-2021 07:04 PM

Sure. How did you write your original code? It's likely easiest to modify the code you already have.
I'm not sure what approach you're using, but the simplest way I can think of is to sort the candidate matches by distance, add a counter that ranks them within each treated key_id (closest first), and then filter on that counter. That is how I'd do it.
https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/
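The counter idea could look roughly like the sketch below. It is not the poster's original code, which isn't shown in the thread. The macro name match_n, its parameters, the work table _pairs, and the variables dist and match_rank are all hypothetical names chosen for illustration. Controls are reused across treated key_id, as in the want1 and want2 examples.

```sas
/* Sketch of an N:1 nearest-neighbor match using a by-group counter. */
%macro match_n(data=have, out=want, n=1);

  /* All treated/control pairs within the same period, with distance. */
  proc sql;
    create table _pairs as
    select t.period,
           t.key_id,
           c.key_id as matched_id,
           abs(t.ps_score - c.ps_score) as dist
    from &data as t
         inner join &data as c
         on t.period = c.period
    where t.treat = 1 and c.treat = 0;
  quit;

  /* Closest controls first; matched_id breaks exact ties. */
  proc sort data=_pairs;
    by period key_id dist matched_id;
  run;

  /* Number the candidates 1, 2, ... within each treated key_id */
  /* and keep the first &n of them.                             */
  data &out(drop=dist match_rank);
    set _pairs;
    by period key_id;
    if first.key_id then match_rank = 0;
    match_rank + 1;
    if match_rank <= &n;
  run;

%mend match_n;

%match_n(out=want1, n=1);
%match_n(out=want2, n=2);
```

With the sample data, n=1 and n=2 should produce the same pairs as the want1 and want2 datasets in the question. A treated key_id with fewer than N controls in its period simply gets fewer matches.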

Otherwise, you may also want to take a look at PROC PSMATCH, which does propensity score matching and lets you specify the N:M ratio as an option.
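A rough sketch of what that might look like with the precomputed ps_score is below. The statement and option names are written from memory of the SAS/STAT documentation and should be checked against the PROC PSMATCH reference for your release before use. The output name matched is an assumption. METHOD=REPLACE is used here because the examples in the question reuse controls (1327 is matched to both 1523 and 1854), whereas greedy matching does not reuse them.

```sas
/* Sketch only: verify each option against the PROC PSMATCH documentation. */
proc psmatch data=have region=allobs;
  class treat period;
  /* use the existing propensity score instead of fitting a model */
  psdata treatvar=treat(treated='1') ps=ps_score;
  /* k=2 controls per treated unit, matched with replacement,     */
  /* within the same period, on the PS difference, no caliper     */
  match method=replace(k=2) distance=ps exact=period caliper=.;
  output out(obs=match)=matched;
run;
```

Changing k= changes the number of controls matched to each treated unit. The procedure's output layout differs from the want1 and want2 layout, so some reshaping would likely be needed to get one row per key_id and matched_id pair.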
