My team is estimating the average treatment effect on the treated (ATT) for a health care program using nearest neighbor matching with PROC PSMATCH.
Along with several demographic and health variables, we're matching treated members to controls by treatment month (the month and year the member engaged in the health program), so our dataset contains multiple member-month records per control member (for example, 201901, 201902, ..., 202012).
In PROC PSMATCH, is there a way to limit how many times records sharing the same member ID are matched to different treated members? We don't want the bulk of matches coming from a small subset of control members who are used over and over.
Are you actually making use of the propensity score, or are you just using PSMATCH as a way to streamline the matching? There is (I'm almost certain) no way to do what you're asking using PSMATCH. If you need propensity scores, I would suggest outputting them (the predicted probabilities for each person/month) from a model you run in PROC GENMOD or PROC LOGISTIC, log transforming them (usually to the logit, log(p/(1-p))), and then using PROC SQL to create all possible matches: left join the treated people to the controls on your various matching variables, within whatever caliper of the transformed scores you want to use (e.g., 0.25 standard deviations). The left join keeps treated people who have no eligible control, with missing control columns. Because you match on several variables plus a caliper, the result shouldn't be too large. For purposes of explanation, I'll call this dataset RAW_MATCHES; it will have separate columns for the treated patient ID and for the control patient ID and month.
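A minimal sketch of that first step follows. None of it comes from the original post: the simulated MEMBER_MONTHS table, the covariates AGE, FEMALE and RISK, the seed, and reading the caliper as 0.25 standard deviations of the logit are all assumptions for illustration. Replace the simulated DATA step with your own member-month data and matching variables.

```sas
/* Simulated member-month data (illustration only; all names hypothetical).
   Treated members get one row (the engagement month); controls get one row per month. */
data member_months;
  call streaminit(2025);
  do ptID = 1 to 1000;
    age    = 20 + floor(50 * rand('uniform'));
    female = rand('bernoulli', 0.5);
    risk   = rand('normal');
    if ptID <= 50 then do;
      treated = 1;
      month   = 201901 + floor(12 * rand('uniform'));
      output;
    end;
    else do month = 201901 to 201912;
      treated = 0;
      output;
    end;
  end;
run;

/* 1. Propensity model: predicted probability of treatment for each member-month */
proc logistic data=member_months;
  model treated(event='1') = age female risk;
  output out=ps_scores p=ps;
run;

/* 2. Transform the propensity score to the logit scale */
data ps_scores;
  set ps_scores;
  logit_ps = log(ps / (1 - ps));
run;

/* 3. Caliper: 0.25 SD of the logit (assumed convention) */
proc sql noprint;
  select 0.25 * std(logit_ps) format=best16. into :caliper trimmed
  from ps_scores;
quit;

/* 4. All candidate pairs: same month, same sex, within the caliper.
      LEFT JOIN keeps treated members with no eligible control. */
proc sql;
  create table raw_matches as
  select t.ptID,
         t.month  as caseMonth,
         c.ptID   as controlID,
         c.month  as controlMonth,
         abs(t.logit_ps - c.logit_ps) as ps_dist
  from ps_scores(where=(treated=1)) as t
       left join
       ps_scores(where=(treated=0)) as c
    on  c.month  = t.month
    and c.female = t.female
    and abs(t.logit_ps - c.logit_ps) <= &caliper;
quit;
```

Putting the exact-match variables (month, sex) in the ON clause is what keeps the candidate table from growing toward a full treated-by-control cross product; with millions of control member-months, adding more exact-match variables there is the easiest way to keep it manageable.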
After that, you'd need to do something in a DATA step to achieve your goal. (I'm assuming you're matching without replacement, i.e., a Feb 2019 control record from person X cannot be matched to more than one treated person, and that you're merely trying to limit the total number of times that _any_ record of person X is used as a match.) The simplest way might be to create a random number for each record in the RAW_MATCHES dataset: write a CALL STREAMINIT(xxx); statement at the start of the DATA step, where xxx is some arbitrarily chosen integer (fixing the seed makes the results reproducible), and then use the RAND('UNIFORM') function. Then sort the dataset by the case patient ID variable and the random number you just created (i.e., within each case patient, sort by the random number). Then, in another DATA step, simply take the first N control records for each case patient, e.g., to keep up to 4 controls per treated person:
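/* Put each treated patient's candidate controls in random order */
data raw_matches;
  if _n_ = 1 then call streaminit(20190201);  /* any fixed integer */
  set raw_matches;
  randnum = rand('uniform');
run;

proc sort data=raw_matches;
  by ptID randnum;
run;

/* Keep up to 4 randomly ordered controls per treated patient */
data
  treated  (keep=ptID caseID ncontrols)
  controls (keep=controlID controlMonth caseID
            rename=(controlID=ptID))
  ;
  set raw_matches;
  by ptID randnum;
  length caseID 8 ncontrols 3;
  retain caseID 0 ncontrols;
  if first.ptID then do;
    caseID + 1;          /* new matched set for each treated patient */
    ncontrols = 0;
  end;
  if ncontrols < 4 and not missing(controlID) then do;
    ncontrols + 1;
    output controls;
  end;
  if last.ptID then output treated;
run;

/* Stack treated and control rows; isTreated = 1 flags the treated row */
data final_matches;
  set
    treated  (in=A)
    controls (in=B)
  ;
  length isTreated 3;
  isTreated = A;
run;

proc sort data=final_matches;
  by caseID descending isTreated;
run;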
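To see whether the concern in the question actually shows up in your data, a quick check like the following could be run afterwards. It's a sketch that assumes the TREATED and CONTROLS data sets produced by the DATA step above; CONTROL_USE is a hypothetical name for a helper table.

```sas
proc sql;
  /* Treated patients left with no eligible control */
  select count(*) as n_treated,
         sum(case when ncontrols = 0 then 1 else 0 end) as n_unmatched
  from treated;

  /* Number of matches contributed by each control member, across all months */
  create table control_use as
  select ptID, count(*) as n_uses
  from controls
  group by ptID;

  select max(n_uses)  as max_uses,
         mean(n_uses) as mean_uses
  from control_use;

  /* Control member-months matched to more than one treated patient */
  select count(*) as n_reused_member_months
  from (select ptID, controlMonth
        from controls
        group by ptID, controlMonth
        having count(*) > 1);
quit;

/* Distribution of reuse: how many members were used once, twice, ... */
proc freq data=control_use;
  tables n_uses / nocum;
run;
```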
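Note that this approach caps the number of controls per treated person; the random ordering helps spread the matches across control members, but it doesn't enforce a hard limit on how often one member (or one member-month) is reused. There are more sophisticated things you could do, of course, but that might get you what you're after.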
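One such refinement, sketched here under stated assumptions rather than as a tested recipe: a DATA step hash object counts how many times each control member has been used, and a second hash records which member-months are already taken, so each control record is used at most once and each control member at most MAXUSE times. The macro variables MAXUSE=3 and MAXCTL=4 and the output names TREATED_CAPPED and CONTROLS_CAPPED are hypothetical; RAW_MATCHES must already contain RANDNUM and be sorted by ptID and randnum.

```sas
%let maxctl = 4;   /* controls per treated patient                    */
%let maxuse = 3;   /* hypothetical cap on matches per control member  */

data treated_capped  (keep=ptID caseID ncontrols)
     controls_capped (keep=controlID controlMonth caseID
                      rename=(controlID=ptID));
  length caseID 8 ncontrols 3 n_uses 8;

  /* Create the lookup tables once; hash objects persist across iterations */
  if _n_ = 1 then do;
    declare hash uses();                    /* member ID -> times used so far */
    uses.defineKey('controlID');
    uses.defineData('n_uses');
    uses.defineDone();
    declare hash taken();                   /* member-months already matched  */
    taken.defineKey('controlID', 'controlMonth');
    taken.defineDone();
  end;

  set raw_matches;                          /* sorted by ptID randnum */
  by ptID randnum;

  if first.ptID then do;
    caseID + 1;
    ncontrols = 0;
  end;

  if ncontrols < &maxctl and not missing(controlID) then do;
    /* check() returns 0 when the key exists, so nonzero = record still free */
    if taken.check() ne 0 then do;
      if uses.find() ne 0 then n_uses = 0;  /* member not used yet */
      if n_uses < &maxuse then do;
        n_uses = n_uses + 1;
        ncontrols + 1;
        rc = uses.replace();                /* store the updated count    */
        rc = taken.add();                   /* mark this member-month used */
        output controls_capped;
      end;
    end;
  end;

  if last.ptID then output treated_capped;
run;
```

This is still greedy: treated patients are processed in ptID order, so earlier ones get first pick of scarce controls. Giving each treated patient its own random key and processing by that key instead would spread that advantage. If many treated patients end up with fewer than MAXCTL controls, raising MAXUSE or widening the caliper are the obvious levers.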
I like this workaround.
I'll have to see if I end up with unmatched treated records, but my matching datasets have several thousand treatment records compared to several *million* controls, so I suspect it'll be ok.