Solved: proc psmatch question: Possible to limit number of matches to same member ID in control group?

RobertWF1 · Posted 07-04-2022 11:56 PM

My team is estimating the average treatment effect on the treated (ATT) for a health care program using nearest neighbor matching with proc psmatch.

Along with several demographic and health variables, we're matching treated members to controls on treatment month (the month and year the member engaged in the health program), so our dataset contains multiple member-month records per member in the control group (for example, 201901, 201902, ..., 202012).

In proc psmatch, is there a way to limit how many times records sharing the same member ID are matched to different treated records? We don't want the bulk of matches to come from a small subset of control-group members who are used over and over.
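Before changing the matching, it may help to measure how concentrated control reuse actually is. A minimal sketch, assuming the proc psmatch output data set is called MATCHED (for example, written with OUTPUT OUT(OBS=MATCH)=) and contains a member ID variable memberID and the 0/1 treatment flag treated; all three names are hypothetical.

```sas
/* Hypothetical names: MATCHED (psmatch matched output), memberID, treated (0/1) */
/* Count how many matched control records each control member contributes */
proc sql;
    create table uses_per_member as
    select memberID, count(*) as n_matches
    from matched
    where treated = 0
    group by memberID
    order by n_matches desc;
quit;

/* Distribution of reuse across control members */
proc freq data=uses_per_member;
    tables n_matches;
run;
```

If a handful of members account for a large share of matched records, the top rows of USES_PER_MEMBER will show it directly.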

quickbluefish · Posted 07-06-2022 12:22 PM

Are you actually making use of the propensity score, or are you just using PSMATCH as a way to streamline the matching? There is (I'm almost certain) no way to do what you're asking using PSMATCH. If you need propensity scores, I would suggest outputting them (the predicted probabilities for each person-month) from a model you run in proc genmod or proc logistic, log-transforming them (typically via the logit, log(p/(1-p))), and then using PROC SQL to create all possible matches: left join the treated people to the controls on your various matching variables and within whatever caliper of the transformed propensity scores you want to use (e.g., 0.25). Because the several matching variables and the caliper both restrict the join, the result shouldn't be too gigantic. For purposes of explanation, I'll call this dataset RAW_MATCH; it will have separate columns for the treated patient ID and for the control patient ID and month.
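A minimal sketch of the propensity-score and PROC SQL steps described above. All data set and variable names are hypothetical (MEMBER_MONTHS with one row per person-month, memberID, yyyymm, a 0/1 flag treated, and covariates age, sex, n_chronic), and the caliper is assumed to mean 0.25 standard deviations of the logit of the propensity score, which is one common convention; substitute your own choice.

```sas
/* Hypothetical input: member_months, one row per person-month, with
   memberID, yyyymm, treated (0/1), and covariates age sex n_chronic */
proc logistic data=member_months;
    class sex;
    model treated(event='1') = age sex n_chronic;
    output out=ps_out p=ps;              /* predicted probability = propensity score */
run;

data ps_out;
    set ps_out;
    lps = log(ps / (1 - ps));            /* logit of the propensity score */
run;

/* Assumed caliper: 0.25 standard deviations of the logit PS */
proc sql noprint;
    select 0.25 * std(lps) into :caliper trimmed
    from ps_out;
quit;

/* All candidate pairs: treated rows left-joined to control rows that share
   the exact-match variables (here yyyymm and sex) and fall within the caliper */
proc sql;
    create table raw_match as
    select t.memberID as ptID,
           c.memberID as controlID,
           c.yyyymm   as controlMonth
    from ps_out (where=(treated=1)) as t
         left join
         ps_out (where=(treated=0)) as c
      on  t.yyyymm = c.yyyymm
      and t.sex    = c.sex
      and abs(t.lps - c.lps) <= &caliper;
quit;
```

Because this is a left join, a treated person with no eligible control still gets one row, with a missing controlID.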

After that, you'd need to do something in a DATA step to achieve your goal. (I am assuming you're matching without replacement, i.e., a Feb 2019 control record from person X cannot be matched to more than one treated person, and that you're merely trying to limit the total number of times that _any_ record of person X is used as a match.) The simplest way of doing this might be to create a random number for each record in the RAW_MATCH dataset, ideally by first writing a CALL STREAMINIT(xxx); statement in the DATA step, where xxx is some arbitrarily chosen integer seed, and then using the RAND() function (e.g., RAND('UNIFORM')) to generate the number. Then sort the dataset by the case patient ID variable and the random number you just created (ptID and randnum in the code below; i.e., within each case patient, sort by the random number). Then, in another DATA step, simply take the first N control records for each case patient, e.g., to keep up to 4 controls per treated person:

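/* TREATED: one row per treated person, with the number of controls kept.
   CONTROLS: one row per kept control record; controlID is renamed to ptID
   so both output data sets share the same ID variable. */
data 
    treated (keep=ptID caseID ncontrols)
    controls (keep=controlID controlMonth caseID rename= 
        (controlID=ptID))
    ;
set raw_match;              /* must be sorted by ptID randnum */
by ptID randnum;
length caseID 8 ncontrols 3;
retain caseID 0 ncontrols;
if first.ptID then do;      /* new treated person: start a new matched set */
    caseID+1;
    ncontrols=0;
end;
/* keep up to 4 controls per treated person, in random order;
   a missing controlID means the left join found no eligible control */
if ncontrols<4 and not missing(controlID) then do;
    ncontrols+1;
    output controls;
end;
if last.ptID then output treated;
run;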

/* Stack treated and control rows; isTreated=1 for treated rows */
data final_matches;
set
    treated (in=A)
    controls (in=B)
    ;
length isTreated 3;
isTreated=A;
run;

proc sort data=final_matches; by caseID descending isTreated; run;
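The DATA step above reads randnum and relies on RAW_MATCH already being sorted by ptID randnum. One way to create that variable and sort order, as described in the text, is sketched here; the seed 20220706 is an arbitrary choice.

```sas
/* Attach a uniform random number to every candidate pair.
   The seed is arbitrary; fixing it makes the random order reproducible. */
data raw_match;
    set raw_match;
    if _n_ = 1 then call streaminit(20220706);
    randnum = rand('uniform');
run;

/* Random order of candidate controls within each treated person */
proc sort data=raw_match;
    by ptID randnum;
run;
```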

There are more sophisticated things you could do, of course, but that might get you what you're after.
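As written, the DATA step above caps controls per treated person at 4, but the random ordering only spreads control usage out: it does not stop the same control record, or many records of the same member, from being picked by several treated people. One of the "more sophisticated" options could be a single greedy pass that tracks usage in hash objects. This is a sketch, assuming RAW_MATCH has the ptID, controlID, controlMonth, and randnum columns described above and is sorted by ptID randnum; the cap of 3 matches per member is a hypothetical value. Treated people earlier in ptID order get first pick, so late ones may end up with fewer controls.

```sas
%let max_per_member = 3;   /* hypothetical cap on matches per control member ID */
%let max_per_case   = 4;   /* up to 4 controls per treated person, as above */

data capped_controls (keep=ptID controlID controlMonth);
    length nkept nuses rc 8;
    if _n_ = 1 then do;
        /* control-month records already used (matching without replacement) */
        declare hash used_rec();
        used_rec.defineKey('controlID', 'controlMonth');
        used_rec.defineDone();
        /* running number of matches per control member ID */
        declare hash member_uses();
        member_uses.defineKey('controlID');
        member_uses.defineData('nuses');
        member_uses.defineDone();
    end;

    set raw_match;
    by ptID randnum;
    retain nkept;
    if first.ptID then nkept = 0;

    /* skip: no eligible control, case already full, or record already used */
    if missing(controlID) or nkept >= &max_per_case then return;
    if used_rec.check() = 0 then return;

    /* current usage count for this member (0 if not used yet) */
    if member_uses.find() ne 0 then nuses = 0;
    if nuses >= &max_per_member then return;

    /* accept the pair and update both hash tables */
    nuses = nuses + 1;
    rc = member_uses.replace();
    rc = used_rec.add();
    nkept = nkept + 1;
    output;
run;
```

Because the step has an explicit OUTPUT statement, the RETURN branches write nothing, so CAPPED_CONTROLS holds only accepted treated/control pairs.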

quickbluefish · Posted 07-06-2022 12:31 PM

Haha, on second thought, you might be able to achieve basically the same thing by first randomly sorting your input dataset and then running proc psmatch as you already were. Worth a try. But being able to do the matching 'by hand' does give you a lot more flexibility.
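A sketch of that idea, assuming a hypothetical person-month data set MEMBER_MONTHS with treated (0/1), yyyymm, and covariates age, sex, n_chronic. The PSMATCH options shown are an assumed stand-in for whatever specification you already use. Whether a random input order changes which controls are picked depends on how PROC PSMATCH orders treated units and breaks ties, so it is worth comparing per-member usage counts before and after.

```sas
/* Hypothetical input: member_months, one row per person-month */
data member_months_shuffled;
    set member_months;
    if _n_ = 1 then call streaminit(12345);   /* arbitrary seed */
    sortkey = rand('uniform');
run;

/* Put the rows in random order before matching */
proc sort data=member_months_shuffled;
    by sortkey;
run;

/* Greedy 1:1 nearest neighbor on the logit PS within a caliper,
   exact match on month (assumed specification) */
proc psmatch data=member_months_shuffled;
    class treated sex yyyymm;
    psmodel treated(Treated='1') = age sex n_chronic;
    match method=greedy(k=1) distance=lps caliper=0.25 exact=yyyymm;
    output out(obs=match)=matched matchid=_MatchID;
run;
```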

RobertWF1 · Posted 07-06-2022 03:05 PM

I like this workaround.

I'll have to see whether I end up with unmatched treated records, but my matching datasets have several thousand treated records compared to several *million* controls, so I suspect it'll be OK.
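If you use the DATA step approach from earlier in the thread, its TREATED output data set already records how many controls each treated person received, so unmatched treated records (ncontrols = 0) can be checked directly. A minimal sketch, assuming that TREATED data set:

```sas
/* Treated people by number of controls kept; ncontrols = 0 means unmatched */
proc freq data=treated;
    tables ncontrols / missing;
run;

/* List the unmatched treated people for review */
data unmatched;
    set treated;
    where ncontrols = 0;
run;
```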
