☑ This topic is solved.
RobertWF1
Quartz | Level 8

My team is estimating the average treatment effect on the treated (ATT) for a health care program using nearest-neighbor matching with PROC PSMATCH.

Along with several demographic and health variables, we're matching treatments to controls by treatment month (the month and year the member engaged in the health program), so our dataset contains multiple member-month records per member in the control group (for example, 201901, 201902, ..., 202012).

In PROC PSMATCH, is there a way to limit how many times records sharing the same member ID are matched with different treatments? We don't want the bulk of matches coming from a small subset of control-group members that are used over and over.

1 ACCEPTED SOLUTION

Accepted Solutions
quickbluefish
Obsidian | Level 7

Are you actually making use of the propensity score, or are you just using PSMATCH as a way to streamline the matching? There is (I'm almost certain) no way to do what you're asking using PSMATCH. If you need propensity scores, I would suggest just outputting them (the predicted probabilities for each person/month) from a model you run in PROC GENMOD or PROC LOGISTIC and log-transforming them. Then use PROC SQL to create all possible matches: left join the treated people to the controls on your various matching variables and within whatever caliper of the log-transformed propensity scores you want to use (e.g., 0.25). Since you have several matching variables plus the caliper, this shouldn't result in something too gigantic. For purposes of explanation, I'll call this dataset RAW_MATCHES (raw_matches in the code below). It will have separate columns for the treated patient ID and the control patient ID and month.
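As a rough sketch of the scoring and join steps just described, something like the following could work. The dataset work.member_months and the columns memberID, treat_month, treated, age, sex, and risk_score are assumptions for illustration, not names from the original question; substitute your own covariates and exact-match variables. The output columns ptID, controlID, and controlMonth are named to line up with the DATA step later in this reply.

```sas
/* Assumed input: work.member_months, one row per member per month.      */
/* All dataset and column names here are hypothetical placeholders.      */
proc logistic data=work.member_months;
    class sex / param=ref;
    model treated(event='1') = age sex risk_score;
    output out=work.scored p=ps;      /* ps = predicted P(treated = 1) */
run;

data work.scored;
    set work.scored;
    /* Drop rows without a usable score so that missing values cannot   */
    /* slip through the caliper comparison in the join below.           */
    if missing(ps) or ps <= 0 then delete;
    log_ps = log(ps);                 /* log transform, as described above */
run;

proc sql;
    create table work.raw_matches as
    select t.memberID    as ptID,         /* treated person             */
           c.memberID    as controlID,    /* candidate control member   */
           c.treat_month as controlMonth, /* candidate control month    */
           abs(t.log_ps - c.log_ps) as ps_dist
    from work.scored(where=(treated=1)) as t
    left join work.scored(where=(treated=0)) as c
      on  t.treat_month = c.treat_month   /* exact match on month       */
      and t.sex         = c.sex           /* example exact-match variable */
      and abs(t.log_ps - c.log_ps) <= 0.25;  /* caliper                 */
quit;
```

Because this is a left join, a treated person with no candidate inside the caliper still appears once, with a missing controlID. Many analysts put the caliper on the logit of the propensity score, log(ps/(1-ps)), rather than the plain log; swapping that in only changes the one assignment.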

 

After that, you'd need to do something in a DATA step to achieve your goal. (I am assuming you're matching without replacement, e.g., a Feb 2019 control record from person X cannot be matched to more than one treated person, and that you're merely trying to limit the total number of times that _any_ record of person X is used as a match.) The simplest way of doing this might be to create a random number for each record in that dataset, ideally by first writing a CALL STREAMINIT(xxx); statement in the DATA step, where xxx is some arbitrarily chosen integer seed, and then calling the RAND() function. Then sort the dataset by the case patient ID variable and the random number you just created (i.e., within each case patient, sort by the random number). Then, in another DATA step, simply take the first N control records for each case patient, e.g., to keep up to 4 controls per treated person:

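/* Split the sorted candidate matches into one row per treated person
   and up to 4 randomly ordered controls per treated person.
   raw_matches must already be sorted by ptID and randnum. */
data
    treated  (keep=ptID caseID ncontrols)
    controls (keep=controlID controlMonth caseID
              rename=(controlID=ptID))
    ;
    set raw_matches;
    by ptID randnum;
    length caseID 8 ncontrols 3;
    retain caseID 0 ncontrols;
    if first.ptID then do;
        caseID + 1;       /* new matched set for each treated person */
        ncontrols = 0;    /* controls kept so far for this person */
    end;
    /* Keep the first 4 non-missing controls in random order */
    if ncontrols < 4 and not missing(controlID) then do;
        ncontrols + 1;
        output controls;
    end;
    /* One row per treated person, with the number of controls kept */
    if last.ptID then output treated;
run;

/* Stack treated and control rows; isTreated is 1 for treated, 0 for controls */
data final_matches;
    set
        treated  (in=A)
        controls (in=B)
        ;
    length isTreated 3;
    isTreated = A;
run;

/* Within each matched set, list the treated person first */
proc sort data=final_matches;
    by caseID descending isTreated;
run;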
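
The DATA step above assumes raw_matches already carries randnum and is sorted by ptID and randnum. A minimal sketch of that preparation, following the CALL STREAMINIT and RAND() suggestion, could look like this (the seed value is arbitrary):

```sas
/* Attach a reproducible random number to every candidate match */
data raw_matches;
    set raw_matches;
    if _n_ = 1 then call streaminit(20190201);  /* arbitrary seed, set once */
    randnum = rand('uniform');                  /* uniform draw on (0,1)    */
run;

/* Random order of candidate controls within each treated person */
proc sort data=raw_matches;
    by ptID randnum;
run;
```

Fixing the seed keeps the selection reproducible from run to run; changing it gives a different random draw of controls.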

 

There are more sophisticated things you could do, of course, but that might get you what you're after.
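
One such refinement, sketched here under assumptions, is to cap how often any one control member is reused. The DATA step above does not do that on its own: it keeps up to 4 controls per treated person but does not track whether a control record or member was already taken by an earlier treated person. The sketch below uses two hash objects for that bookkeeping. The macro variables max_per_case and max_per_member and the output name capped_matches are hypothetical, and raw_matches is assumed to be sorted by ptID and randnum as before.

```sas
%let max_per_case   = 4;  /* controls kept per treated person            */
%let max_per_member = 2;  /* hypothetical cap on reuse of one control ID */

data capped_matches (keep=ptID controlID controlMonth);
    set raw_matches;
    by ptID;
    length nUsed 8;
    retain nKept 0;
    if _n_ = 1 then do;
        /* Control records already taken (matching without replacement) */
        declare hash usedRec();
        usedRec.defineKey('controlID', 'controlMonth');
        usedRec.defineDone();
        /* Number of matches already drawn from each control member */
        declare hash useCount();
        useCount.defineKey('controlID');
        useCount.defineData('nUsed');
        useCount.defineDone();
    end;
    if first.ptID then nKept = 0;   /* controls kept for this treated person */
    if not missing(controlID) and nKept < &max_per_case then do;
        if usedRec.check() ne 0 then do;             /* record not yet taken  */
            if useCount.find() ne 0 then nUsed = 0;  /* member not seen yet   */
            if nUsed < &max_per_member then do;
                nUsed = nUsed + 1;
                rc = useCount.replace();  /* store the updated member count   */
                rc = usedRec.add();       /* mark this control record as used */
                nKept = nKept + 1;
                output;
            end;
        end;
    end;
run;
```

Because treated persons are processed in ptID order, earlier IDs get first pick of the controls. With a control pool much larger than the treated group that is unlikely to matter much, but it is a design choice worth knowing about.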


3 REPLIES
quickbluefish
Obsidian | Level 7
Haha, on second thought, you might be able to achieve basically the same thing by first randomly sorting your input dataset and then running PROC PSMATCH as you already were. Worth a try. But being able to do the matching 'by hand' does give you a lot more flexibility.
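
A hedged sketch of that idea follows. The dataset work.member_months and the columns memberID, treated, treat_month, age, sex, and risk_score are hypothetical, and the PSMATCH options are just one plausible setup. Whether input order changes which controls PSMATCH picks, beyond breaking ties, is not something this sketch establishes, so the last step counts how often each control member was used.

```sas
/* Shuffle the input rows with a reproducible random sort key */
data work.shuffled;
    set work.member_months;
    if _n_ = 1 then call streaminit(20190201);  /* arbitrary seed */
    randnum = rand('uniform');
run;

proc sort data=work.shuffled;
    by randnum;
run;

/* Greedy 1:1 matching, exact on treatment month, with a caliper */
proc psmatch data=work.shuffled region=cs;
    class treated sex treat_month;
    psmodel treated(Treated='1') = age sex risk_score;
    match method=greedy(k=1) exact=treat_month caliper=0.25;
    output out(obs=match)=work.psm_matched matchid=_MatchID;
run;

/* Check reuse: control members that appear in more than one match */
proc sql;
    select memberID, count(*) as n_matches
    from work.psm_matched
    where treated = 0
    group by memberID
    having count(*) > 1
    order by n_matches desc;
quit;
```

If the final query still shows a few members with many matches, the 'by hand' route is the one that lets you enforce a cap directly.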
RobertWF1
Quartz | Level 8

I like this workaround.

 

I'll have to see if I end up with unmatched treatments, but my matching datasets have several thousand treatment records compared to several *million* controls so I suspect it'll be ok.
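
If the matched sets come from the DATA step in the accepted solution, one quick way to look for unmatched treatments is to tabulate ncontrols in the treated dataset, where a value of 0 would mean no control was kept for that person. A minimal sketch, assuming those dataset and variable names:

```sas
/* Distribution of controls kept per treated person; 0 = unmatched */
proc freq data=treated;
    tables ncontrols / missing;
run;
```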
