Hi,
I have a question regarding how to match without replacement by sex, age, and range of index dates.
I have two datasets: controls and cases.
I have used the PROC SURVEYSELECT method outlined by Diseker to match by age and sex. However, I would also like the index date of each control to fall within a year of the index date of its matched case. Is there an easy way to do this for 1:4 or 1:3 matching?
Thanks.
This is my first attempt to do this with PSMATCH. It seems to work:
/* Divide SASHELP.HEART patients into two roles: Case and Control
with 4 times as many controls as cases. Create variables ageGroup
and a dummy propensity score (required but won't be used. Look at
option DISTANCE=(MAH(VAR=)) in MATCH statement of PROC PSMATCH) */
data heart;
    call streaminit(9798979);
    set sashelp.heart;
    if rand("uniform") > 0.2
        then role = "Control";
        else role = "Case";
    ageGroup = round(ageAtStart, 5);
    dumPS = rand('uniform');
run;
/* Match each Case with two Controls by Sex and ageGroup with
closely matched Weights.
Create variable ID for each Case-Control group, within each
sex-ageGroup byGroup*/
proc sort data=heart; by sex ageGroup role; run;
proc psmatch data=heart;
    where status = "Alive";
    by sex ageGroup;
    class role;
    psdata treatvar=role(treated='Case') ps=dumPS;
    output out(obs=all)=heartMatches matchId=Id;
    match
        caliper=.
        distance=mah(var=(weight)/cov=identity)
        method=greedy(k=2);
run;
/* Count the matches */
proc sql;
    select sex, ageGroup,
        sum(role="Case" and not missing(id)) as matchedCases,
        sum(role="Case" and missing(id)) as unMatchedCases,
        sum(role="Control" and not missing(id)) as matchedControls,
        sum(role="Control" and missing(id)) as unMatchedControls,
        count(distinct id) as nbPairs
    from heartMatches
    group by sex, ageGroup;
quit;
/* Look at the distribution of matched weight differences */
proc sql;
    create table weightDiffs as
    select
        a.sex,
        a.ageGroup,
        a.id,
        a.weight as caseWeight,
        b.weight as controlWeight,
        a.weight-b.weight as weightDiff
    from
        heartMatches as a inner join
        heartMatches as b
        on a.id=b.id and a.sex=b.sex and a.ageGroup=b.ageGroup
    /* Exclude unmatched rows: with OUT(OBS=ALL) they have a missing id,
       and SAS SQL treats missing = missing as true in the join */
    where a.role="Case" and b.role="Control" and not missing(a.id);
quit;
proc univariate data=weightDiffs;
var weightDiff;
histogram;
run;
Hi,
Maybe it's too early in the morning. What about the index date range? Thanks.
In this example, I use weight instead of index date as the fuzzy matching criterion. I try to match two controls per case and then require a weight difference <= 2 lbs. This version might be more informative about the performance of the procedure:
/* Divide SASHELP.HEART patients into two roles: Case and Control
with 4 times as many controls as cases. Create variables patientId,
ageGroup and a dummy propensity score (required but won't be used.
Look at option DISTANCE=(MAH(VAR=)) in MATCH statement of
PROC PSMATCH) */
data heart;
    call streaminit(9798979);
    set sashelp.heart;
    if rand("uniform") > 0.2
        then role = "Control";
        else role = "Case";
    ageGroup = round(ageAtStart, 5);
    patientId = _n_;
    dumPS = rand('uniform');
run;
/* Match each Case with two Controls by Sex and ageGroup with
closely matched Weights.
Create variable ID for each Case-Control group, within each
sex-ageGroup byGroup*/
proc sort data=heart; by sex ageGroup role; run;
proc psmatch data=heart;
    where status = "Alive";
    by sex ageGroup;
    class role;
    psdata treatvar=role(treated='Case') ps=dumPS;
    output out(obs=all)=heartMatches matchId=Id;
    match
        caliper=.
        distance=mah(var=(weight)/cov=identity)
        method=greedy(k=2);
run;
/* Look at match statistics */
proc sort data=heartMatches; by sex ageGroup id role; run;
data heartTable;
    ctrlNo = 0;
    do until (last.id);
        set heartMatches; by sex ageGroup id;
        where id is not missing;
        if role = "Case" then do;
            caseId = patientId;
            caseWeight = weight;
        end;
        else do;
            ctrlNo + 1;
            controlId = patientId;
            controlWeight = weight;
            weightDiff = caseWeight - controlWeight;
            /* if abs(weightDiff) <= 2 then */
            output;
        end;
    end;
    keep sex ageGroup id caseId ctrlNo controlId caseWeight controlWeight weightDiff;
run;
title "Matched Cases and Controls with weight difference <= 2 lb";
proc sql;
    select
        sex,
        ageGroup,
        count(distinct caseId) as nbMatchedCases,
        count(distinct controlId) as nbMatchedControls,
        calculated nbMatchedControls / calculated nbMatchedCases
            as meanNbControlsPerCase format 4.2
    from heartTable
    where abs(weightDiff) <= 2
    group by sex, ageGroup;
quit;
                                            meanNb
                   nbMatched  nbMatched   Controls
Sex     ageGroup       Cases   Controls    PerCase
--------------------------------------------------
Female        30          46         83       1.80
Female        35          99        191       1.93
Female        40          70        137       1.96
Female        45          70        138       1.97
Female        50          51         99       1.94
Female        55          32         63       1.97
Female        60           8         10       1.25
Male          30          40         74       1.85
Male          35          71        139       1.96
Male          40          45         84       1.87
Male          45          39         76       1.95
Male          50          22         42       1.91
Male          55          11         20       1.82
Male          60           1          1       1.00
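For the index date question itself, the same pattern should carry over by putting the date where weight was. The following is only a sketch under assumptions: SASHELP.HEART has no dates, so indexDate is a simulated, hypothetical variable, the data set names heartDates, dateMatches and dateMatchesWithinYear are made up for illustration, and "within a year" is taken to mean at most 365 days. Greedy matching only picks the closest available dates; it does not by itself enforce the one-year limit, so the limit is applied afterwards as a filter.

```sas
/* Sketch only: indexDate is simulated (hypothetical); replace it with
   the real index date variable in your own case and control data */
data heartDates;
    call streaminit(9798979);
    set sashelp.heart;
    if rand("uniform") > 0.2
        then role = "Control";
        else role = "Case";
    ageGroup = round(ageAtStart, 5);
    patientId = _n_;
    dumPS = rand("uniform");          /* dummy propensity score, not used */
    /* hypothetical index date spread over roughly five years */
    indexDate = '01JAN2015'd + floor(1826 * rand("uniform"));
    format indexDate date9.;
run;

/* Match within sex and ageGroup, choosing controls with the closest
   index dates (SAS dates are day counts, so the distance is in days) */
proc sort data=heartDates; by sex ageGroup role; run;
proc psmatch data=heartDates;
    by sex ageGroup;
    class role;
    psdata treatvar=role(treated='Case') ps=dumPS;
    output out(obs=all)=dateMatches matchId=Id;
    match
        caliper=.
        distance=mah(var=(indexDate)/cov=identity)
        method=greedy(k=2);
run;

/* Keep only case-control pairs whose index dates are at most 365 days
   apart; unmatched rows (missing id) are excluded explicitly */
proc sql;
    create table dateMatchesWithinYear as
    select
        a.sex,
        a.ageGroup,
        a.id,
        a.patientId as caseId,
        b.patientId as controlId,
        a.indexDate as caseIndexDate format=date9.,
        b.indexDate as controlIndexDate format=date9.,
        abs(a.indexDate - b.indexDate) as daysApart
    from
        dateMatches as a inner join
        dateMatches as b
        on a.id=b.id and a.sex=b.sex and a.ageGroup=b.ageGroup
    where a.role="Case" and b.role="Control"
        and not missing(a.id)
        and calculated daysApart <= 365;
quit;
```

For 1:3 or 1:4 matching, the k= value in METHOD=GREEDY would be raised to 3 or 4. Some cases may then end up with fewer controls after the 365-day filter, so it is worth counting controls per case as in the table above.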
@PGStats,
Neat. I am curious: what is the advantage of handling the exact matches with BY-group processing rather than with the EXACT= option? I would love to hear your reasoning about the pros and cons of BY Sex AgeGroup versus EXACT=(Sex AgeGroup).