lei
Obsidian | Level 7

Hi,

I have a question about how to match without replacement by sex, age, and a range of index dates.

I have two datasets: controls and cases.

I have used the PROC SURVEYSELECT method outlined by Diseker to match by age and sex. However, I would also like the index date of each control to fall within a year of the index date of its matched case. Is there an easy way to do this for 1:4 or 1:3 matching?

Thanks.

PGStats
Opal | Level 21

This is my first attempt to do this with PSMATCH. It seems to work:

 

/* Divide SASHELP.HEART patients into two roles: Case and Control,
   with about 4 times as many controls as cases. Create ageGroup
   and a dummy propensity score (required by PSDATA but not used for
   matching; see DISTANCE=MAH(VAR=()) in the MATCH statement) */
data heart;
call streaminit(9798979);
set sashelp.heart;
if rand("uniform") > 0.2
    then role = "Control"; 
    else role = "Case";
ageGroup = round(ageAtStart, 5);
dumPS = rand('uniform');
run;

/* Match each Case with two Controls by Sex and ageGroup with
   closely matched Weights. 
   Create variable ID for each Case-Control group, within each 
   sex-ageGroup byGroup*/
proc sort data=heart; by sex ageGroup role; run;

proc psmatch data=heart ;
where status = "Alive";
by sex ageGroup;
class role;
psdata treatvar=role(treated='Case') ps=dumPS;
output out(obs=all)=heartMatches matchId=Id;
match 
    caliper=.  
    distance=mah(var=(weight)/cov=identity)
    method=greedy(k=2);
run;

/* Count the matches */
proc sql;
select sex, ageGroup, 
sum(role="Case"    and not missing(id)) as matchedCases,
sum(role="Case"    and missing(id))     as unMatchedCases,
sum(role="Control" and not missing(id)) as matchedControls,
sum(role="Control" and missing(id))     as unMatchedControls,
count(distinct id)                      as nbPairs
from heartMatches
group by sex, ageGroup;
quit;

/* Look at the distribution of matched weight differences */
proc sql;
create table weightDiffs as
select
    a.sex,
    a.ageGroup,
    a.id,
    a.weight as caseWeight,
    b.weight as controlWeight,
    a.weight-b.weight as weightDiff
from 
    heartMatches as a inner join
    heartMatches as b 
        on a.id=b.id and a.sex=b.sex and a.ageGroup=b.ageGroup
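/* Exclude unmatched rows: PROC SQL treats missing = missing as true,
   so unmatched cases and controls would otherwise be joined */
where a.role="Case" and b.role="Control" and not missing(a.id);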
quit;

proc univariate data=weightDiffs;
var weightDiff;
histogram;
run;
PG

lei
Obsidian | Level 7

Hi,

 

Maybe it's too early in the morning. What about the index date range? Thanks.

PGStats
Opal | Level 21

In this example, I use weight instead of index date as the fuzzy matching criterion. I try to match two controls per case and then require a weight difference <= 2 lbs. This version might be more informative about the performance of the procedure:

 

/* Divide SASHELP.HEART patients into two roles: Case and Control,
   with about 4 times as many controls as cases. Create patientId,
   ageGroup and a dummy propensity score (required by PSDATA but not
   used for matching; see DISTANCE=MAH(VAR=()) in the MATCH statement) */
data heart;
call streaminit(9798979);
set sashelp.heart;
if rand("uniform") > 0.2
    then role = "Control"; 
    else role = "Case";
ageGroup = round(ageAtStart, 5);
patientId = _n_;
dumPS = rand('uniform');
run;

/* Match each Case with two Controls by Sex and ageGroup with
   closely matched Weights. 
   Create variable ID for each Case-Control group, within each 
   sex-ageGroup byGroup*/
proc sort data=heart; by sex ageGroup role; run;

proc psmatch data=heart ;
where status = "Alive";
by sex ageGroup;
class role;
psdata treatvar=role(treated='Case') ps=dumPS;
output out(obs=all)=heartMatches matchId=Id;
match 
    caliper=.  
    distance=mah(var=(weight)/cov=identity)
    method=greedy(k=2);
run;

/* look at match statistics */
proc sort data=heartMatches; by sex ageGroup id role; run; 

data heartTable;
ctrlNo = 0;
do until (last.id);
    set heartMatches; by sex ageGroup id;
    where id is not missing;
    if role = "Case" then do;
        caseId = patientId;
        caseWeight = weight;
        end;
    else do;
        ctrlNo + 1;
        controlId = patientId;
        controlWeight = weight;
        weightDiff = caseWeight - controlWeight;
        /* if abs(weightDiff) <= 2 then */
        output;
        end;
    end;
keep sex ageGroup id caseId ctrlNo controlId caseWeight controlWeight weightDiff;
run;

title "Matched Cases and Controls with weight difference <= 2 lb";
proc sql;
select
    sex,
    ageGroup,
    count(distinct caseId) as nbMatchedCases,
    count(distinct controlId) as nbMatchedControls,
    calculated nbMatchedControls / calculated nbMatchedCases 
        as meanNbControlsPerCase format 4.2
from heartTable
where abs(weightDiff) <= 2
group by sex, ageGroup;
quit;
                                                                 meanNb
                                  nbMatched     nbMatched      Controls
             Sex     ageGroup         Cases      Controls       PerCase
             ----------------------------------------------------------
             Female        30            46            83          1.80
             Female        35            99           191          1.93
             Female        40            70           137          1.96
             Female        45            70           138          1.97
             Female        50            51            99          1.94
             Female        55            32            63          1.97
             Female        60             8            10          1.25
             Male          30            40            74          1.85
             Male          35            71           139          1.96
             Male          40            45            84          1.87
             Male          45            39            76          1.95
             Male          50            22            42          1.91
             Male          55            11            20          1.82
             Male          60             1             1          1.00
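
To carry this over to the original question, the same pattern could be applied with the index date in place of weight. This is only a minimal sketch under assumptions: indexDate is a hypothetical numeric SAS date variable (SASHELP.HEART has none, so one is simulated here), "within a year" is taken as 365 days, and heartDates, dateMatches and dateMatched are made-up data set names. It keeps k=2 from the example above; k=3 or k=4 would request the 1:3 or 1:4 ratio from the question, provided each stratum has enough controls.

```sas
/* Sketch only: indexDate is a simulated, hypothetical variable;
   replace it with the real index date of the cases and controls */
data heartDates;
call streaminit(9798979);
set sashelp.heart;
if rand("uniform") > 0.2
    then role = "Control";
    else role = "Case";
ageGroup = round(ageAtStart, 5);
patientId = _n_;
dumPS = rand("uniform");
/* simulated index date spread over about five years */
indexDate = "01JAN2010"d + floor(1826 * rand("uniform"));
format indexDate date9.;
run;

proc sort data=heartDates; by sex ageGroup role; run;

/* Same call as above, with indexDate as the distance variable */
proc psmatch data=heartDates;
by sex ageGroup;
class role;
psdata treatvar=role(treated='Case') ps=dumPS;
output out(obs=all)=dateMatches matchId=Id;
match
    caliper=.
    distance=mah(var=(indexDate)/cov=identity)
    method=greedy(k=2);
run;

/* Keep only controls whose index date is within 365 days of their
   matched case. Rows with a missing id are unmatched and are excluded
   explicitly, because PROC SQL treats missing = missing as true. */
proc sql;
create table dateMatched as
select
    a.sex,
    a.ageGroup,
    a.id,
    a.patientId as caseId,
    b.patientId as controlId,
    a.indexDate as caseDate format=date9.,
    b.indexDate as controlDate format=date9.,
    abs(a.indexDate - b.indexDate) as daysApart
from
    dateMatches as a inner join
    dateMatches as b
        on a.id=b.id and a.sex=b.sex and a.ageGroup=b.ageGroup
where a.role="Case" and b.role="Control"
  and not missing(a.id)
  and abs(a.indexDate - b.indexDate) <= 365;
quit;
```

As with the weight example, the date window is enforced after matching, so some cases may end up with fewer controls than requested; counting controlId per caseId in dateMatched shows how many survive the filter.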
PG
Haris
Lapis Lazuli | Level 10

@PGStats,

Neat. I am curious: what is the advantage of handling the exact matches with BY processing rather than the EXACT= option? I would love to hear your reasoning about the pros and cons of BY Sex AgeGroup versus EXACT=(Sex AgeGroup).
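
For reference, the EXACT= form of the same match might look like the sketch below. It has not been run here and rests on two assumptions: that variables named in EXACT= must also be listed in the CLASS statement, and that the heart data set from the example above is available. heartMatchesExact is a made-up output name. Because the procedure would run once over all the data rather than once per BY group, the match IDs would presumably not restart in each sex-ageGroup stratum.

```sas
/* Sketch only: the match from above, with EXACT= instead of BY.
   Assumes EXACT= variables must appear in the CLASS statement. */
proc psmatch data=heart;
where status = "Alive";
class role sex ageGroup;
psdata treatvar=role(treated='Case') ps=dumPS;
output out(obs=all)=heartMatchesExact matchId=Id;
match
    caliper=.
    distance=mah(var=(weight)/cov=identity)
    exact=(sex ageGroup)
    method=greedy(k=2);
run;
```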
