Solved: Re: Simple random matching wo replacement by sex, age, and range of in...

lei · Posted 05-23-2018 10:09 PM

Hi,

I have a question regarding how to match without replacement by sex, age, and range of index dates.

I have two datasets: controls and cases.

I have used the PROC SURVEYSELECT method outlined by Diseker to match by age and sex. However, I would also like the index date of each control to fall within a year of the index date of its matched case. Is there an easy way to do this for 1:4 or 1:3 matching?

Thanks.

PGStats · Posted 05-24-2018 12:48 AM

This is my first attempt to do this with PSMATCH. It seems to work:

/* Divide SASHELP.HEART patients into two roles, Case and Control,
   with about 4 times as many controls as cases. Create variables
   ageGroup and a dummy propensity score (required but won't be used;
   see option DISTANCE=MAH(VAR=) in the MATCH statement of
   PROC PSMATCH) */
data heart;
call streaminit(9798979);
set sashelp.heart;
if rand("uniform") > 0.2
    then role = "Control"; 
    else role = "Case";
ageGroup = round(ageAtStart, 5);
dumPS = rand('uniform');
run;

/* Match each Case with two Controls by Sex and ageGroup with
   closely matched Weights. 
   Create variable ID for each Case-Control group, within each 
   sex-ageGroup byGroup*/
proc sort data=heart; by sex ageGroup role; run;

proc psmatch data=heart ;
where status = "Alive";
by sex ageGroup;
class role;
psdata treatvar=role(treated='Case') ps=dumPS;
output out(obs=all)=heartMatches matchId=Id;
match 
    caliper=.  
    distance=mah(var=(weight)/cov=identity)
    method=greedy(k=2);
run;

/* Count the matches */
proc sql;
select sex, ageGroup, 
sum(role="Case"    and not missing(id)) as matchedCases,
sum(role="Case"    and missing(id))     as unMatchedCases,
sum(role="Control" and not missing(id)) as matchedControls,
sum(role="Control" and missing(id))     as unMatchedControls,
count(distinct id)                      as nbPairs
from heartMatches
group by sex, ageGroup;
quit;

/* Look at the distribution of matched weight differences */
proc sql;
create table weightDiffs as
select
    a.sex,
    a.ageGroup,
    a.id,
    a.weight as caseWeight,
    b.weight as controlWeight,
    a.weight-b.weight as weightDiff
from 
    heartMatches as a inner join
    heartMatches as b 
        on a.id=b.id and a.sex=b.sex and a.ageGroup=b.ageGroup
where a.role="Case" and b.role="Control";
quit;

proc univariate data=weightDiffs;
var weightDiff;
histogram;
run;

PG

lei · Posted 05-24-2018 09:13 AM

Hi,

Maybe it's too early in the morning. What about the index date range? Thanks.

PGStats · Posted 05-25-2018 12:04 AM

In this example, I use weight instead of index date as the fuzzy matching criterion. I try to match two controls per case and then require a weight difference <= 2 lbs. This version might be more informative about the performance of the procedure:

/* Divide SASHELP.HEART patients into two roles, Case and Control,
   with about 4 times as many controls as cases. Create variables
   patientId, ageGroup and a dummy propensity score (required but
   won't be used; see option DISTANCE=MAH(VAR=) in the MATCH
   statement of PROC PSMATCH) */
data heart;
call streaminit(9798979);
set sashelp.heart;
if rand("uniform") > 0.2
    then role = "Control"; 
    else role = "Case";
ageGroup = round(ageAtStart, 5);
patientId = _n_;
dumPS = rand('uniform');
run;

/* Match each Case with two Controls by Sex and ageGroup with
   closely matched Weights. 
   Create variable ID for each Case-Control group, within each 
   sex-ageGroup byGroup*/
proc sort data=heart; by sex ageGroup role; run;

proc psmatch data=heart ;
where status = "Alive";
by sex ageGroup;
class role;
psdata treatvar=role(treated='Case') ps=dumPS;
output out(obs=all)=heartMatches matchId=Id;
match 
    caliper=.  
    distance=mah(var=(weight)/cov=identity)
    method=greedy(k=2);
run;

/* look at match statistics */
proc sort data=heartMatches; by sex ageGroup id role; run; 

data heartTable;
ctrlNo = 0;
do until (last.id);
    set heartMatches; by sex ageGroup id;
    where id is not missing;
    if role = "Case" then do;
        caseId = patientId;
        caseWeight = weight;
        end;
    else do;
        ctrlNo + 1;
        controlId = patientId;
        controlWeight = weight;
        weightDiff = caseWeight - controlWeight;
        /* if abs(weightDiff) <= 2 then */
        output;
        end;
    end;
keep sex ageGroup id caseId ctrlNo controlId caseWeight controlWeight weightDiff;
run;

title "Matched Cases and Controls with weight difference <= 2 lb";
proc sql;
select
    sex,
    ageGroup,
    count(distinct caseId) as nbMatchedCases,
    count(distinct controlId) as nbMatchedControls,
    calculated nbMatchedControls / calculated nbMatchedCases 
        as meanNbControlsPerCase format 4.2
from heartTable
where abs(weightDiff) <= 2
group by sex, ageGroup;
quit;

                                                                 meanNb
                                  nbMatched     nbMatched      Controls
             Sex     ageGroup         Cases      Controls       PerCase
             ----------------------------------------------------------
             Female        30            46            83          1.80
             Female        35            99           191          1.93
             Female        40            70           137          1.96
             Female        45            70           138          1.97
             Female        50            51            99          1.94
             Female        55            32            63          1.97
             Female        60             8            10          1.25
             Male          30            40            74          1.85
             Male          35            71           139          1.96
             Male          40            45            84          1.87
             Male          45            39            76          1.95
             Male          50            22            42          1.91
             Male          55            11            20          1.82
             Male          60             1             1          1.00
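
A minimal sketch of how the same steps might carry over to the index date question: the index date takes the place of weight as the distance variable, k=4 requests up to four controls per case, and matches are kept only when the dates are within 365 days of each other. SAS dates are stored as day counts, so the difference is in days. The dataset study, the simulated values, and the names indexDate, datePairs and dayDiff are assumptions made up for illustration; this sketch has not been checked against real data.

/* Hypothetical data: 200 cases and 1000 controls with sex, age and
   an index date. All names and values here are illustrative only. */
data study;
call streaminit(12345);
length role $7 sex $6;
do patientId = 1 to 1200;
    if patientId <= 200 then role = "Case"; else role = "Control";
    if rand("uniform") < 0.5 then sex = "Female"; else sex = "Male";
    age = 30 + floor(30 * rand("uniform"));
    ageGroup = round(age, 5);
    indexDate = '01JAN2010'd + floor(3650 * rand("uniform"));
    dumPS = rand("uniform");   /* dummy propensity score, not used */
    output;
    end;
format indexDate date9.;
run;

proc sort data=study; by sex ageGroup role; run;

/* Greedy matching on index date within each sex-ageGroup BY group */
proc psmatch data=study;
by sex ageGroup;
class role;
psdata treatvar=role(treated='Case') ps=dumPS;
output out(obs=all)=studyMatches matchId=Id;
match
    caliper=.
    distance=mah(var=(indexDate)/cov=identity)
    method=greedy(k=4);
run;

/* Keep only controls whose index date is within 365 days of the case */
proc sort data=studyMatches; by sex ageGroup id role; run;

data datePairs;
do until (last.id);
    set studyMatches; by sex ageGroup id;
    where id is not missing;
    if role = "Case" then do;
        caseId = patientId;
        caseDate = indexDate;
        end;
    else do;
        controlId = patientId;
        controlDate = indexDate;
        dayDiff = controlDate - caseDate;
        if abs(dayDiff) <= 365 then output;
        end;
    end;
format caseDate controlDate date9.;
keep sex ageGroup id caseId controlId caseDate controlDate dayDiff;
run;

/* How many cases kept 1, 2, 3 or 4 controls inside the window.
   Cases with no control inside the window do not appear here. */
proc sql;
select nbControls, count(*) as nbCases
from (select sex, ageGroup, caseId, count(*) as nbControls
      from datePairs
      group by sex, ageGroup, caseId)
group by nbControls;
quit;

Because the window is applied after matching, a case can end up with fewer than four controls; the last query shows how often that happens.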

PG

Haris · Posted 06-11-2019 01:27 PM

@PGStats,

Neat. I am curious: what is the advantage of handling the exact matches with BY processing rather than the EXACT= option? I would love to hear your reasoning about the pros and cons of

BY Sex AgeGroup

versus

EXACT=(Sex AgeGroup)
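
For comparison, a sketch of what the EXACT= form might look like on the same SASHELP.HEART example. It assumes that the EXACT= variables must also be listed in the CLASS statement and that a single PROC step then numbers the matched sets across the whole dataset rather than within each BY group; both points are worth checking against the PSMATCH documentation. The output name heartMatchesExact is made up for the example.

/* Same setup as in the posts above */
data heart;
call streaminit(9798979);
set sashelp.heart;
if rand("uniform") > 0.2
    then role = "Control";
    else role = "Case";
ageGroup = round(ageAtStart, 5);
dumPS = rand('uniform');
run;

/* No sort and no BY statement: sex and ageGroup go in CLASS
   and in EXACT= instead */
proc psmatch data=heart;
where status = "Alive";
class role sex ageGroup;
psdata treatvar=role(treated='Case') ps=dumPS;
output out(obs=all)=heartMatchesExact matchId=Id;
match
    caliper=.
    distance=mah(var=(weight)/cov=identity)
    exact=(sex ageGroup)
    method=greedy(k=2);
run;

/* Sanity check: every matched set should contain a single
   sex-ageGroup combination */
proc sql;
select count(*) as nbMixedSets
from (select id
      from heartMatchesExact
      where id is not missing
      group by id
      having count(distinct catx("|", sex, ageGroup)) > 1);
quit;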
