Re: De-duplicating controls in matched case/control study

shanj · Posted 08-11-2025 07:48 PM

Hi,

I used SAS code published here to match cases and controls with the "Linear Assignment - graph directed method" (published in 2020); please see the code below.

This code worked perfectly with my dataset, and I was able to select controls at a 1:5 ratio with no duplicate controls.

However, a researcher I worked with checked the code and mentioned that this method might select the controls "non-randomly".

So I would like to know whether this method selects the controls randomly. Could you please check the code and confirm whether the controls are randomly selected? Random selection of controls is encouraged in this project.

Thank you so much.

The Code:

data have;
   input case_id control_id;
   datalines;
1 10
1 125
1 1850
1 928
3 125
3 8
3 1276
3 1011
3 10
4 10
4 1011
;

/* Links for round 1: from = case, to = control, equal weights */
data links1;
   set have;
   rename case_id=from control_id=to;
   weight = 1;
run;

/* Round 1: assign at most one control to each case */
proc optnet data_links=links1 GRAPH_DIRECTION=DIRECTED;
   linear_assignment out=want1;
run;

proc print data=want1; run;

/* Round 2: drop controls already used, then assign again */
proc sql;
   create table links2 as
   select * from links1
   where to not in (select to from want1);
quit;

proc optnet data_links=links2 GRAPH_DIRECTION=DIRECTED;
   linear_assignment out=want2;
run;

proc print data=want2; run;

/* Round 3: same again on the remaining controls */
proc sql;
   create table links3 as
   select * from links2
   where to not in (select to from want2);
quit;

proc optnet data_links=links3 GRAPH_DIRECTION=DIRECTED;
   linear_assignment out=want3;
run;

proc print data=want3; run;

/* Stack the rounds and restore the original variable names */
data want;
   set want1 want2 want3;
   rename from=case_id to=control_id;
   drop weight;
run;

proc sort data=want; by case_id control_id; run;

proc print data=want noobs; run;
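On the randomness question: with weight = 1 on every link, every maximum matching has the same total weight, so PROC OPTNET returns one of many tied optimal solutions. Which one it returns most likely depends on the solver's internal tie-breaking and the order of the input rows, which repeats from run to run rather than being random. A minimal sketch of one way to randomize the tie-breaking, assuming the round structure stays exactly as posted: give each link a uniform random weight drawn with a fixed seed, and loop over the rounds in a macro. The macro name MATCH_ROUNDS and its parameters ROUNDS= and SEED= are hypothetical. The solver still minimizes total weight, so this is randomized tie-breaking, not a guaranteed uniform draw from all possible matchings.

```sas
/* Hypothetical macro MATCH_ROUNDS: repeats the posted OPTNET rounds, */
/* but with a random weight on each link instead of weight = 1.       */
%macro match_rounds(rounds=3, seed=2025);
   /* Round-1 links: every candidate pair gets a Uniform(0,1) weight */
   data links1;
      set have;
      if _n_ = 1 then call streaminit(&seed); /* fixed seed: reproducible */
      rename case_id=from control_id=to;
      weight = rand('uniform');
   run;

   %do r = 1 %to &rounds;
      /* Minimum-weight assignment: at most one new control per case */
      proc optnet data_links=links&r graph_direction=directed;
         linear_assignment out=want&r;
      run;

      /* Remove the controls just used before the next round */
      proc sql;
         create table links%eval(&r + 1) as
         select * from links&r
         where to not in (select to from want&r);
      quit;
   %end;

   /* Stack all rounds: one row per selected case/control pair */
   data want;
      set want1-want&rounds;
      rename from=case_id to=control_id;
      drop weight;
   run;

   proc sort data=want;
      by case_id control_id;
   run;
%mend match_rounds;

%match_rounds(rounds=3, seed=2025)
```

For the 1:5 design described in the question, the call would be %match_rounds(rounds=5); changing SEED= gives a different, but still reproducible, set of controls.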

quickbluefish · Posted 08-11-2025 09:59 PM

I am definitely not familiar with this 'graph directed' method, and it's not really clear what variables you're matching on here. Re: randomness, how did you create the 'have' dataset in the first place? Were the controls randomly sorted, and then you just created a Cartesian product of the cases and controls?

shanj · Posted 08-12-2025 03:32 AM

Hi,

Thank you for your reply. Here is the link to the original post: https://communities.sas.com/t5/SAS-Programming/De-duplicating-controls-in-matched-case-control/td-p/...

I created the 'have' dataset by matching on two variables: age and ethnicity.

I did not randomly select the controls, because the pool available for selecting controls in this study was very small.

Your help is much appreciated.

Thanks.
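For readers following the thread, a candidate-pair table like 'have' can be built with an exact-match join on age and ethnicity. This is a minimal sketch, assuming hypothetical input tables CASES and CONTROLS with columns ID, AGE and ETHNICITY; the table names, column names and toy rows are illustrative only, and the actual matching (for example on age bands rather than exact age) may differ. Note that it writes to a table named HAVE.

```sas
/* Hypothetical toy inputs: table names, columns and rows are assumptions. */
data cases;
   input id age ethnicity $;
   datalines;
1 40 A
3 52 B
4 52 B
;

data controls;
   input id age ethnicity $;
   datalines;
8 52 B
10 52 B
125 40 A
;

/* Every control sharing a case's age and ethnicity becomes a   */
/* candidate pair; one control can appear under several cases.  */
proc sql;
   create table have as
   select ca.id as case_id,
          co.id as control_id
   from cases as ca
        inner join controls as co
        on  ca.age = co.age
        and ca.ethnicity = co.ethnicity
   order by case_id, control_id;
quit;

proc print data=have noobs; run;
```

Because every eligible control is kept as a candidate here, any random selection has to happen in the later step that picks controls from these pairs.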

Ksharp · Posted 08-12-2025 04:52 AM

Assuming I understood your question, try this one:

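data have;
   input case_id control_id;
   datalines;
1 10
1 125
1 1850
1 928
3 125
3 8
3 1276
3 1011
3 10
4 10
4 1011
;

%let n=3;  *each case_id gets at most 3 control_ids;

/* Attach a random number to each candidate pair */
data have2;
   set have;
   if _n_=1 then call streaminit(1234);
   random=rand('uniform');
run;

/* Put the candidate controls of each case in random order */
proc sort data=have2;
   by case_id random;
run;

data want;
   if _n_=1 then do;
      if 0 then set have;        /* defines CONTROL_ID for the hash */
      declare hash h();          /* controls already selected */
      h.definekey('control_id');
      h.definedone();
   end;
   set have2;
   by case_id;
   if first.case_id then count=0;
   /* keep a control only if unused and the case still needs one */
   if h.check() ne 0 and count<&n. then do;
      count+1;
      h.add();
      output;
   end;
run;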
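Because this is a greedy pass in case_id order, lower case_ids get first pick of shared controls, so a later case can end up with fewer than &n controls (or none) even when a different allocation would have served every case; in the sample data, case 4 only has controls 10 and 1011, which cases 1 and 3 may already have taken. A quick check on WANT, sketched below, shows whether any control was reused and how many controls each case received, including cases that received none.

```sas
/* Sanity checks on WANT from the hash-object step above */
proc sql;
   /* Equal counts mean no control_id was selected twice */
   select count(*)                   as n_rows,
          count(distinct control_id) as n_distinct_controls
   from want;

   /* Controls per case; the left join keeps cases with zero controls */
   select c.case_id,
          count(w.control_id) as n_controls
   from (select distinct case_id from have) as c
        left join want as w
        on c.case_id = w.case_id
   group by c.case_id;
quit;
```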

RobPratt · Posted 08-12-2025 12:44 PM

You might be interested in PROC PSMATCH, which uses a linear assignment solver under the hood for optimal matching:

https://go.documentation.sas.com/doc/en/pgmsascdc/v_065/statug/statug_psmatch_toc.htm
