shanj
Calcite | Level 5

Hi,

I used SAS code published here in 2020 to match cases and controls using the "Linear Assignment - graph directed method". Please see the code below.

This code worked perfectly with my dataset, and I could select controls at a 1:5 ratio with no duplicate controls.

However, a researcher I worked with checked the code and mentioned that this method may select the controls non-randomly.

So I want to know whether this method selects the controls randomly. Could you please check the code and confirm whether the controls are randomly selected? Random selection of controls is encouraged in this project.

Thank you so much.

 

The Code:

data have;
input case_id  control_id;
datalines;
  1         10
  1        125
  1       1850
  1        928
  3        125
  3          8
  3       1276
  3       1011
  3         10
  4         10
  4       1011
;

/* Build the link list: one link per candidate case-control pair, all with equal weight */
data links1;
set have;
rename case_id=from control_id=to;
weight = 1;
run;

/* Pass 1: assign one control to each case */
proc optnet data_links=links1 GRAPH_DIRECTION=DIRECTED;
linear_assignment out=want1;
run;

proc print data=want1; run;

/* Pass 2: drop controls already assigned in pass 1, then assign again */
proc sql;
create table links2 as
select * from links1
where to not in (select to from want1);
quit;

proc optnet data_links=links2 GRAPH_DIRECTION=DIRECTED;
linear_assignment out=want2;
run;

proc print data=want2; run;

/* Pass 3: drop controls already assigned in pass 2, then assign again */
proc sql;
create table links3 as
select * from links2
where to not in (select to from want2);
quit;

proc optnet data_links=links3 GRAPH_DIRECTION=DIRECTED;
linear_assignment out=want3;
run;

proc print data=want3; run;

/* Stack the three passes and restore the original variable names */
data want;
set want1 want2 want3;
rename from=case_id to=control_id;
drop weight;
run;

proc sort data=want; by case_id control_id; run;

proc print data=want noobs; run;

4 REPLIES
quickbluefish
Barite | Level 11
I am not familiar with this 'graph directed' method, and it is not clear which variables you're matching on here. Regarding randomness: how did you create the 'have' dataset in the first place? Were the controls randomly sorted before you created a Cartesian product of the cases and controls?
shanj
Calcite | Level 5

Hi, 

Thank you for your reply. This is the link to the original post - https://communities.sas.com/t5/SAS-Programming/De-duplicating-controls-in-matched-case-control/td-p/...

 

I created the 'have' dataset by matching on two variables: age and ethnicity.

I did not randomly select the controls because the pool of available controls in this study was very small.

Your help is much appreciated.

Thanks. 

Ksharp
Super User

Assuming I understood your question.

Try this one:

 

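data have;
input case_id  control_id;
datalines;
  1         10
  1        125
  1       1850
  1        928
  3        125
  3          8
  3       1276
  3       1011
  3         10
  4         10
  4       1011
;

/* Maximum number of controls to keep per case */
%let n=3;

/* Attach a random sort key to every candidate pair */
data have2;
 set have;
 if _n_=1 then call streaminit(1234);  /* fixed seed so the draw is reproducible */
 random=rand('uniform');
run;

/* Shuffle the candidate controls within each case */
proc sort data=have2;
by case_id random;
run;

/* Walk the shuffled pairs, keeping a control only if it has not been used yet */
data want;
if _n_=1 then do;
 if 0 then set have;
 declare hash h();          /* holds control_id values already assigned */
 h.definekey('control_id');
 h.definedone();
end;
set have2;
by case_id;
if first.case_id then count=0;
/* check() returns nonzero when control_id is not yet in the hash */
if h.check() ne 0 and count<&n. then do;
  count+1;
  h.add();
  output;
end;
run;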
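
As a quick sanity check on the step above, something like the following sketch could be run afterwards. It assumes the output data set is named want and has the columns case_id and control_id, as in the code above. The column aliases n_uses and n_controls are only illustrative names.

```sas
proc sql;
  /* Any row returned here is a control_id assigned to more than one case */
  select control_id, count(*) as n_uses
  from want
  group by control_id
  having count(*) > 1;

  /* Number of controls actually selected for each case */
  select case_id, count(*) as n_controls
  from want
  group by case_id;
quit;
```

If the first query returns no rows, no control was reused. The second query shows which cases ended up with fewer than the requested number of controls.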
RobPratt
SAS Super FREQ

You might be interested in PROC PSMATCH, which uses a linear assignment solver under the hood for optimal matching:

https://go.documentation.sas.com/doc/en/pgmsascdc/v_065/statug/statug_psmatch_toc.htm
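
For anyone staying with the PROC OPTNET approach from the original post, one possible way to bring randomness into the selection is to replace the constant weight = 1 with a random weight on each link. This is only a sketch and was not confirmed in this thread. It assumes the have data set from the original post already exists, and the seed 1234 and the data set names links_rand and want_rand are illustrative. With equal weights, all assignments of the same size have the same total cost, so which one is returned is decided by the solver rather than by chance. With random weights, the returned assignment changes with the seed, although it is not guaranteed to be a uniform random draw from all valid matchings.

```sas
/* Sketch: random link weights instead of weight = 1 (seed and names are assumptions) */
data links_rand;
  set have;
  if _n_ = 1 then call streaminit(1234);  /* assumed seed, for reproducibility */
  rename case_id=from control_id=to;
  weight = rand('uniform');               /* random cost for each case-control link */
run;

proc optnet data_links=links_rand graph_direction=directed;
  linear_assignment out=want_rand;        /* minimum-total-weight assignment */
run;

proc print data=want_rand; run;
```

The later passes from the original post (removing already-assigned controls and solving again) would work the same way on links_rand.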
