Matching cases and controls using PROC SQL

Reply
Occasional Contributor
Posts: 5

Matching cases and controls using PROC SQL

Hi,
I've been working on case-control syntax and tried using the SAS PDF found here: http://www2.sas.com/proceedings/sugi29/173-29.pdf

 

Unfortunately my size is vastly larger (~170000 cases and more than 3 million controls) and this syntax is creating a file size much larger than necessary. I'm hoping to use a non-macro based syntax since my SAS has trouble running macros.

 

I'd like to use some sort of PROC SQL syntax, but I really don't know how to make it more efficient in the beginning and match controls to cases right away without creating a ton of duplicates per each case. Any suggestions or ideas? Anything would be immensely helpful at this point. Thank you!

 

Syntax:

PROC SQL;

CREATE table controls_id

as select

one.ID as

study_id,

two.ID as control_id,

one.age as study_age,

two.age as control_age,

one.race as study_race,

two.race as control_race,

one.rand_num as rand_num

from study one, control two

where (one.age=two.age and

one.race=two.race);

 

* Remove duplicate control subjects;

proc sort data=controls_id nodupkey;

by control_id rand_num;

run;

 

*exactly match on variables with fixed number of controls;

proc sort data=controls_id ;

by study_id rand_num;

run;

data controls_id2 not_enough;

set controls_id;

by study_id ;

retain num;

if first.study_id then num=1;

if num le 2 then do;

output controls_id2;

num=num+1;

end;

if last.study_id then do;

if num le 2 then output not_enough;

end;

run;

proc print data=controls_id2

(obs=40)

;

title2 'matched patients';

run;

 

*use following syntax to remove cases that do not have two controls;

data controls_id3;

merge controls_id2

not_enough(in=b_);

by study_id;

if b_ then delete;

run;

Super User
Posts: 5,257

Re: Matching cases and controls using PROC SQL

Perhaps you could try select distinct.
It would probably use as much resources as the following proc sort does, but would limit the output from he SQL if I interpret your scenario correctly.
Data never sleeps
Trusted Advisor
Posts: 2,113

Re: Matching cases and controls using PROC SQL

Consider doing propensity score matching.  This paper has some nice optimization techniques

http://www.ucdenver.edu/academics/colleges/PublicHealth/resourcesfor/Faculty/perraillon/code/Pages/p...

 

Ask a Question
Discussion stats
  • 2 replies
  • 435 views
  • 1 like
  • 3 in conversation