Matching cases and controls using PROC SQL

lim_6 · Posted 04-05-2016 05:29 PM

Hi,
I've been working on case-control syntax and tried using the SAS PDF found here: http://www2.sas.com/proceedings/sugi29/173-29.pdf

Unfortunately my size is vastly larger (~170000 cases and more than 3 million controls) and this syntax is creating a file size much larger than necessary. I'm hoping to use a non-macro based syntax since my SAS has trouble running macros.

I'd like to use some sort of PROC SQL syntax, but I really don't know how to make it more efficient in the beginning and match controls to cases right away without creating a ton of duplicates per each case. Any suggestions or ideas? Anything would be immensely helpful at this point. Thank you!

Syntax:

PROC SQL;

CREATE table controls_id

as select

one.ID as

study_id,

two.ID as control_id,

one.age as study_age,

two.age as control_age,

one.race as study_race,

two.race as control_race,

one.rand_num as rand_num

from study one, control two

where (one.age=two.age and

one.race=two.race);

* Remove duplicate control subjects;

proc sort data=controls_id nodupkey;

by control_id rand_num;

run;

*exactly match on variables with fixed number of controls;

proc sort data=controls_id ;

by study_id rand_num;

run;

data controls_id2 not_enough;

set controls_id;

by study_id ;

retain num;

if first.study_id then num=1;

if num le 2 then do;

output controls_id2;

num=num+1;

end;

if last.study_id then do;

if num le 2 then output not_enough;

end;

run;

proc print data=controls_id2

(obs=40)

;

title2 'matched patients';

run;

*use following syntax to remove cases that do not have two controls;

data controls_id3;

merge controls_id2

not_enough(in=b_);

by study_id;

if b_ then delete;

run;

LinusH · Posted 04-07-2016 12:30 AM

Perhaps you could try select distinct.
It would probably use as much resources as the following proc sort does, but would limit the output from he SQL if I interpret your scenario correctly.

Data never sleeps

Doc_Duke · Posted 04-14-2016 09:56 AM

Consider doing propensity score matching. This paper has some nice optimization techniques

http://www.ucdenver.edu/academics/colleges/PublicHealth/resourcesfor/Faculty/perraillon/code/Pages/p...

Matching cases and controls using PROC SQL

Re: Matching cases and controls using PROC SQL

Re: Matching cases and controls using PROC SQL

Matching cases and controls using PROC SQL

Re: Matching cases and controls using PROC SQL

Re: Matching cases and controls using PROC SQL

SAS Innovate 2025: Save the Date

Health and Life Sciences Learning