BookmarkSubscribeRSS Feed
lim_6
Fluorite | Level 6

Hi,
I've been working on case-control syntax and tried using the SAS PDF found here: http://www2.sas.com/proceedings/sugi29/173-29.pdf

 

Unfortunately my size is vastly larger (~170000 cases and more than 3 million controls) and this syntax is creating a file size much larger than necessary. I'm hoping to use a non-macro based syntax since my SAS has trouble running macros.

 

I'd like to use some sort of PROC SQL syntax, but I really don't know how to make it more efficient in the beginning and match controls to cases right away without creating a ton of duplicates per each case. Any suggestions or ideas? Anything would be immensely helpful at this point. Thank you!

 

Syntax:

PROC SQL;

CREATE table controls_id

as select

one.ID as

study_id,

two.ID as control_id,

one.age as study_age,

two.age as control_age,

one.race as study_race,

two.race as control_race,

one.rand_num as rand_num

from study one, control two

where (one.age=two.age and

one.race=two.race);

 

* Remove duplicate control subjects;

proc sort data=controls_id nodupkey;

by control_id rand_num;

run;

 

*exactly match on variables with fixed number of controls;

proc sort data=controls_id ;

by study_id rand_num;

run;

data controls_id2 not_enough;

set controls_id;

by study_id ;

retain num;

if first.study_id then num=1;

if num le 2 then do;

output controls_id2;

num=num+1;

end;

if last.study_id then do;

if num le 2 then output not_enough;

end;

run;

proc print data=controls_id2

(obs=40)

;

title2 'matched patients';

run;

 

*use following syntax to remove cases that do not have two controls;

data controls_id3;

merge controls_id2

not_enough(in=b_);

by study_id;

if b_ then delete;

run;

2 REPLIES 2
LinusH
Tourmaline | Level 20
Perhaps you could try select distinct.
It would probably use as much resources as the following proc sort does, but would limit the output from he SQL if I interpret your scenario correctly.
Data never sleeps
Doc_Duke
Rhodochrosite | Level 12

Consider doing propensity score matching.  This paper has some nice optimization techniques

http://www.ucdenver.edu/academics/colleges/PublicHealth/resourcesfor/Faculty/perraillon/code/Pages/p...

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Health and Life Sciences Learning

 

Need courses to help you with SAS Life Sciences Analytics Framework, SAS Health Cohort Builder, or other topics? Check out the Health and Life Sciences learning path for all of the offerings.

LEARN MORE

Discussion stats
  • 2 replies
  • 1987 views
  • 1 like
  • 3 in conversation