BookmarkSubscribeRSS Feed
lim_6
Fluorite | Level 6

Hi,
I've been working on case-control syntax and tried using the SAS PDF found here: http://www2.sas.com/proceedings/sugi29/173-29.pdf

 

Unfortunately my size is vastly larger (~170000 cases and more than 3 million controls) and this syntax is creating a file size much larger than necessary. I'm hoping to use a non-macro based syntax since my SAS has trouble running macros.

 

I'd like to use some sort of PROC SQL syntax, but I really don't know how to make it more efficient in the beginning and match controls to cases right away without creating a ton of duplicates per each case. Any suggestions or ideas? Anything would be immensely helpful at this point. Thank you!

 

Syntax:

PROC SQL;

CREATE table controls_id

as select

one.ID as

study_id,

two.ID as control_id,

one.age as study_age,

two.age as control_age,

one.race as study_race,

two.race as control_race,

one.rand_num as rand_num

from study one, control two

where (one.age=two.age and

one.race=two.race);

 

* Remove duplicate control subjects;

proc sort data=controls_id nodupkey;

by control_id rand_num;

run;

 

*exactly match on variables with fixed number of controls;

proc sort data=controls_id ;

by study_id rand_num;

run;

data controls_id2 not_enough;

set controls_id;

by study_id ;

retain num;

if first.study_id then num=1;

if num le 2 then do;

output controls_id2;

num=num+1;

end;

if last.study_id then do;

if num le 2 then output not_enough;

end;

run;

proc print data=controls_id2

(obs=40)

;

title2 'matched patients';

run;

 

*use following syntax to remove cases that do not have two controls;

data controls_id3;

merge controls_id2

not_enough(in=b_);

by study_id;

if b_ then delete;

run;

2 REPLIES 2
LinusH
Tourmaline | Level 20
Perhaps you could try select distinct.
It would probably use as much resources as the following proc sort does, but would limit the output from he SQL if I interpret your scenario correctly.
Data never sleeps
Doc_Duke
Rhodochrosite | Level 12

Consider doing propensity score matching.  This paper has some nice optimization techniques

http://www.ucdenver.edu/academics/colleges/PublicHealth/resourcesfor/Faculty/perraillon/code/Pages/p...

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

New Learning Events in April

 

Join us for two new fee-based courses: Administrative Healthcare Data and SAS via Live Web Monday-Thursday, April 24-27 from 1:00 to 4:30 PM ET each day. And Administrative Healthcare Data and SAS: Hands-On Programming Workshop via Live Web on Friday, April 28 from 9:00 AM to 5:00 PM ET.

LEARN MORE

Discussion stats
  • 2 replies
  • 1768 views
  • 1 like
  • 3 in conversation