BookmarkSubscribeRSS Feed
lim_6
Fluorite | Level 6

Hi,
I've been working on case-control syntax and tried using the SAS PDF found here: http://www2.sas.com/proceedings/sugi29/173-29.pdf

 

Unfortunately my size is vastly larger (~170000 cases and more than 3 million controls) and this syntax is creating a file size much larger than necessary. I'm hoping to use a non-macro based syntax since my SAS has trouble running macros.

 

I'd like to use some sort of PROC SQL syntax, but I really don't know how to make it more efficient in the beginning and match controls to cases right away without creating a ton of duplicates per each case. Any suggestions or ideas? Anything would be immensely helpful at this point. Thank you!

 

Syntax:

PROC SQL;

CREATE table controls_id

as select

one.ID as

study_id,

two.ID as control_id,

one.age as study_age,

two.age as control_age,

one.race as study_race,

two.race as control_race,

one.rand_num as rand_num

from study one, control two

where (one.age=two.age and

one.race=two.race);

 

* Remove duplicate control subjects;

proc sort data=controls_id nodupkey;

by control_id rand_num;

run;

 

*exactly match on variables with fixed number of controls;

proc sort data=controls_id ;

by study_id rand_num;

run;

data controls_id2 not_enough;

set controls_id;

by study_id ;

retain num;

if first.study_id then num=1;

if num le 2 then do;

output controls_id2;

num=num+1;

end;

if last.study_id then do;

if num le 2 then output not_enough;

end;

run;

proc print data=controls_id2

(obs=40)

;

title2 'matched patients';

run;

 

*use following syntax to remove cases that do not have two controls;

data controls_id3;

merge controls_id2

not_enough(in=b_);

by study_id;

if b_ then delete;

run;

2 REPLIES 2
LinusH
Tourmaline | Level 20
Perhaps you could try select distinct.
It would probably use as much resources as the following proc sort does, but would limit the output from he SQL if I interpret your scenario correctly.
Data never sleeps
Doc_Duke
Rhodochrosite | Level 12

Consider doing propensity score matching.  This paper has some nice optimization techniques

http://www.ucdenver.edu/academics/colleges/PublicHealth/resourcesfor/Faculty/perraillon/code/Pages/p...

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

New Learning Events in April

 

Join us for two new fee-based courses: Administrative Healthcare Data and SAS via Live Web Monday-Thursday, April 24-27 from 1:00 to 4:30 PM ET each day. And Administrative Healthcare Data and SAS: Hands-On Programming Workshop via Live Web on Friday, April 28 from 9:00 AM to 5:00 PM ET.

LEARN MORE

Discussion stats
  • 2 replies
  • 1738 views
  • 1 like
  • 3 in conversation