Hello,
I have a dataset with 87 cases and 1000's of controls. I want to match 2:1 with out replacement for the controls. I want to match on year of procedure (+/- 2 years), diagnosis (which can have 6 different values) and sex. I have searched through many papers online which did not remove duplicate controls.
As well as the above paper which also did not remove duplicates.
I am trying the following code
DATA cases controls;
SET cohort3;
IF fvii_inclusion = 1 THEN OUTPUT cases;
ELSE OUTPUT controls;
RUN;
data control2;
set controls;
time_low=timet-2;
time_high=timet+2;
run;
proc sql;
create table controls_id
as select
one.uniqueid as case_id,
two.uniqueid as control_id,
one.timet as case_time,
two.timet as control_time ,
one.proc as case_proc,
two.proc as control_proc,
one.sex as case_sex,
two.sex as control_sex,
from cases one, control2 two
where ((one.timet between two.time_low and two.time_high) and (one.sex=two.sex) and one.proc=two.proc) ;
I also reviewed the suggestions on hash tags for a similar problem posted but it changed nothing.
HELP much appreciated!
If you want to remove duplicate controls then could you try adding the distinct option in the select statement in your proc sql code?
Then, if you want 2 controls per case, you could try proc surveyselect, something like:
proc surveyselect data=controls_id method=srs sampsize=2;
strata case_id;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.