Hello all,
I'd like to do a 1:1 matching where I want to use control once (i.e, 1 case: 1 control).
I found a great article/document showing how to match using proc SQL. The problem is that I want to do 1:1 matching, and I can't figure out how.
My codes:
** split into two datasets;
data case control;
set sample_1;
rand_num=uniform(0);
if Reinfection_Status="yes" then output case;
else if Reinfection_Status="no" then output control;
run;
** Simple match: exact match on age, race, and month of primary infection;
proc sql;
create table controls_id
as select
one.PatientID as case_PatientID,
two.PatientID as control_PatientID,
one.Age as case_Age,
two.Age as control_Age,
one.Ethnicity as case_Ethnicity,
two.Ethnicity as control_Ethnicity,
one.Month_Pr as case_Month_Pr,
two.Month_Pr as control_Month_Pr,
one.rand_num as rand_num
from case one, control two
where (one.Age=two.Age and one.Ethnicity=two.Ethnicity and one.Month_Pr=two.Month_Pr);
run;
* remove duplicate control subjects;
proc sort data=controls_id nodupkey;
by control_PatientID rand_num;
run;
I have 1066 cases and 7088 controls, so I want 1066 cases and 1066 controls to run the analysis. When I ran the proc SQL, I had 1736 observations and when I tried to remove duplicate control subjects using proc sort nodupkey, I had 0 observations with duplicates. How should I go from here..?
I hope I'm making sense. Please let me know if you need anything else from me.
Thank you.
I think that you want is to sort by case_PatientID and not control_PatientID.
Do a MERGE with no BY:
data
case (
keep=patient_id age ethnicity month_pr
rename=(
patient_id=case_patient_id
age=case_age
ethnicity=case_ethnicity
month_pr=case_month_pr
)
)
control (
keep=patient_id age ethnicity month_pr rand_num
rename=(
/* see above */
)
)
;
set sample_1;
rand_num = rand("uniform");
if Reinfection_Status="yes"
then output case;
else if Reinfection_Status="no" then output control;
run;
proc sort data=control;
by rand_num;
run;
data controls:id;
merge
case (in=c)
control
;
if not c then stop;
run;
Untested, posted from my tablet.
Hi @eeun1ilee ,
you better use PROC SQL SET clauses, it is way easier than SQL joins.
It does 1:1 match of obs, eliminates duplicate records, and keeps whatever obs needed, like 'cases' in your example .
proc sql;
create table Case_Control_Match as
select* /*you need to specify variables list, in case not interested in all variables to be matched in the dataset*/
from cases
except corr
select* /*you need to specify variables list, in case not interested in all variables to be matched in the dataset*/
from controls
order by 1,2,3; /*you need to specify sorting order*/
quit;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.