Hello all,
I'd like to do a 1:1 matching where I want to use control once (i.e, 1 case: 1 control).
I found a great article/document showing how to match using proc SQL. The problem is that I want to do 1:1 matching, and I can't figure out how.
My codes:
** split into two datasets;
data case control;
set sample_1;
rand_num=uniform(0);
if Reinfection_Status="yes" then output case;
else if Reinfection_Status="no" then output control;
run;
** Simple match: exact match on age, race, and month of primary infection;
proc sql;
create table controls_id
as select
one.PatientID as case_PatientID,
two.PatientID as control_PatientID,
one.Age as case_Age,
two.Age as control_Age,
one.Ethnicity as case_Ethnicity,
two.Ethnicity as control_Ethnicity,
one.Month_Pr as case_Month_Pr,
two.Month_Pr as control_Month_Pr,
one.rand_num as rand_num
from case one, control two
where (one.Age=two.Age and one.Ethnicity=two.Ethnicity and one.Month_Pr=two.Month_Pr);
run;
* remove duplicate control subjects;
proc sort data=controls_id nodupkey;
by control_PatientID rand_num;
run;
I have 1066 cases and 7088 controls, so I want 1066 cases and 1066 controls to run the analysis. When I ran the proc SQL, I had 1736 observations and when I tried to remove duplicate control subjects using proc sort nodupkey, I had 0 observations with duplicates. How should I go from here..?
I hope I'm making sense. Please let me know if you need anything else from me.
Thank you.
I think that you want is to sort by case_PatientID and not control_PatientID.
Do a MERGE with no BY:
data
case (
keep=patient_id age ethnicity month_pr
rename=(
patient_id=case_patient_id
age=case_age
ethnicity=case_ethnicity
month_pr=case_month_pr
)
)
control (
keep=patient_id age ethnicity month_pr rand_num
rename=(
/* see above */
)
)
;
set sample_1;
rand_num = rand("uniform");
if Reinfection_Status="yes"
then output case;
else if Reinfection_Status="no" then output control;
run;
proc sort data=control;
by rand_num;
run;
data controls:id;
merge
case (in=c)
control
;
if not c then stop;
run;
Untested, posted from my tablet.
Hi @eeun1ilee ,
you better use PROC SQL SET clauses, it is way easier than SQL joins.
It does 1:1 match of obs, eliminates duplicate records, and keeps whatever obs needed, like 'cases' in your example .
proc sql;
create table Case_Control_Match as
select* /*you need to specify variables list, in case not interested in all variables to be matched in the dataset*/
from cases
except corr
select* /*you need to specify variables list, in case not interested in all variables to be matched in the dataset*/
from controls
order by 1,2,3; /*you need to specify sorting order*/
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.