I'd like to do a 1:1 matching where I want to use control once (i.e, 1 case: 1 control).
I found a great article/document showing how to match using proc SQL. The problem is that I want to do 1:1 matching, and I can't figure out how.
** split into two datasets;
data case control;
if Reinfection_Status="yes" then output case;
else if Reinfection_Status="no" then output control;
** Simple match: exact match on age, race, and month of primary infection;
create table controls_id
one.PatientID as case_PatientID,
two.PatientID as control_PatientID,
one.Age as case_Age,
two.Age as control_Age,
one.Ethnicity as case_Ethnicity,
two.Ethnicity as control_Ethnicity,
one.Month_Pr as case_Month_Pr,
two.Month_Pr as control_Month_Pr,
one.rand_num as rand_num
from case one, control two
where (one.Age=two.Age and one.Ethnicity=two.Ethnicity and one.Month_Pr=two.Month_Pr);
* remove duplicate control subjects;
proc sort data=controls_id nodupkey;
by control_PatientID rand_num;
I have 1066 cases and 7088 controls, so I want 1066 cases and 1066 controls to run the analysis. When I ran the proc SQL, I had 1736 observations and when I tried to remove duplicate control subjects using proc sort nodupkey, I had 0 observations with duplicates. How should I go from here..?
I hope I'm making sense. Please let me know if you need anything else from me.
Do a MERGE with no BY:
data case ( keep=patient_id age ethnicity month_pr rename=( patient_id=case_patient_id age=case_age ethnicity=case_ethnicity month_pr=case_month_pr ) ) control ( keep=patient_id age ethnicity month_pr rand_num rename=( /* see above */ ) ) ; set sample_1; rand_num = rand("uniform"); if Reinfection_Status="yes" then output case; else if Reinfection_Status="no" then output control; run; proc sort data=control; by rand_num; run; data controls:id; merge case (in=c) control ; if not c then stop; run;
Untested, posted from my tablet.
Hi @eeun1ilee ,
you better use PROC SQL SET clauses, it is way easier than SQL joins.
It does 1:1 match of obs, eliminates duplicate records, and keeps whatever obs needed, like 'cases' in your example .
proc sql; create table Case_Control_Match as select* /*you need to specify variables list, in case not interested in all variables to be matched in the dataset*/ from cases except corr select* /*you need to specify variables list, in case not interested in all variables to be matched in the dataset*/ from controls order by 1,2,3; /*you need to specify sorting order*/ quit;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.