BookmarkSubscribeRSS Feed
eeun1ilee
Fluorite | Level 6

Hello all,

 

I'd like to do a 1:1 matching where I want to use control once (i.e, 1 case: 1 control).

I found a great article/document showing how to match using proc SQL. The problem is that I want to do 1:1 matching, and I can't figure out how.

 

My codes:


** split into two datasets;

 

data case control;
set sample_1;
rand_num=uniform(0);
if Reinfection_Status="yes" then output case;
else if Reinfection_Status="no" then output control;
run;

 

** Simple match: exact match on age, race, and month of primary infection;

proc sql;
create table controls_id
as select
one.PatientID as case_PatientID,
two.PatientID as control_PatientID,
one.Age as case_Age,
two.Age as control_Age,
one.Ethnicity as case_Ethnicity,
two.Ethnicity as control_Ethnicity,
one.Month_Pr as case_Month_Pr,
two.Month_Pr as control_Month_Pr,
one.rand_num as rand_num
from case one, control two
where (one.Age=two.Age and one.Ethnicity=two.Ethnicity and one.Month_Pr=two.Month_Pr);
run;

* remove duplicate control subjects;

proc sort data=controls_id nodupkey;
by control_PatientID rand_num;
run;

 

I have 1066 cases and 7088 controls, so I want 1066 cases and 1066 controls to run the analysis. When I ran the proc SQL, I had 1736 observations and when I tried to remove duplicate control subjects using proc sort nodupkey, I had 0 observations with duplicates. How should I go from here..? 

I hope I'm making sense. Please let me know if you need anything else from me. 

Thank you.

3 REPLIES 3
s_lassen
Meteorite | Level 14

I think that you want is to sort by case_PatientID and not control_PatientID.

Kurt_Bremser
Super User

Do a MERGE with no BY:

data
  case (
    keep=patient_id age ethnicity month_pr
    rename=(
      patient_id=case_patient_id
      age=case_age
      ethnicity=case_ethnicity
      month_pr=case_month_pr
    )
  )
  control (
    keep=patient_id age ethnicity month_pr rand_num
    rename=(
      /* see above */
    )
  )
;
set sample_1;
rand_num = rand("uniform");
if Reinfection_Status="yes"
then output case;
else if Reinfection_Status="no" then output control;
run;

proc sort data=control;
by rand_num;
run;

data controls:id;
merge
  case (in=c)
  control
;
if not c then stop;
run;

Untested, posted from my tablet.

 

A_Kh
Barite | Level 11

Hi @eeun1ilee , 

 

you better use PROC SQL SET clauses, it is way easier than SQL joins. 
It does 1:1 match of obs, eliminates duplicate records, and keeps whatever obs needed, like 'cases' in your example .

proc sql;
	create table Case_Control_Match as
		select* /*you need to specify variables list, in case not interested in all variables to be matched in the dataset*/
			from cases
				except corr
			select* /*you need to specify variables list, in case not interested in all variables to be matched in the dataset*/
		from controls
	order by 1,2,3; /*you need to specify sorting order*/
quit; 
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2389 views
  • 0 likes
  • 4 in conversation