BookmarkSubscribeRSS Feed
eeun1ilee
Fluorite | Level 6

Hello all,

 

I'd like to do a 1:1 matching where I want to use control once (i.e, 1 case: 1 control).

I found a great article/document showing how to match using proc SQL. The problem is that I want to do 1:1 matching, and I can't figure out how.

 

My codes:


** split into two datasets;

 

data case control;
set sample_1;
rand_num=uniform(0);
if Reinfection_Status="yes" then output case;
else if Reinfection_Status="no" then output control;
run;

 

** Simple match: exact match on age, race, and month of primary infection;

proc sql;
create table controls_id
as select
one.PatientID as case_PatientID,
two.PatientID as control_PatientID,
one.Age as case_Age,
two.Age as control_Age,
one.Ethnicity as case_Ethnicity,
two.Ethnicity as control_Ethnicity,
one.Month_Pr as case_Month_Pr,
two.Month_Pr as control_Month_Pr,
one.rand_num as rand_num
from case one, control two
where (one.Age=two.Age and one.Ethnicity=two.Ethnicity and one.Month_Pr=two.Month_Pr);
run;

* remove duplicate control subjects;

proc sort data=controls_id nodupkey;
by control_PatientID rand_num;
run;

 

I have 1066 cases and 7088 controls, so I want 1066 cases and 1066 controls to run the analysis. When I ran the proc SQL, I had 1736 observations and when I tried to remove duplicate control subjects using proc sort nodupkey, I had 0 observations with duplicates. How should I go from here..? 

I hope I'm making sense. Please let me know if you need anything else from me. 

Thank you.

3 REPLIES 3
s_lassen
Meteorite | Level 14

I think that you want is to sort by case_PatientID and not control_PatientID.

Kurt_Bremser
Super User

Do a MERGE with no BY:

data
  case (
    keep=patient_id age ethnicity month_pr
    rename=(
      patient_id=case_patient_id
      age=case_age
      ethnicity=case_ethnicity
      month_pr=case_month_pr
    )
  )
  control (
    keep=patient_id age ethnicity month_pr rand_num
    rename=(
      /* see above */
    )
  )
;
set sample_1;
rand_num = rand("uniform");
if Reinfection_Status="yes"
then output case;
else if Reinfection_Status="no" then output control;
run;

proc sort data=control;
by rand_num;
run;

data controls:id;
merge
  case (in=c)
  control
;
if not c then stop;
run;

Untested, posted from my tablet.

 

A_Kh
Lapis Lazuli | Level 10

Hi @eeun1ilee , 

 

you better use PROC SQL SET clauses, it is way easier than SQL joins. 
It does 1:1 match of obs, eliminates duplicate records, and keeps whatever obs needed, like 'cases' in your example .

proc sql;
	create table Case_Control_Match as
		select* /*you need to specify variables list, in case not interested in all variables to be matched in the dataset*/
			from cases
				except corr
			select* /*you need to specify variables list, in case not interested in all variables to be matched in the dataset*/
		from controls
	order by 1,2,3; /*you need to specify sorting order*/
quit; 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 714 views
  • 0 likes
  • 4 in conversation