BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kishiyo
Calcite | Level 5

Hello everyone, 

 

I am working on two different datasets, and would like to borrow your wisdom: 

 

Dataset 1: A group of treatment samples with Variables A and B

Dataset 2: A group of control samples with Variables A and B 

 

There are 47 samples in Dataset 1 (Treatment) and +100 samples in Dataset 2 (Control). 

I want to create 47 pairs - 47 from Dataset1 and 47 from Dataset2- with the closest value of Variables A and B. 

 

First of all, I want to minimize the absolute difference in variable A between treatment sample i (from Dataset1) and control sample i (from Dataset2).

 

Second, likewise in the previous step, I want to minimize the absolute difference in variable B between treatment sample i (from Dataset1) and control sample i (from Dataset2). 

 

Based on the minimum difference in variables A and B, I would like to create a pair of treatment and control samples such that they are very close in terms of the value of variables A and B. 

 

I tried the following code, and the resulting dataset gave 47 pairs, matched in terms of minimum difference in the variables between treatment and control samples. However, there are a lot of duplicates- 

 

For example, 

ID: A from Dataset 1 was paired with ID: 1 from Dataset 2 based on the matching criterion. However, ID: B from Dataset 1 was also paired with ID: 1 from Dataset 2 because the matching variable of ID: B is very close to that of ID: R. 

 

I would like to eliminate these duplicates (overlaps). I would like only one control sample (from Dataset 2) to be matched with one treatment sample (from Dataset 1), but codes that I came up with did not work as I want it to be. 

 

proc sql;

create table dataset as

select a.id, a.variableA as variableA_treatment, a.variableB as variableB_treatment, b.id, b.variableA as variableA_control, b.variableB as variableB_control

from dataset1 as a left join dataset2 as b

on a.variableA > b.variableA or a.variableA < b.variableA or a.variableB > b.variableB or a.variableB < b.variableB

group by a.id

having abs(a.variableA-b.VariableA)=min(abs(a.variableA-b.variableA))
;

quit; 

Any advice and suggestion will be appreciated. 

 

Thank you very much for your time. 

 

Kishio 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

The Mayo Clinic has a bunch of algorithms on their website that illustrates case-control matching algorithms, greedy algorithms and variable matching. And the full SAS code is included. 

 

http://www.mayo.edu/research/departments-divisions/department-health-sciences-research/division-biom...

 

PS. Data management is typically for DataFlux/DI Studio questions so I'll move this post to the Base SAS forum

View solution in original post

2 REPLIES 2
Reeza
Super User

The Mayo Clinic has a bunch of algorithms on their website that illustrates case-control matching algorithms, greedy algorithms and variable matching. And the full SAS code is included. 

 

http://www.mayo.edu/research/departments-divisions/department-health-sciences-research/division-biom...

 

PS. Data management is typically for DataFlux/DI Studio questions so I'll move this post to the Base SAS forum

kishiyo
Calcite | Level 5
Thank you very much! I found a Matching MACRO to help me solve the problem
I had.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1039 views
  • 1 like
  • 2 in conversation