Hello everyone,
I am working on two different datasets, and would like to borrow your wisdom:
Dataset 1: A group of treatment samples with Variables A and B
Dataset 2: A group of control samples with Variables A and B
There are 47 samples in Dataset 1 (Treatment) and +100 samples in Dataset 2 (Control).
I want to create 47 pairs - 47 from Dataset1 and 47 from Dataset2- with the closest value of Variables A and B.
First of all, I want to minimize the absolute difference in variable A between treatment sample i (from Dataset1) and control sample i (from Dataset2).
Second, likewise in the previous step, I want to minimize the absolute difference in variable B between treatment sample i (from Dataset1) and control sample i (from Dataset2).
Based on the minimum difference in variables A and B, I would like to create a pair of treatment and control samples such that they are very close in terms of the value of variables A and B.
I tried the following code, and the resulting dataset gave 47 pairs, matched in terms of minimum difference in the variables between treatment and control samples. However, there are a lot of duplicates-
For example,
ID: A from Dataset 1 was paired with ID: 1 from Dataset 2 based on the matching criterion. However, ID: B from Dataset 1 was also paired with ID: 1 from Dataset 2 because the matching variable of ID: B is very close to that of ID: R.
I would like to eliminate these duplicates (overlaps). I would like only one control sample (from Dataset 2) to be matched with one treatment sample (from Dataset 1), but codes that I came up with did not work as I want it to be.
proc sql;
create table dataset as
select a.id, a.variableA as variableA_treatment, a.variableB as variableB_treatment, b.id, b.variableA as variableA_control, b.variableB as variableB_control
from dataset1 as a left join dataset2 as b
on a.variableA > b.variableA or a.variableA < b.variableA or a.variableB > b.variableB or a.variableB < b.variableB
group by a.id
having abs(a.variableA-b.VariableA)=min(abs(a.variableA-b.variableA))
;
quit;
Any advice and suggestion will be appreciated.
Thank you very much for your time.
Kishio
The Mayo Clinic has a bunch of algorithms on their website that illustrates case-control matching algorithms, greedy algorithms and variable matching. And the full SAS code is included.
PS. Data management is typically for DataFlux/DI Studio questions so I'll move this post to the Base SAS forum
The Mayo Clinic has a bunch of algorithms on their website that illustrates case-control matching algorithms, greedy algorithms and variable matching. And the full SAS code is included.
PS. Data management is typically for DataFlux/DI Studio questions so I'll move this post to the Base SAS forum
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.