turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Using Proc SQL to create a pair of two samples fro...

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-29-2017 05:57 PM

Hello everyone,

I am working on two different datasets, and would like to borrow your wisdom:

Dataset 1: A group of treatment samples with Variables A and B

Dataset 2: A group of control samples with Variables A and B

There are 47 samples in Dataset 1 (Treatment) and +100 samples in Dataset 2 (Control).

I want to create 47 pairs - 47 from Dataset1 and 47 from Dataset2- with the closest value of Variables A and B.

First of all, I want to minimize the absolute difference in variable A between treatment sample i (from Dataset1) and control sample i (from Dataset2).

Second, likewise in the previous step, I want to minimize the absolute difference in variable B between treatment sample i (from Dataset1) and control sample i (from Dataset2).

Based on the minimum difference in variables A and B, I would like to create a pair of treatment and control samples such that they are very close in terms of the value of variables A and B.

I tried the following code, and the resulting dataset gave 47 pairs, matched in terms of minimum difference in the variables between treatment and control samples. However, there are a lot of duplicates-

For example,

**ID: A** from Dataset 1 was paired with **ID: 1** from Dataset 2 based on the matching criterion. However, **ID: B** from Dataset 1 was also paired with **ID: 1** from Dataset 2 because the matching variable of ID: B is very close to that of ID: R.

I would like to eliminate these duplicates (overlaps). I would like only one control sample (from Dataset 2) to be matched with one treatment sample (from Dataset 1), but codes that I came up with did not work as I want it to be.

```
proc sql;
create table dataset as
select a.id, a.variableA as variableA_treatment, a.variableB as variableB_treatment, b.id, b.variableA as variableA_control, b.variableB as variableB_control
from dataset1 as a left join dataset2 as b
on a.variableA > b.variableA or a.variableA < b.variableA or a.variableB > b.variableB or a.variableB < b.variableB
group by a.id
having abs(a.variableA-b.VariableA)=min(abs(a.variableA-b.variableA))
;
quit;
```

Any advice and suggestion will be appreciated.

Thank you very much for your time.

Kishio

Accepted Solutions

Solution

11-02-2017
04:33 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to kishiyo

10-29-2017 07:09 PM

The Mayo Clinic has a bunch of algorithms on their website that illustrates case-control matching algorithms, greedy algorithms and variable matching. And the full SAS code is included.

PS. Data management is typically for DataFlux/DI Studio questions so I'll move this post to the Base SAS forum

All Replies

Solution

11-02-2017
04:33 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to kishiyo

10-29-2017 07:09 PM

The Mayo Clinic has a bunch of algorithms on their website that illustrates case-control matching algorithms, greedy algorithms and variable matching. And the full SAS code is included.

PS. Data management is typically for DataFlux/DI Studio questions so I'll move this post to the Base SAS forum

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

11-02-2017 04:34 PM

Thank you very much! I found a Matching MACRO to help me solve the problem

I had.

I had.