Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- SAS Procedures
- /
- MACRO / Iterative "DO loop" to create a Matched Sample

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 06-06-2016 09:14 PM
(1709 views)

Hello fellow SAS users,

I hope someone can provide me with a bit of help. I am using SAS 9.3 and need to do something that is farily simple to understand, and I can certainly code it for a single iteration, but I need to do it many, many times.

Please let me explain, simplifying the data set without sacrificing on the logic.

I have a dataset with "TREATMENT" and "CONTROL" observations for which I am trying to form matched pairs based on a "penalty score," computed using data items that I already have.

I have created a data set that looks something like this, for simplicity:

NAME_T ID_T VAR1_T VAR2_T VAR3_T NAME_C ID_C VAR1_C VAR2_C VAR3_C

NAME and ID are just identifying information;

VAR1, VAR2, VAR3 = the data on which I will create the penalty score;

_T = populated for treatment observations;

_C = populated for control observations;

(There is also a somewhat redundant variable, TREAT, which is =0,1 for control/treatment observations)

Suppose that I have **1,000 TREATMENT** participants, and 10,000 __possible __CONTROL participants included in the DATA file named "DATA_ALL"

Below, I have colored the steps in blue for which I am unable to execute a code, with notes. For other steps, I have annotated briefly.

Essentially, I need to do the following:

1a. Create a dataset with only 1 TREATMENT participant; call this SINGLE_TREATMENT

(How do I reference ID=1 through 1,000 in single iterations?)

1b. Create a dataset with ALL control participants

DATA CONTROL;

SET DATA_ALL;

IF TREAT=1 THEN DELETE;

RUN;

2. Merge the two using ("if _n_=1 then set SINGLE_TREATMENT") such that the **single treatment participant** is "matched" with each available control participant

DATA CONTROL_MATCHED;

IF _N_=1 THEN SET SINGLE_TREATMENT;

RUN;

3. Compute the "penalty score," for simplicity SCORE = (VAR1_T + VAR2_T + VAR3_T) - (VAR1_C + VAR2_C + VAR3_C), for each pair of participants

DATA CONTROL_MATCHED;

IF _N_=1 THEN SET SINGLE_TREATMENT;

SCORE = (VAR1_T + VAR2_T + VAR3_T) - (VAR1_C + VAR2_C + VAR3_C)

RUN;

4. Sort the dataset by ascending "penalty score" so that the lowest value of SCORE is at the top

PROC SORT DATA=CONTROL_MATCHED ASCENDING;

BY SCORE;

RUN;

*****Steps 1 through 4 I can easily code, it is the repitition in Steps 5 through 8 that is difficult for me*****

** **

5. Delete all except the first observation, which will keep the pair with the lowest penalty score

6. Create a new DATA file for this matched pair

7. **Repeat** this for each TREATMENT participant, 1,000 times

8. Merge all data sets created in Step 6, such that the final output has **1,000 observations** (the number of TREATMENT participants), with each observation having identifying information for the best matched CONTROL observation

ANY HELP IS GREATLY APPRECIATED! THANK YOU IN ADVANCE!!!

3 REPLIES 3

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

First, create datasets for treatment and control participants:

```
data
treatment (keep=name_t id_t var1_t var2_t var3_t)
control (keep=name_c id_c var1_c var2_c var3_c)
;
set data_all;
if treat = 1
then output treatment;
else output control;
run;
```

Then build a cartesian product (10 million obs should be no problem)

```
proc sql;
create table inter as
select
*,
((var1_t + var2_t + var3_t) - (var1_c + var2_c + var3_c)) as score
from
treatment, control
;
quit;
```

Now sort and select

```
proc sort data=inter;
by name_t score;
run;
data want;
set inter;
by name_t;
if first.name_t;
run;
```

One can easily integrate the sort and select into the SQL, but I wanted to keep the steps simple

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks so much! I think this will work.

Follow-up question: Suppose I have a categorical variable that acts as an additional criteria besides the score. Say, gender.

So, I only want the matched pairs to be both male, or both female.

At what point in your proposed code should I eliminate the combinations where GENDER_T ne GENDER_B ?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

In the SQL, you can add a where condition:

`where a.gender_t = gender_t`

You must of course include the gender in the keeps of the first data step.

**SAS Innovate 2025** is scheduled for May 6-9 in Orlando, FL. Sign up to be **first to learn** about the agenda and registration!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Ready to level-up your skills? Choose your own adventure.