MACRO / Iterative "DO loop" to create a Matched Sample

wriccar · Posted 06-06-2016 09:14 PM

Hello fellow SAS users,

I hope someone can provide me with a bit of help. I am using SAS 9.3 and need to do something that is farily simple to understand, and I can certainly code it for a single iteration, but I need to do it many, many times.

Please let me explain, simplifying the data set without sacrificing on the logic.

I have a dataset with "TREATMENT" and "CONTROL" observations for which I am trying to form matched pairs based on a "penalty score," computed using data items that I already have.

I have created a data set that looks something like this, for simplicity:

NAME_T ID_T VAR1_T VAR2_T VAR3_T NAME_C ID_C VAR1_C VAR2_C VAR3_C

NAME and ID are just identifying information;

VAR1, VAR2, VAR3 = the data on which I will create the penalty score;

_T = populated for treatment observations;

_C = populated for control observations;

(There is also a somewhat redundant variable, TREAT, which is =0,1 for control/treatment observations)

Suppose that I have 1,000 TREATMENT participants, and 10,000 possible CONTROL participants included in the DATA file named "DATA_ALL"

Below, I have colored the steps in blue for which I am unable to execute a code, with notes. For other steps, I have annotated briefly.

Essentially, I need to do the following:

1a. Create a dataset with only 1 TREATMENT participant; call this SINGLE_TREATMENT

(How do I reference ID=1 through 1,000 in single iterations?)

1b. Create a dataset with ALL control participants

DATA CONTROL;

SET DATA_ALL;

IF TREAT=1 THEN DELETE;

RUN;

2. Merge the two using ("if _n_=1 then set SINGLE_TREATMENT") such that the single treatment participant is "matched" with each available control participant

DATA CONTROL_MATCHED;

IF _N_=1 THEN SET SINGLE_TREATMENT;

RUN;

3. Compute the "penalty score," for simplicity SCORE = (VAR1_T + VAR2_T + VAR3_T) - (VAR1_C + VAR2_C + VAR3_C), for each pair of participants

DATA CONTROL_MATCHED;

IF _N_=1 THEN SET SINGLE_TREATMENT;

SCORE = (VAR1_T + VAR2_T + VAR3_T) - (VAR1_C + VAR2_C + VAR3_C)

RUN;

4. Sort the dataset by ascending "penalty score" so that the lowest value of SCORE is at the top

PROC SORT DATA=CONTROL_MATCHED ASCENDING;

BY SCORE;

RUN;

***Steps 1 through 4 I can easily code, it is the repitition in Steps 5 through 8 that is difficult for me***

5. Delete all except the first observation, which will keep the pair with the lowest penalty score

6. Create a new DATA file for this matched pair

7. Repeat this for each TREATMENT participant, 1,000 times

8. Merge all data sets created in Step 6, such that the final output has 1,000 observations (the number of TREATMENT participants), with each observation having identifying information for the best matched CONTROL observation

ANY HELP IS GREATLY APPRECIATED! THANK YOU IN ADVANCE!!!

Kurt_Bremser · Posted 06-07-2016 02:23 AM

First, create datasets for treatment and control participants:

data
  treatment (keep=name_t id_t var1_t var2_t var3_t)
  control (keep=name_c id_c var1_c var2_c var3_c)
;
set data_all;
if treat = 1
then output treatment;
else output control;
run;

Then build a cartesian product (10 million obs should be no problem)

proc sql;
create table inter as
select
  *,
  ((var1_t + var2_t + var3_t) - (var1_c + var2_c + var3_c)) as score
from
  treatment, control
;
quit;

Now sort and select

proc sort data=inter;
by name_t score;
run;

data want;
set inter;
by name_t;
if first.name_t;
run;

One can easily integrate the sort and select into the SQL, but I wanted to keep the steps simple

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

wriccar · Posted 06-07-2016 11:18 AM

Thanks so much! I think this will work.

Follow-up question: Suppose I have a categorical variable that acts as an additional criteria besides the score. Say, gender.

So, I only want the matched pairs to be both male, or both female.

At what point in your proposed code should I eliminate the combinations where GENDER_T ne GENDER_B ?

Kurt_Bremser · Posted 06-07-2016 01:31 PM

In the SQL, you can add a where condition:

where a.gender_t = gender_t

You must of course include the gender in the keeps of the first data step.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

MACRO / Iterative "DO loop" to create a Matched Sample

Re: HELP NEEDED ASAP: MACRO / Iterative "DO loop" to create a Matched Sample

Re: HELP NEEDED ASAP: MACRO / Iterative "DO loop" to create a Matched Sample

Re: HELP NEEDED ASAP: MACRO / Iterative "DO loop" to create a Matched Sample

Catch up on SAS Innovate 2026

MACRO / Iterative "DO loop" to create a Matched Sample

Re: HELP NEEDED ASAP: MACRO / Iterative "DO loop" to create a Matched Sample

Re: HELP NEEDED ASAP: MACRO / Iterative "DO loop" to create a Matched Sample

Re: HELP NEEDED ASAP: MACRO / Iterative "DO loop" to create a Matched Sample

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away