Help using Base SAS procedures

MACRO / Iterative "DO loop" to create a Matched Sample

Reply
Occasional Contributor
Posts: 19

MACRO / Iterative "DO loop" to create a Matched Sample

[ Edited ]

Hello fellow SAS users, 

 

I hope someone can provide me with a bit of help. I am using SAS 9.3 and need to do something that is farily simple to understand, and I can certainly code it for a single iteration, but I need to do it many, many times.

 

Please let me explain, simplifying the data set without sacrificing on the logic. 

 

I have a dataset with "TREATMENT" and "CONTROL" observations for which I am trying to form matched pairs based on a "penalty score," computed using data items that I already have. 

 

I have created a data set that looks something like this, for simplicity:

NAME_T     ID_T     VAR1_T     VAR2_T     VAR3_T     NAME_C     ID_C     VAR1_C     VAR2_C     VAR3_C     

 

NAME and ID are just identifying information;

VAR1, VAR2, VAR3 = the data on which I will create the penalty score; 

_T = populated for treatment observations;

_C = populated for control observations;

(There is also a somewhat redundant variable, TREAT, which is =0,1 for control/treatment observations)

 

Suppose that I have 1,000 TREATMENT participants, and 10,000 possible CONTROL participants included in the DATA file named "DATA_ALL" 

 

Below, I have colored the steps in blue for which I am unable to execute a code, with notes. For other steps, I have annotated briefly. 

Essentially, I need to do the following: 

1a. Create a dataset with only 1 TREATMENT participant; call this SINGLE_TREATMENT

(How do I reference ID=1 through 1,000 in single iterations?)

 

1b. Create a dataset with ALL control participants

DATA CONTROL;  

SET DATA_ALL;

IF TREAT=1 THEN DELETE;

RUN;

 

2. Merge the two using ("if _n_=1 then set SINGLE_TREATMENT") such that the single treatment participant is "matched" with each available control participant

DATA CONTROL_MATCHED;

IF _N_=1 THEN SET SINGLE_TREATMENT;

RUN;

 

3. Compute the "penalty score," for simplicity SCORE = (VAR1_T + VAR2_T + VAR3_T) - (VAR1_C + VAR2_C + VAR3_C), for each pair of participants

DATA CONTROL_MATCHED;

IF _N_=1 THEN SET SINGLE_TREATMENT;

SCORE = (VAR1_T + VAR2_T + VAR3_T) - (VAR1_C + VAR2_C + VAR3_C)

RUN;

 

4. Sort the dataset by ascending "penalty score" so that the lowest value of SCORE is at the top

PROC SORT DATA=CONTROL_MATCHED ASCENDING; 

BY SCORE; 

RUN;

 

***Steps 1 through 4 I can easily code, it is the repitition in Steps 5 through 8 that is difficult for me***

 

5. Delete all except the first observation, which will keep the pair with the lowest penalty score

 

6. Create a new DATA file for this matched pair

 

7. Repeat this for each TREATMENT participant, 1,000 times

 

8. Merge all data sets created in Step 6, such that the final output has 1,000 observations (the number of TREATMENT participants), with each observation having identifying information for the best matched CONTROL observation

 

ANY HELP IS GREATLY APPRECIATED! THANK YOU IN ADVANCE!!!

Super User
Posts: 6,972

Re: HELP NEEDED ASAP: MACRO / Iterative "DO loop" to create a Matched Sample

First, create datasets for treatment and control participants:

data
  treatment (keep=name_t id_t var1_t var2_t var3_t)
  control (keep=name_c id_c var1_c var2_c var3_c)
;
set data_all;
if treat = 1
then output treatment;
else output control;
run;

Then build a cartesian product (10 million obs should be no problem)

proc sql;
create table inter as
select
  *,
  ((var1_t + var2_t + var3_t) - (var1_c + var2_c + var3_c)) as score
from
  treatment, control
;
quit;

Now sort and select

proc sort data=inter;
by name_t score;
run;

data want;
set inter;
by name_t;
if first.name_t;
run;

One can easily integrate the sort and select into the SQL, but I wanted to keep the steps simple

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 19

Re: HELP NEEDED ASAP: MACRO / Iterative "DO loop" to create a Matched Sample

Thanks so much! I think this will work. 

 

Follow-up question: Suppose I have a categorical variable that acts as an additional criteria besides the score. Say, gender. 

So, I only want the matched pairs to be both male, or both female.

 

At what point in your proposed code should I eliminate the combinations where GENDER_T ne GENDER_B ? 

Super User
Posts: 6,972

Re: HELP NEEDED ASAP: MACRO / Iterative "DO loop" to create a Matched Sample

In the SQL, you can add a where condition:

where a.gender_t = gender_t

You must of course include the gender in the keeps of the first data step.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 3 replies
  • 384 views
  • 0 likes
  • 2 in conversation