BookmarkSubscribeRSS Feed
wriccar
Fluorite | Level 6

Hello fellow SAS users, 

 

I hope someone can provide me with a bit of help. I am using SAS 9.3 and need to do something that is farily simple to understand, and I can certainly code it for a single iteration, but I need to do it many, many times.

 

Please let me explain, simplifying the data set without sacrificing on the logic. 

 

I have a dataset with "TREATMENT" and "CONTROL" observations for which I am trying to form matched pairs based on a "penalty score," computed using data items that I already have. 

 

I have created a data set that looks something like this, for simplicity:

NAME_T     ID_T     VAR1_T     VAR2_T     VAR3_T     NAME_C     ID_C     VAR1_C     VAR2_C     VAR3_C     

 

NAME and ID are just identifying information;

VAR1, VAR2, VAR3 = the data on which I will create the penalty score; 

_T = populated for treatment observations;

_C = populated for control observations;

(There is also a somewhat redundant variable, TREAT, which is =0,1 for control/treatment observations)

 

Suppose that I have 1,000 TREATMENT participants, and 10,000 possible CONTROL participants included in the DATA file named "DATA_ALL" 

 

Below, I have colored the steps in blue for which I am unable to execute a code, with notes. For other steps, I have annotated briefly. 

Essentially, I need to do the following: 

1a. Create a dataset with only 1 TREATMENT participant; call this SINGLE_TREATMENT

(How do I reference ID=1 through 1,000 in single iterations?)

 

1b. Create a dataset with ALL control participants

DATA CONTROL;  

SET DATA_ALL;

IF TREAT=1 THEN DELETE;

RUN;

 

2. Merge the two using ("if _n_=1 then set SINGLE_TREATMENT") such that the single treatment participant is "matched" with each available control participant

DATA CONTROL_MATCHED;

IF _N_=1 THEN SET SINGLE_TREATMENT;

RUN;

 

3. Compute the "penalty score," for simplicity SCORE = (VAR1_T + VAR2_T + VAR3_T) - (VAR1_C + VAR2_C + VAR3_C), for each pair of participants

DATA CONTROL_MATCHED;

IF _N_=1 THEN SET SINGLE_TREATMENT;

SCORE = (VAR1_T + VAR2_T + VAR3_T) - (VAR1_C + VAR2_C + VAR3_C)

RUN;

 

4. Sort the dataset by ascending "penalty score" so that the lowest value of SCORE is at the top

PROC SORT DATA=CONTROL_MATCHED ASCENDING; 

BY SCORE; 

RUN;

 

***Steps 1 through 4 I can easily code, it is the repitition in Steps 5 through 8 that is difficult for me***

 

5. Delete all except the first observation, which will keep the pair with the lowest penalty score

 

6. Create a new DATA file for this matched pair

 

7. Repeat this for each TREATMENT participant, 1,000 times

 

8. Merge all data sets created in Step 6, such that the final output has 1,000 observations (the number of TREATMENT participants), with each observation having identifying information for the best matched CONTROL observation

 

ANY HELP IS GREATLY APPRECIATED! THANK YOU IN ADVANCE!!!

3 REPLIES 3
Kurt_Bremser
Super User

First, create datasets for treatment and control participants:

data
  treatment (keep=name_t id_t var1_t var2_t var3_t)
  control (keep=name_c id_c var1_c var2_c var3_c)
;
set data_all;
if treat = 1
then output treatment;
else output control;
run;

Then build a cartesian product (10 million obs should be no problem)

proc sql;
create table inter as
select
  *,
  ((var1_t + var2_t + var3_t) - (var1_c + var2_c + var3_c)) as score
from
  treatment, control
;
quit;

Now sort and select

proc sort data=inter;
by name_t score;
run;

data want;
set inter;
by name_t;
if first.name_t;
run;

One can easily integrate the sort and select into the SQL, but I wanted to keep the steps simple

wriccar
Fluorite | Level 6

Thanks so much! I think this will work. 

 

Follow-up question: Suppose I have a categorical variable that acts as an additional criteria besides the score. Say, gender. 

So, I only want the matched pairs to be both male, or both female.

 

At what point in your proposed code should I eliminate the combinations where GENDER_T ne GENDER_B ? 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1698 views
  • 0 likes
  • 2 in conversation