Calcite | Level 5

I am a SAS beginner.

I have an experimental group, and I need to build a control group.
Each experimental subject must be matched 1:1 with a control subject on two variables: age and sex.

Matching should be done using only those two variables, not the propensity score.

Should I use proc psmatch or inner join?
I would appreciate it if you could provide detailed code.

Super User

Here is one way. This uses a Data step merge, which is different than an inner join.

You did not provide any example data, so this is a small dummy example. The possible controls do contain sex/age combinations that do not appear in the example experimental group.


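/* generate a possible list of controls */
data work.poss;
   do CID = 100 to 1000;
      sex = rand('BERNOULLI',.5)+1;   /* sex coded 1 or 2 */
      age = rand('integer',14,18);    /* ages 14 through 18 */
      randomizer = rand('uniform');   /* random sort key */
      output;                         /* one observation per possible control */
   end;
   label CID='ID in control set';
run;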

/* randomize the order of the possible controls within each sex/age group */
proc sort data=work.poss;
   by sex age randomizer;
run;

/* "experimental" set */
data work.exp;
   input id sex age;
datalines;
1 1 15
2 1 15
3 1 15
6 1 16
4 2 15
5 2 15
7 2 16
8 2 16
;
run;

proc sort data=work.exp;
   by sex age;
run;

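data work.want;
   merge work.poss (in=in1)
         work.exp (in=in2)
   ;
   by sex age;
   if in2;                   /* keep only BY groups present in the experimental set */
   lid = lag(id);            /* experimental ID from the previously kept observation */
   if id=lid then delete;    /* a repeated ID means this experimental subject is already matched */
   drop lid randomizer;
run;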

When both data sets contain multiple observations for the same BY values, a match merge pairs the observations in order, and the last observation from the data set with fewer observations in that BY group is repeated against all the remaining observations from the other set. So we use the LAG function to get the previous ID from the experimental group (the big assumption is that, within each sex/age combination, the experimental group is smaller than the pool of possible controls). If the ID repeats, the observation is removed. The IN= option creates a temporary variable, valued 1/0 for true/false, that indicates whether that data set contributed to the current observation. The "if in2;" statement keeps only the observations that have a contribution from the experimental group.


This does include a randomization for the possible control matches.
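
Since the approach rests on the assumption noted above, a quick check of the result may be worthwhile. The following is only a sketch, assuming work.want was built as shown; the column aliases are illustrative. If the matching is truly 1:1, the row count, the number of distinct experimental IDs, and the number of distinct control IDs should all agree, and no row should have a missing CID.

```sas
/* Sanity check on the matched set (aliases are illustrative, not from the original post) */
proc sql;
   select count(*)            as n_rows,        /* matched observations            */
          count(distinct id)  as n_exp_ids,     /* distinct experimental subjects  */
          count(distinct CID) as n_ctl_ids,     /* distinct controls used          */
          nmiss(CID)          as n_no_control   /* experimental rows with no control */
   from work.want;
quit;
```

A mismatch between these counts would suggest either a control reused for several experimental subjects or an experimental subject left without a control.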

Super User
An inner join seems like an interesting option, as it could drop experimental cases, which is a tad off. Usually, if no matches are found, the matching algorithm is adjusted rather than dropping cases, but if that's what you're after, an inner join is correct.
A left join is the typical method. What would you want to do with multiple matches?
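
Before choosing between the two joins, it may help to see how many experimental cases each sex/age combination would actually lose. The sketch below assumes the work.exp and work.poss data sets from the earlier reply; the table name work.cell_counts and the n_ columns are hypothetical. It uses a left join so that combinations with no controls at all still appear.

```sas
/* Compare experimental and control counts per sex/age combination */
proc sql;
   create table work.cell_counts as
   select e.sex,
          e.age,
          e.n_exp,
          coalesce(c.n_ctl, 0) as n_ctl,
          /* experimental subjects that cannot get their own control */
          case
             when e.n_exp > coalesce(c.n_ctl, 0)
                then e.n_exp - coalesce(c.n_ctl, 0)
             else 0
          end as n_unmatched
   from (select sex, age, count(*) as n_exp
            from work.exp
            group by sex, age) as e
        left join
        (select sex, age, count(*) as n_ctl
            from work.poss
            group by sex, age) as c
     on  e.sex = c.sex
     and e.age = c.age
   order by e.sex, e.age;
quit;
```

Rows with n_unmatched greater than zero are the combinations where an inner join would discard experimental cases.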
Calcite | Level 5

I want to create a control group that is the same or similar in age and gender to the experimental group.

Super User
And if there is no match to your experimental group you'll drop the experimental case? That is what an inner join would do.
Calcite | Level 5
Yes, I'll drop some of the experimental group.
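
If dropping unmatched experimental subjects is acceptable, one possible way to do the 1:1 match with an inner join is to number the observations within each sex/age combination in both data sets and join on that number as well. This is only a sketch, assuming work.exp is sorted by sex and age and work.poss by sex, age, and randomizer as in the earlier reply; the names exp_seq, poss_seq, seq, and matched are hypothetical.

```sas
/* Number the experimental subjects within each sex/age combination */
data work.exp_seq;
   set work.exp;
   by sex age;
   if first.age then seq = 0;
   seq + 1;                  /* sum statement: seq is retained across observations */
run;

/* Number the possible controls the same way (already in random order) */
data work.poss_seq;
   set work.poss;
   by sex age;
   if first.age then seq = 0;
   seq + 1;
run;

/* Inner join: the n-th experimental subject gets the n-th control in its cell.
   Experimental subjects with no control at that position are dropped. */
proc sql;
   create table work.matched as
   select e.id, c.CID, e.sex, e.age
   from work.exp_seq as e
        inner join work.poss_seq as c
     on  e.sex = c.sex
     and e.age = c.age
     and e.seq = c.seq
   order by e.id;
quit;
```

Because each sequence number occurs once per combination in each data set, no control can be paired with more than one experimental subject. Changing the inner join to a left join would keep the unmatched experimental subjects with a missing CID instead of dropping them.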

