sasinitialuser
Calcite | Level 5

Hello.
I am a SAS beginner.

I have an experimental group, and I need to build a control group.
The control group must be created by matching control subjects 1:1 to the experimental group on two variables: age and sex.

Matching should be done using only those two variables, not the propensity score.

Should I use PROC PSMATCH or an inner join?
I would appreciate it if you could provide detailed code.

1 ACCEPTED SOLUTION

Reeza
Super User
And if there is no match to your experimental group you'll drop the experimental case? That is what an inner join would do.

5 REPLIES
ballardw
Super User

Here is one way. This uses a Data step merge, which is different than an inner join.

You did not provide any example data, so this is a small dummy example. The possible controls do contain sex/age combinations that do not appear in the example experimental group.

/* generate a possible list of controls */
data work.poss;
   do CID = 100 to 1000;
      sex = rand('BERNOULLI',.5)+1;
      age = rand('integer',14,18);
      randomizer = rand('uniform');
      output;
   end;
   label CID='ID in control set';
run;

/* randomize the order of the possible
   controls
*/
proc sort data=work.poss;
   by sex age randomizer;
run;

/* "experimental" set */
data work.exp;
   input id sex age;
datalines;
1 1 15 
2 1 15 
3 1 15 
6 1 16 
4 2 15 
5 2 15 
7 2 16 
8 2 16
;

proc sort data=work.exp;
   by sex age ;
run;

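data work.want;
   /* both inputs must be sorted by sex age; controls are in random order within each group */
   merge work.poss (in=in1)
         work.exp (in=in2)
   ;
   by sex age;
   /* LAG must run on every observation, so call it before the subsetting IF */
   lid=lag(id);
   /* keep only sex/age groups that occur in the experimental set */
   if in2;
   /* a repeated id means the experimental subjects in this group are used up */
   if id=lid then delete;
   drop lid randomizer;
run;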

When both data sets contain multiple observations for the same BY values, a match merge pairs the observations in order, and once the smaller group runs out, its last observation gets matched with all the remaining ones from the other set. So we use the LAG function to get the previous ID of the experimental group (big assumption: within each sex/age combination the experimental group is smaller than the set of possible controls). If the ID repeats, the observation is removed. The IN= option provides a temporary variable, valued 1/0 for true/false, that indicates whether that data set contributed to the current observation. The "if in2;" statement keeps only the observations that have a contribution from the experimental group. Note that an experimental subject with no possible control in its sex/age combination is still kept, with a missing CID.
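To see that merge behavior in isolation, here is a minimal sketch. The data sets work.a and work.b and all of their variables are made up for illustration and are not part of the example above.

```sas
/* Hypothetical toy data: four rows for key=1 */
data work.a;
   input key a_id;
datalines;
1 101
1 102
1 103
1 104
;

/* Hypothetical toy data: only two rows for key=1 */
data work.b;
   input key b_id;
datalines;
1 1
1 2
;

/* Match merge with repeated BY values in both inputs */
data work.demo;
   merge work.a (in=ina)
         work.b (in=inb);
   by key;
   from_a = ina;   /* copy the temporary IN= flags so they can be printed */
   from_b = inb;
run;

proc print data=work.demo noobs;
run;
```

The first two rows pair up in order. On the third and fourth rows work.b has run out, so the last b_id value (2) is carried along, which is the repetition the LAG comparison is meant to catch. Printing from_a and from_b lets you check how the IN= flags behave on those rows. The log will also show a note that the MERGE statement has more than one data set with repeats of BY values.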

This does include a randomization for the possible control matches.
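If you would rather not rely on how the many-to-many merge fills in values, one possible variation is to number the observations within each sex/age combination in both data sets and then merge on that number as well. This is only a sketch: it assumes work.exp and work.poss exist and are sorted as above, and the names exp_seq, poss_seq, want_seq, seq, in_exp and in_ctl are made up here.

```sas
/* Number the experimental subjects within each sex/age combination */
data work.exp_seq;
   set work.exp;              /* assumed sorted by sex age */
   by sex age;
   if first.age then seq = 0;
   seq + 1;                   /* sum statement, so seq is retained across rows */
run;

/* Number the possible controls the same way; they are already in random order */
data work.poss_seq;
   set work.poss;             /* assumed sorted by sex age randomizer */
   by sex age;
   if first.age then seq = 0;
   seq + 1;
   drop randomizer;
run;

/* Each sex/age/seq key now occurs at most once per data set */
data work.want_seq;
   merge work.exp_seq  (in=in_exp)
         work.poss_seq (in=in_ctl);
   by sex age seq;
   if in_exp and in_ctl;      /* keep complete pairs only */
   drop seq;
run;
```

With this version, an experimental subject that has no remaining control in its sex/age combination is dropped instead of being kept with a missing CID, and no control is used more than once.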

Reeza
Super User
An inner join seems like an interesting option, since it could drop experimental cases, which is a tad off. Usually the matching algorithm is adjusted when no matches are found rather than dropping cases, but if that's what you're after, an inner join is correct.
A left join is the typical method. What would you want to do with multiple matches?
sasinitialuser
Calcite | Level 5

I want to create a control group that is the same or similar in age and gender to the experimental group.

Reeza
Super User
And if there is no match to your experimental group you'll drop the experimental case? That is what an inner join would do.
sasinitialuser
Calcite | Level 5
Yes, I'll drop some of the experimental group.
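Since unmatched experimental cases will be dropped, it may help to count beforehand how many that would be. The query below is a rough sketch using the work.exp and work.poss data sets from the example earlier in the thread. The table name stratum_check and the columns n_exp, n_ctl and n_unmatched are made up for illustration.

```sas
proc sql;
   create table work.stratum_check as
   select e.sex,
          e.age,
          e.n_exp,                                   /* experimental subjects in the combination */
          coalesce(c.n_ctl, 0) as n_ctl,             /* possible controls, 0 if none */
          max(e.n_exp - coalesce(c.n_ctl, 0), 0)
             as n_unmatched                          /* subjects a 1:1 match would drop */
   from (select sex, age, count(*) as n_exp
         from work.exp
         group by sex, age) as e
   left join
        (select sex, age, count(*) as n_ctl
         from work.poss
         group by sex, age) as c
   on e.sex = c.sex and e.age = c.age
   order by e.sex, e.age;
quit;
```

The left join keeps every sex/age combination from the experimental group even when there are no possible controls, so those combinations show up with n_ctl = 0. An inner join here would silently leave them out of the check.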
