skm001
Calcite | Level 5

Hi all,

I've been using a nearest-neighbor propensity score matching macro (matching without replacement) and have some questions about it. Some background: I'm working with a large control data set and a smaller treatment data set, both subset by year, and propensity scores have already been calculated for both. The issue is that the number of treatment records going into the macro doesn't equal the number of treatment records coming out. For example, for 2010 there are 11,482 unique treatment records in the input, but only 11,286 treatment records are matched to the 2010 unique control records. Can anyone tell me why I'm losing records?

The second question revolves around a portion of the macro itself. The macro I used for my PSMs is found at http://home.uchicago.edu/~mcoca/docs/PSmatching.sas. I'm particularly interested in this part:

/* Open the treatment */
            set 2010_treatment;
            %if %upcase(&method) ~= RADIUS %then %do;
            retain BestDistance 99;
            %end;
/* Iterate over the hash */
            rc=iter.first();
            if (rc=0) then BestDistance=99;
            do while (rc=0);
/* Caliper */
            %if %upcase(&method) = CALIPER %then %do;
            if (pscoreT - &caliper) <= pscoreC <= (pscoreT + &caliper) then do;
                      ScoreDistance=abs(pscoreT-pscoreC);
                      if ScoreDistance < BestDistance then do;
                                BestDistance=ScoreDistance;
                                IDSelectedControl=idC;
                                PScoreControl=pscoreC;
                                MatchedToTreatID=idT;
                                PScoreTreat=pscoreT;
                      end;
                      rc=iter.next();
                      if (rc~=0) then do;
                                output;
                                rc1=h.remove(key:IdSelectedControl);
                      end;
          end;
  %end;

What is the purpose of BestDistance? Is that identifying the closest and best matching control record? And if I remove the BestDistance=99 portion, will that help me match the lost treatment records if I rerun the PSM?

If not, is there an easy way to alter the macro to ensure that every input treatment record is matched to a control record?

Thanks!

Tom
Super User

The 99 is intended to be a "really big number" so that any actual distance will be less than that.

Which of the various METHOD settings did you use?  For example if you use METHOD=CALIPER then the distance must be within +/- the &CALIPER value.
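
To make the role of the 99 concrete, here is a minimal sketch of the same "start with a sentinel, keep the smallest distance" pattern, outside the macro. The treated score and the four control scores are made-up values for illustration only, and a temporary array stands in for the macro's hash iterator. Because propensity scores lie between 0 and 1, any real distance is at most 1, so 99 can never be mistaken for a genuine match distance.

```sas
/* Sketch only: hypothetical scores, array used in place of the hash iterator */
data _null_;
    pscoreT = 0.42;                                        /* assumed treated score */
    array pscoreC[4] _temporary_ (0.10 0.40 0.47 0.90);    /* assumed control scores */

    BestDistance = 99;   /* sentinel: larger than any possible |pscoreT - pscoreC| */
    BestIndex    = .;    /* no control selected yet */

    do i = 1 to dim(pscoreC);
        ScoreDistance = abs(pscoreT - pscoreC[i]);
        /* keep this control only if it is closer than the best one so far */
        if ScoreDistance < BestDistance then do;
            BestDistance = ScoreDistance;
            BestIndex    = i;
        end;
    end;

    /* BestDistance still equal to 99 here would mean no candidate was accepted */
    put BestIndex= BestDistance=;
run;
```

In this sketch, removing the initial assignment would leave BestDistance missing on the first comparison, and since SAS treats a missing value as smaller than any number, `ScoreDistance < BestDistance` would never be true and no control would be selected at all.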

skm001
Calcite | Level 5

Hi Tom,

I didn't set a caliper value in the macro statement:

%macro PSMatching(datatreatment=2010_treatment, datacontrol=2010_control, method=NN, numberofcontrols=2, caliper=, replacement=no, out=beaconated_matched);

Based on your input, I'm starting to think that distance wasn't a factor in the dropped treatment records. Maybe the macro couldn't find matches for the 2010 treatment records from the 2010 comparison group?

Tom
Super User

Did you check that your cases do not have missing values for the propensity score?

Did you try with replacement?  Does that yield a control for every case?
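
Both checks can be run directly against the data. The sketch below assumes the treatment data set is called `treat2010` (a hypothetical name) and carries the variables `idT` and `pscoreT` used in the macro excerpt, and that the matched output is `beaconated_matched` with the variable `MatchedToTreatID`; adjust the names to your own data.

```sas
/* Sketch only: data set name treat2010 is an assumption */

/* 1. How many treated records have a missing propensity score? */
proc sql;
    select count(*)        as n_treated,
           nmiss(pscoreT)  as n_missing_pscore
    from treat2010;
quit;

/* 2. Which treated records never appear in the matched output? */
proc sql;
    create table unmatched_treated as
    select t.*
    from treat2010 as t
    where t.idT not in
          (select MatchedToTreatID from beaconated_matched);
quit;
```

Inspecting `unmatched_treated` (for example, whether its scores are missing or sit at the extremes of the score range) is one way to narrow down why those records were dropped.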
