Re: Selecting the closest 4 propensity scores as controls

SMcelroy1287 · Posted 05-13-2017 02:11 PM

Hello All!

I have two data sets one which contains all of my cases and looks similar to this: It contains if the outcome occurred and since these are all cases the outcome variable is always 1, the date on which the outcome occurred, the zip code, and then the propensity score(PS) calculated previously: (n=45,000 named CAL.PS_CASES1)

OUTCOME OUTCOME_DATE ZIPCODE PS Id_t

1 1/01/05 33333 .5680 32

1 03/26/09 55555 .0337 6589

The second data set contains environmental data for each day by zipcode and also has a propensity score that was previously calculated. The data looks like this:

(n=1,530,000 named CAL.PS_CONT1 )

OUTCOME_DATE ZIPCODE PS TEMP .... ...MEANNO2 Id_t

1/01/05 33333 .0483 78 92 1

1/01/05 55555 .2569 85 72 2

03/26/09 55555 .7802 65 80 3

I am trying to select the 4 closest propensity scores, from the environmental data, for each one of my cases and call them controls. These four controls matched on closest propensity scores must be in a range of plus or minus 30 days from the outcome date of the specific case. I am trying to use proc sql to select these controls. This is the start of my code... I do not know how to specify selecting 4 controls per case instead of just one. I also am not sure if I need a right join statement. Any help would be greatly appreciated. Thank you in advance!

PROC SQL;
create table PS_MATCH as
SELECT a.*,b.*,
ABS(A.OUTCOME_DATE-B.OUTCOME_DATE) AS ABSDATE

A.PS-B.PS AS PROP_SCORE,
ABS(CALCULATED PROP_SCORE) AS ABSPROP,
FROM CAL.PS_CASES1 AS A,
CAL.PS_CONT1 AS B
WHERE ABSDATE <= 30
GROUP BY ID_T;
QUIT;

Reeza · Posted 05-13-2017 02:14 PM

Can controls be used multiple times in a match? Essentially a greedy match algorithm.

For starters you can look for a Mayo Clinic Propensity score matching?

SMcelroy1287 · Posted 05-13-2017 02:25 PM

Yes, controls can be matched to multiple cases if it satisfies the conditions of closest propensity score and it is within the 60-day range of the case date.

Reeza · Posted 05-13-2017 03:06 PM

So assuming your code does find all 'valid' matches you can use SAS steps after to pick the closest 4.

proc sort data=have;
by caseID controlID descending matchScore;
run;

data want;
set have;
by caseID controlID;

if first.caseID then count=0;
count+1;

if count <= 4;

run;

SMcelroy1287 · Posted 05-13-2017 04:44 PM

Thank you for responding! I have not been able to generate my possible controls yet, but this code does look useful for when I do.

Reeza · Posted 05-13-2017 04:46 PM

I had assumed your SQL was working, but if it's not...if you post data we can work with we can help with code. Working off two records isn't going to work.

If you search on here you'll find instructions on how to post code a data step.

SMcelroy1287 · Posted 05-13-2017 05:31 PM

Ok, I have uploaded subsets of my data only including the variables I would like to select on. Preterm =outcome, nbthdate =outcome date, and ps=propensity score

art297 · Posted 05-13-2017 03:37 PM

Just one possible correction I'd make to @Reeza's suggestion. If what she called matchscore is actually your absprop, I think you would want to leave off the DESCENDING option on your proc sort, so that the first one will be the one with the lowest absprop.

Art, CEO, AnalystFinder.com

Reeza · Posted 05-13-2017 03:42 PM

@art297 is correct, I was assuming that a higher score is better in the code used above. If that's not true then remove the DESCENDING.

SMcelroy1287 · Posted 05-13-2017 04:43 PM

Thank you for your suggestion. Yes, I would want the smallest absprop scores so descending here would not work. I have not been able to figure out the first part of my code to generated the possible controls. Do you have any better ideas?

art297 · Posted 05-13-2017 05:33 PM

Without seeing example data and specifically what you want your resulting file to look like, we can't be sure. However, you should be able to modify the following:

PROC SQL;
  create table PS_MATCH as
    SELECT a.*,
           b.outcome_date as boutcome_date format=date9.,
      ABS(A.OUTCOME_DATE-B.OUTCOME_DATE) AS ABSDATE,
      A.PS-B.PS AS PROP_SCORE,
      b.ps as bps,
      ABS(CALCULATED PROP_SCORE) AS ABSPROP
        FROM CAL.PS_CASES1 AS A,
          CAL.PS_CONT1 AS B
           WHERE a.id_t eq b.id_t and
                 a.zipcode eq b.zipcode and
                 ABSDATE <= 30
  ;
QUIT;

proc sort data=ps_match;
  by id_t absprop;
run;

data want (drop=count);
  set ps_match;
  by id_t;
  if first.id_t then count=0;
  count+1;
  if count <= 4;
run;

Art, CEO, AnalystFinder.com

SMcelroy1287 · Posted 05-13-2017 06:40 PM

Thank you for your response! I tried running the code and I am receiving this ERROR: The following columns were not found in the contributing tables: ABSDATE. I have attached my log below.

889   PROC SQL;
890    create table PS_MATCH as
891      SELECT a.*,
892             b.nbthdate as bnbthdate format=date9.,
893        A.nbthdate-B.nbthdate AS days,
894        abs(calculated days) as absdate,
895        A.PS-B.PS AS PROP_SCORE,
896        b.ps as bps,
897        ABS(CALCULATED PROP_SCORE) AS ABSPROP
898          FROM CAL.PS_CASES1 AS A,
899            CAL.PS_CONT1 AS B
900             WHERE a.id_t=b.id_t and ABSDATE <= 30
901    ;
ERROR: The following columns were not found in the contributing tables: ABSDATE.
902  QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



903  proc sort data=ps_match;
904    by id_t absprop;
905  run;

SMcelroy1287 · Posted 05-13-2017 06:43 PM

I have attached a subset of my data in .txt format earlier in the feed if you need to work with actual data.

art297 · Posted 05-13-2017 07:06 PM

Can't test your code since id_t isn't in your controls file.

However, try adding calculated before absdate. i.e.,

WHERE a.id_t=b.id_t and calculated ABSDATE <= 30

Art, CEO, AnalystFinder.com

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away