DATA Step, Macro, Functions and more

Selecting the closest 4 propensity scores as controls

Reply
Occasional Contributor
Posts: 15

Selecting the closest 4 propensity scores as controls

Hello All! 

I have two data sets one which contains all of my cases and looks similar to this: It contains if the outcome occurred and since these are all cases the outcome variable is always 1, the date on which the outcome occurred, the zip code, and then the propensity score(PS) calculated previously: (n=45,000 named CAL.PS_CASES1)

OUTCOME  OUTCOME_DATE  ZIPCODE   PS         Id_t

1                        1/01/05               33333         .5680       32

1                       03/26/09              55555         .0337       6589

 

The second data set contains environmental data for each day by zipcode and also has a propensity score that was previously calculated. The data looks like this:

(n=1,530,000 named CAL.PS_CONT1 )

 

OUTCOME_DATE         ZIPCODE     PS    TEMP .... ...MEANNO2    Id_t

1/01/05                          33333           .0483       78             92                 1

1/01/05                           55555           .2569       85             72                 2

03/26/09                         55555          .7802        65             80                 3

 

I am trying to select the 4 closest propensity scores, from the environmental data, for each one of my cases and call them controls. These four controls matched on closest propensity scores must be in a range of plus or minus 30 days from the outcome date of the specific case. I am trying to use proc sql to select these controls. This is the start of my code... I do not know how to specify selecting 4 controls per case instead of just one. I also am not sure if I need a right join statement. Any help would be greatly appreciated. Thank you in advance!

 

PROC SQL;
create table PS_MATCH as
SELECT a.*,b.*,
ABS(A.OUTCOME_DATE-B.OUTCOME_DATE) AS ABSDATE

A.PS-B.PS AS PROP_SCORE,
ABS(CALCULATED PROP_SCORE) AS ABSPROP,
FROM CAL.PS_CASES1 AS A,
CAL.PS_CONT1 AS B
WHERE ABSDATE <= 30
GROUP BY ID_T;
QUIT;

Super User
Posts: 17,750

Re: Selecting the closest 4 propensity scores as controls

Can controls be used multiple times in a match? Essentially a greedy match algorithm.

For starters you can look for a Mayo Clinic Propensity score matching?

Occasional Contributor
Posts: 15

Re: Selecting the closest 4 propensity scores as controls

Yes, controls can be matched to multiple cases if it satisfies the conditions of closest propensity score and it is within the 60-day range of the case date.

Super User
Posts: 17,750

Re: Selecting the closest 4 propensity scores as controls

So assuming your code does find all 'valid' matches you can use SAS steps after to pick the closest 4.

 

proc sort data=have;
by caseID controlID descending matchScore;
run;

data want;
set have;
by caseID controlID;

if first.caseID then count=0;
count+1;

if count <= 4;

run;

 

Occasional Contributor
Posts: 15

Re: Selecting the closest 4 propensity scores as controls

Thank you for responding! I have not been able to generate my possible controls yet, but this code does look useful for when I do. 

Super User
Posts: 17,750

Re: Selecting the closest 4 propensity scores as controls

I had assumed your SQL was working, but if it's not...if you post data we can work with we can help with code. Working off two records isn't going to work. 

If you search on here you'll find instructions on how to post code a data step. 

Occasional Contributor
Posts: 15

Re: Selecting the closest 4 propensity scores as controls

[ Edited ]

Ok, I have uploaded subsets of my data only including the variables I would like to select on. Preterm =outcome, nbthdate =outcome date, and ps=propensity score 

PROC Star
Posts: 7,356

Re: Selecting the closest 4 propensity scores as controls

[ Edited ]

Just one possible correction I'd make to @Reeza's suggestion. If what she called matchscore is actually your absprop, I think you would want to leave off the DESCENDING option on your proc sort, so that the first one will be the one with the lowest absprop.

 

Art, CEO, AnalystFinder.com

 

Super User
Posts: 17,750

Re: Selecting the closest 4 propensity scores as controls

@art297 is correct, I was assuming that a higher score is better in the code used above. If that's not true then remove the DESCENDING.

Occasional Contributor
Posts: 15

Re: Selecting the closest 4 propensity scores as controls

Thank you for your suggestion. Yes, I would want the smallest absprop scores so descending here would not work. I have not been able to figure out the first part of my code to generated the possible controls. Do you have any better ideas?

PROC Star
Posts: 7,356

Re: Selecting the closest 4 propensity scores as controls

Without seeing example data and specifically what you want your resulting file to look like, we can't be sure. However, you should be able to modify the following:

PROC SQL;
  create table PS_MATCH as
    SELECT a.*,
           b.outcome_date as boutcome_date format=date9.,
      ABS(A.OUTCOME_DATE-B.OUTCOME_DATE) AS ABSDATE,
      A.PS-B.PS AS PROP_SCORE,
      b.ps as bps,
      ABS(CALCULATED PROP_SCORE) AS ABSPROP
        FROM CAL.PS_CASES1 AS A,
          CAL.PS_CONT1 AS B
           WHERE a.id_t eq b.id_t and
                 a.zipcode eq b.zipcode and
                 ABSDATE <= 30
  ;
QUIT;

proc sort data=ps_match;
  by id_t absprop;
run;

data want (drop=count);
  set ps_match;
  by id_t;
  if first.id_t then count=0;
  count+1;
  if count <= 4;
run;

Art, CEO, AnalystFinder.com

Occasional Contributor
Posts: 15

Re: Selecting the closest 4 propensity scores as controls

Thank you for your response! I tried running the code and I am receiving this ERROR: The following columns were not found in the contributing tables: ABSDATE. I have attached my log below. 

889   PROC SQL;
890    create table PS_MATCH as
891      SELECT a.*,
892             b.nbthdate as bnbthdate format=date9.,
893        A.nbthdate-B.nbthdate AS days,
894        abs(calculated days) as absdate,
895        A.PS-B.PS AS PROP_SCORE,
896        b.ps as bps,
897        ABS(CALCULATED PROP_SCORE) AS ABSPROP
898          FROM CAL.PS_CASES1 AS A,
899            CAL.PS_CONT1 AS B
900             WHERE a.id_t=b.id_t and ABSDATE <= 30
901    ;
ERROR: The following columns were not found in the contributing tables: ABSDATE.
902  QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



903  proc sort data=ps_match;
904    by id_t absprop;
905  run;

 

Occasional Contributor
Posts: 15

Re: Selecting the closest 4 propensity scores as controls

I have attached a subset of my data in .txt format earlier in the feed if you need to work with actual data.
PROC Star
Posts: 7,356

Re: Selecting the closest 4 propensity scores as controls

[ Edited ]

Can't test your code since id_t isn't in your controls file.

 

However, try adding calculated before absdate. i.e.,

WHERE a.id_t=b.id_t and calculated ABSDATE <= 30

 

Art, CEO, AnalystFinder.com

 

Ask a Question
Discussion stats
  • 13 replies
  • 152 views
  • 3 likes
  • 3 in conversation