BookmarkSubscribeRSS Feed
SMcelroy1287
Obsidian | Level 7

Hello All! 

I have two data sets one which contains all of my cases and looks similar to this: It contains if the outcome occurred and since these are all cases the outcome variable is always 1, the date on which the outcome occurred, the zip code, and then the propensity score(PS) calculated previously: (n=45,000 named CAL.PS_CASES1)

OUTCOME  OUTCOME_DATE  ZIPCODE   PS         Id_t

1                        1/01/05               33333         .5680       32

1                       03/26/09              55555         .0337       6589

 

The second data set contains environmental data for each day by zipcode and also has a propensity score that was previously calculated. The data looks like this:

(n=1,530,000 named CAL.PS_CONT1 )

 

OUTCOME_DATE         ZIPCODE     PS    TEMP .... ...MEANNO2    Id_t

1/01/05                          33333           .0483       78             92                 1

1/01/05                           55555           .2569       85             72                 2

03/26/09                         55555          .7802        65             80                 3

 

I am trying to select the 4 closest propensity scores, from the environmental data, for each one of my cases and call them controls. These four controls matched on closest propensity scores must be in a range of plus or minus 30 days from the outcome date of the specific case. I am trying to use proc sql to select these controls. This is the start of my code... I do not know how to specify selecting 4 controls per case instead of just one. I also am not sure if I need a right join statement. Any help would be greatly appreciated. Thank you in advance!

 

PROC SQL;
create table PS_MATCH as
SELECT a.*,b.*,
ABS(A.OUTCOME_DATE-B.OUTCOME_DATE) AS ABSDATE

A.PS-B.PS AS PROP_SCORE,
ABS(CALCULATED PROP_SCORE) AS ABSPROP,
FROM CAL.PS_CASES1 AS A,
CAL.PS_CONT1 AS B
WHERE ABSDATE <= 30
GROUP BY ID_T;
QUIT;

13 REPLIES 13
Reeza
Super User

Can controls be used multiple times in a match? Essentially a greedy match algorithm.

For starters you can look for a Mayo Clinic Propensity score matching?

SMcelroy1287
Obsidian | Level 7

Yes, controls can be matched to multiple cases if it satisfies the conditions of closest propensity score and it is within the 60-day range of the case date.

Reeza
Super User

So assuming your code does find all 'valid' matches you can use SAS steps after to pick the closest 4.

 

proc sort data=have;
by caseID controlID descending matchScore;
run;

data want;
set have;
by caseID controlID;

if first.caseID then count=0;
count+1;

if count <= 4;

run;

 

SMcelroy1287
Obsidian | Level 7

Thank you for responding! I have not been able to generate my possible controls yet, but this code does look useful for when I do. 

Reeza
Super User

I had assumed your SQL was working, but if it's not...if you post data we can work with we can help with code. Working off two records isn't going to work. 

If you search on here you'll find instructions on how to post code a data step. 

SMcelroy1287
Obsidian | Level 7

Ok, I have uploaded subsets of my data only including the variables I would like to select on. Preterm =outcome, nbthdate =outcome date, and ps=propensity score 

art297
Opal | Level 21

Just one possible correction I'd make to @Reeza's suggestion. If what she called matchscore is actually your absprop, I think you would want to leave off the DESCENDING option on your proc sort, so that the first one will be the one with the lowest absprop.

 

Art, CEO, AnalystFinder.com

 

Reeza
Super User

@art297 is correct, I was assuming that a higher score is better in the code used above. If that's not true then remove the DESCENDING.

SMcelroy1287
Obsidian | Level 7

Thank you for your suggestion. Yes, I would want the smallest absprop scores so descending here would not work. I have not been able to figure out the first part of my code to generated the possible controls. Do you have any better ideas?

art297
Opal | Level 21

Without seeing example data and specifically what you want your resulting file to look like, we can't be sure. However, you should be able to modify the following:

PROC SQL;
  create table PS_MATCH as
    SELECT a.*,
           b.outcome_date as boutcome_date format=date9.,
      ABS(A.OUTCOME_DATE-B.OUTCOME_DATE) AS ABSDATE,
      A.PS-B.PS AS PROP_SCORE,
      b.ps as bps,
      ABS(CALCULATED PROP_SCORE) AS ABSPROP
        FROM CAL.PS_CASES1 AS A,
          CAL.PS_CONT1 AS B
           WHERE a.id_t eq b.id_t and
                 a.zipcode eq b.zipcode and
                 ABSDATE <= 30
  ;
QUIT;

proc sort data=ps_match;
  by id_t absprop;
run;

data want (drop=count);
  set ps_match;
  by id_t;
  if first.id_t then count=0;
  count+1;
  if count <= 4;
run;

Art, CEO, AnalystFinder.com

SMcelroy1287
Obsidian | Level 7

Thank you for your response! I tried running the code and I am receiving this ERROR: The following columns were not found in the contributing tables: ABSDATE. I have attached my log below. 

889   PROC SQL;
890    create table PS_MATCH as
891      SELECT a.*,
892             b.nbthdate as bnbthdate format=date9.,
893        A.nbthdate-B.nbthdate AS days,
894        abs(calculated days) as absdate,
895        A.PS-B.PS AS PROP_SCORE,
896        b.ps as bps,
897        ABS(CALCULATED PROP_SCORE) AS ABSPROP
898          FROM CAL.PS_CASES1 AS A,
899            CAL.PS_CONT1 AS B
900             WHERE a.id_t=b.id_t and ABSDATE <= 30
901    ;
ERROR: The following columns were not found in the contributing tables: ABSDATE.
902  QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



903  proc sort data=ps_match;
904    by id_t absprop;
905  run;

 

SMcelroy1287
Obsidian | Level 7
I have attached a subset of my data in .txt format earlier in the feed if you need to work with actual data.
art297
Opal | Level 21

Can't test your code since id_t isn't in your controls file.

 

However, try adding calculated before absdate. i.e.,

WHERE a.id_t=b.id_t and calculated ABSDATE <= 30

 

Art, CEO, AnalystFinder.com

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1314 views
  • 3 likes
  • 3 in conversation