Re: Pre and post diagnosis date

CathyVI

Hello,

I have a data where some ID have multiple records and some only have one record. I have created a variable for first ischemic stroke and first AFIB [ i don't know how to let this variable be in dates instead of 0,1]. Some records have ischemic stroke and AFIB on a different date while some have both stroke and afib dx on the same date as seen below.

I would like to find records that if date of AFIB comes before the date of Ischemic stroke, That is

If AFIB (dx_date) is less than stroke (d_event) then stroke = “Pre”

If AFIB (dx_date) comes after stroke, then stroke = “Post” ;

Else if both dates are the same then stroke = “same day”.

I tried to code in sas but am not getting it.

data have;
input  ID $ D_EVENT First_stroke  First_afib  dx_date ;
format d_event dx_date date9.;
datalines;	
2411312	06APR2010	.	 1	 06APR2010	     
2411312	06APR2010	1	 .	    .	
2094284	28OCT2016	.	 1      28OCT2016
2094284	19FEB2019	1	 .	    .
2303167	23APR2010	.	 1	 23APR2010
2224656	05JUL2018	.	 1	 05JUL2018	
2224656	20APR2005	1	 .	    .
2188033	02JAN2019	.	1	 02JAN2019	
2188033	23APR2012	1	.	    .		
;;
run;

data want;
do until(last.id);
set have;
length stroke 10.;
format dx_date date9.;
by id;
if first_afib then do;
dx_date=d_event;
end;
end;
if dx_date < d_event then stroke = 'Pre';
    else if dx_date> d_event then stroke='Post';
	else stroke='Same day';
run;

This is what I wanted as output

ID	D_EVENT	First_stroke_date	First_AFIB_date	Stroke
2411312	06APR2010	06APR2010	06APR2010	Same day
2094284	28OCT2016	19FEB2019	28OCT2016	Pre
2303167	23APR2010	23APR2010	23APR2010	Same day
2224656	05JUL2018	20APR2005	05JUL2018	Post

Thank you in advance.

ChrisNZ

1. Your code does not run. Please provide running code => Test it after posting by copying from your post.

2. Unsure what you mean by

i don't know how to let this variable be in dates instead of 0,1]

Maybe you don't need a flag at all? Having a date is enough to identify whether an event took place.

Also unsure why STROKE_DTE is always populated.

3. This seems to do what you want. It's only valid if there are 2 records at most per ID.

data HAVE(index=(ID));
  input ID $ STROKE_DTE STROKE_FLG AFIB_FLG AFIB_DTE ;
  format STROKE_DTE AFIB_DTE date9.;
  informat STROKE_DTE AFIB_DTE date9.;
datalines;  
2411312 06APR2010 . 1 06APR2010       
2411312 06APR2010 1 . . 
2094284 28OCT2016 . 1 28OCT2016
2094284 19FEB2019 1 . .
2303167 23APR2010 . 1 23APR2010
2224656 05JUL2018 . 1 05JUL2018  
2224656 20APR2005 1 . .
2188033 02JAN2019 . 1 02JAN2019  
2188033 23APR2012 1 . .   
run;

data WANT;
  merge HAVE(keep=ID STROKE_DTE STROKE_FLG where=(STROKE_FLG))
        HAVE(keep=ID AFIB_DTE   AFIB_FLG   where=(AFIB_FLG  ))
        ;
  by ID;
  length TIMING $8;
  TIMING = ifc(STROKE_DTE < AFIB_DTE, 'Pre'
          ,ifc(STROKE_DTE > AFIB_DTE, 'Post'
          ,                           'Same day'));
run;

High-Performance SAS Coding - Third Edition

CathyVI

@ChrisNZ Thank you. If i have a recurrent event for each of the disease. I have modified your table by adding additional roles to each record. How do I flag only the first and second recurrent date? I want to keep the first record and the first recurrent of each disease.

For example, I want to keep 2188033 23APR2012 1 . .
2188033 02JAN2019 . 1 02JAN2019

data HAVE(index=(ID));
  input ID $ STROKE_DTE STROKE_FLG AFIB_FLG AFIB_DTE ;
  format STROKE_DTE AFIB_DTE date9.;
  informat STROKE_DTE AFIB_DTE date9.;
datalines;  
2411312 06APR2010 . 1 06APR2010 
2411312 10JUN2010 . 1 10JUN2010      
2411312 06APR2010 1 . . 
2094284 28OCT2016 . 1 28OCT2016
2094284 19FEB2019 1 . .
2303167 23APR2010 . 1 23APR2010
2224656 05JUL2018 . 1 05JUL2018  
2224656 20APR2005 1 . .
2188033 23APR2012 1 . .  
2188033 02JAN2019 . 1 02JAN2019 
2188033 01JUN2019 . 1 01JUN2019
2188033 11JUL2020 . 1 11JUL2020  
run;

ChrisNZ

So you want to keep the first 2 records?

Shouldn't your table be sorted by ID and by STROKE_DTE to ensure these records are always in the expected order?

High-Performance SAS Coding - Third Edition

CathyVI

@ChrisNZ You are right. I am suppose to sort by ID and STROKE_DTE so I can see all order of events.

I have converted the FLG into dates since they are in dates format in my original data. Also I have other variables in the original data that i will like to add so I can find all first diagnosis(dx) and recurrent dx. This is how my current data looks and this is what I want as output.

data HAVE(index=(ID));
  input ID $ EVENT_DTE STROKE_DTE R_STROKE_DTE AFIB_DTE R_AFIB_DTE DEMENTIA_DTE R_DEMENTIA_DTE PFO_DTE R_PFO_DTE SEIZURE_DTE R_SEIZURE_DTE ;
  format EVENT_DTE STROKE_DTE R_STROKE_DTE AFIB_DTE R_AFIB_DTE DEMENTIA_DTE R_DEMENTIA_DTE PFO_DTE R_PFO_DTE SEIZURE_DTE R_SEIZURE_DTE date9.;
  informat EVENT_DTE STROKE_DTE R_STROKE_DTE AFIB_DTE R_AFIB_DTE DEMENTIA_DTE R_DEMENTIA_DTE PFO_DTE R_PFO_DTE SEIZURE_DTE R_SEIZURE_DTE date9.;
datalines;  
2411312 05APR2010 05APR2010 . . . . . . . . .
2411312 06APR2010 . 06APR2010 . . . . . . . .
2411312 07APR2010 . 07APR2010 . . . . . . . .
2411312 08MAY2010 . 08MAY2010 . . . . . . . .
2411312 10JUN2010 . 10JUN2010 . . . . . . . .
2094284 28OCT2016 . . 28OCT2016 . . . . . . .
2094284 19FEB2019 . . . 19FEB2019 . . . . . .
2094284 23MAR2019 . . . 23MAR2019 . . . . . .
2094284 19APR2019 . . . 19APR2019 . . . . . .
2094284 29MAY2019 . . . 29MAY2019 . . . . . .
2094284 10JUL2019 . . . 10JUL2019 . . . . . .
2094284 19SEP2019 . . . 19SEP2019 . . . . . .
2094284 10FEB2020 . . . 10FEB2020 . . . . . .
2303167 23APR2010 . . . . 23APR2010 . . . . .
2224656 01APR2005 01APR2005 . . . . . . . . .
2224656 05JUL2006 . . . . . . 05JUL2006 . . .
2224656 15JUL2006 . . . . . . . 15JUL2006 . .
2224656 20APR2015 . . . . . . . 20APR2015 . .
2224656 29APR2015 . . . . . . . 29APR2015 . .
2188033 02JAN2019 02JAN2019 . . . . . . . . .
2188033 23APR2012 . 23APR2012 . . . . . . . .
2188033 23APR2013 . . . . . . . . 23APR2013 .
2188033 25MAY2013 . . . . . . . . . 25MAY2013
;
run;

OUTPUT: I will like all first diagnosis (dx) and recurrent dx to be in one row per ID.

For example

ID

STROKE_DTE

R_STROKE_DTE 1

R_STROKE_DTE 2

R_STROKE_DTE 3

R_STROKE_DTE 4

AFIB_DTE

R_AFIB_DTE 1

R_AFIB_DTE 2

R_AFIB_DTE 3

R_AFIB_DTE 4

R_AFIB_DTE 5

R_AFIB_DTE 6

R_AFIB_DTE 7

DEMENTIA_DTE

2411312

05APR2010

06APR2010

07APR2010

08MAY2010

10JUN2010

2094284

28OCT2016

19FEB2019

23MAR2019

19APR2019

29MAY2019

10JUL2019

19SEP2019

10FEB2020

ChrisNZ

You're moving the goal posts. 😞

There's no timing in your output, and you're transforming a long data set into a wide one with no obvious logic.

I have no idea what you want. Do you?

High-Performance SAS Coding - Third Edition

Pre and post diagnosis date