BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fengyuwuzu
Pyrite | Level 9

I have each subject's series of exposure date and also series of adverse event date, and want to compare and find the last exposure date prior to each adverse event. For example, exposure dates are   in the first two columns, adverse events dates are in the third and forth columns, and I want to get the last column for each adverse event. Can anyone give me some hints? Thanks

 

The first two columns are in one dataset (EX), and the 3rd and 4th columns are in the second dataset (AE); the 5th column is the column I want to add to have. I put them together for convenience. 

 

ID EX_date ID  AE_date Last EX_date before AE_date
1 8-Mar-16 1 9-Mar-16 8-Mar-16
1 22-Mar-16 1 9-Mar-16 8-Mar-16
1 12-Apr-16 1 9-Mar-16 8-Mar-16
1 26-Apr-16 1 4-Mar-16  
2 30-Aug-16 1 20-Mar-16 8-Mar-16
2 16-Sep-16 1 9-Mar-16 8-Mar-16
2 30-Sep-16 1 3-Apr-16 22-Mar-16
2 14-Oct-16 1 26-Apr-16 12-Apr-16
2 9-Nov-16 1 26-Apr-16 12-Apr-16
3 13-Jul-16 1 20-May-16 26-Apr-16
3 27-Jul-16 1 16-May-16 26-Apr-16
3 14-Sep-16 2 12-Sep-16 30-Aug-16
3 12-Oct-16 2 12-Sep-16 30-Aug-16
3 26-Oct-16 2 28-Oct-16 14-Oct-16
3 9-Nov-16 2 28-Oct-16 14-Oct-16
4 19-Oct-16 3 27-Jul-16 13-Jul-16
4 2-Nov-16 3 14-Sep-16 27-Jul-16
4 16-Nov-16 3 12-Aug-16 27-Jul-16
4 30-Nov-16 3 10-Aug-16 27-Jul-16
5 30-Jun-16 3 14-Sep-16 27-Jul-16
5 14-Jul-16 3 19-Oct-16 12-Oct-16
5 20-Aug-16 4 5-Dec-16 30-Nov-16
5 13-Sep-16 4 15-Dec-16 30-Nov-16
5 27-Sep-16 4 15-Dec-16 30-Nov-16
5 11-Oct-16 4 25-Nov-16 30-Nov-16
5 25-Oct-16 4 13-Dec-16 30-Nov-16
5 8-Nov-16 4 27-Dec-16 30-Nov-16
5 22-Nov-16 5 19-Jul-16 14-Jul-16
5 6-Dec-16 5 16-Aug-16
5 27-Dec-16 5 30-Aug-16  
6 10-Jan-17 5 30-Aug-16  
6 1-Dec-16 5 13-Sep-16  
6 10-Nov-16 5 28-Sep-16  
6 28-Nov-16 5 28-Sep-16  
6 13-Dec-16 5 28-Sep-16  
6 28-Dec-16 5 28-Sep-16  
    5 12-Oct-16  
    5 16-Oct-16  
    5 14-Oct-16  
    5 31-Oct-16  
    6 10-Jan-17  
    6 10-Jan-17  
1 ACCEPTED SOLUTION

Accepted Solutions
Sven111
Pyrite | Level 9

Assuming there are two separate tables, one for the Exposure date and one for the Adverse effects, you could create the fifth column like this:

PROC SQL;
    SELECT  tbl_A.ID
           ,tbl_A.AE_Date
           ,(SELECT MAX(tbl_B.EX_Date)
             FROM   tbl_Exposure    AS tbl_B
             WHERE  tbl_A.ID         = tbl_B.ID
               AND  tbl_A.AE_Date   >= tbl_B.Exp_Date)
    FROM    tbl_AdvEff              AS tbl_A;
;QUIT;

View solution in original post

11 REPLIES 11
mkeintz
PROC Star

Is your data sorted by id?  Your sample is not.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
fengyuwuzu
Pyrite | Level 9
they are in different datasets. I just want to give some sample data, and put them together.
mkeintz
PROC Star

Assuming you have dataset EXDATES, sorted by ID and ex_date,  and dataset AEDATES, sorted by ID and ae_date, then you can use the SET ... BY combination of statements to interleave the two datasets.  You can retain last_exdate every time an ex_dates observation is read, and you can use the subsetting IF to only keep records read from AEDATES:

 

data exdates;
  input ID EX_date :date9.;
  format ex_date yymmddn8.;
datalines;
1 	8mar2016 	
1 	22mar2016 	
1 	12Apr2016 
1 	26Apr2016 
2 	30Aug2016 	
2 	16Sep2016 	
2 	30Sep2016 	
2 	14Oct2016 	
2 	9Nov2016 	
3 	13Jul2016 	
3 	27Jul2016 	
3 	14Sep2016 	
3 	12Oct2016 	
3 	26Oct2016 	
3 	9Nov2016 	
4 	19Oct2016 	
4 	2Nov2016 	
4 	16Nov2016 	
4 	30Nov2016 	
5 	30Jun2016 	
5 	14Jul2016 	
5 	20Aug2016 	
5 	13Sep2016 	
5 	27Sep2016 	
5 	11Oct2016 	
5 	25Oct2016 	
5 	8Nov2016 	
5 	22Nov2016 	
5 	6Dec2016 	
5 	27Dec2016 	
6 	10Jan17 	
6 	1Dec2016 	
6 	10Nov2016 	
6 	28Nov2016 	
6 	13Dec2016 	
6 	28Dec2016 	
run;
proc sort ; by id ex_date;run;

data aedates;
  input ID AE_date :date9.	;
  format ae_date yymmddn8.;
datalines;
    1       9mar2016
    1       9mar2016
    1       9mar2016
    1       4mar2016
    1       20mar2016
    1       9mar2016
    1       3Apr2016
    1       26Apr2016
    1       26Apr2016
    1       20May2016
    1       16May2016
    2       12Sep2016
    2       12Sep2016
    2       28Oct2016
    2       28Oct2016
    3       27Jul2016
    3       14Sep2016
    3       12Aug2016
    3       10Aug2016
    3       14Sep2016
    3       19Oct2016
    4       5Dec2016
    4       15Dec2016
    4       15Dec2016
    4       25Nov2016
    4       13Dec2016
    4       27Dec2016
    5       19Jul2016
    5       16Aug2016
    5       30Aug2016
    5       30Aug2016
    5       13Sep2016
    5       28Sep2016
    5       28Sep2016
    5       28Sep2016
    5       28Sep2016
    5       12Oct2016
    5       16Oct2016
    5       14Oct2016
    5       31Oct2016
    6       10Jan2017
    6       10Jan2017
run;
proc sort; by id ae_date;run;

data want;
  set exdates (in=inex rename=(ex_date=ae_date)) 
      aedates (in=inae);
  by id ae_date;

  retain last_exdate;
  if first.id then last_exdate=.;
  if inex then last_exdate=ae_date;
  if inae;
  format last_exdate yymmddn8.;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
fengyuwuzu
Pyrite | Level 9

Thank you, 

Sven111
Pyrite | Level 9

I always like seeing a DATA step approach for problems like this since SQL is more of my strong suit and always my first take on a solution. Seeing the same problem solved using both helps me see the problem from a different perspective and learn a bit more.

mkeintz
PROC Star

As long as the datasets are not large, I think you'll be happy with the performance of sql.  But it is probably not as scalable as the data step approach, especially if your datasets are already sorted.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Sven111
Pyrite | Level 9

Assuming there are two separate tables, one for the Exposure date and one for the Adverse effects, you could create the fifth column like this:

PROC SQL;
    SELECT  tbl_A.ID
           ,tbl_A.AE_Date
           ,(SELECT MAX(tbl_B.EX_Date)
             FROM   tbl_Exposure    AS tbl_B
             WHERE  tbl_A.ID         = tbl_B.ID
               AND  tbl_A.AE_Date   >= tbl_B.Exp_Date)
    FROM    tbl_AdvEff              AS tbl_A;
;QUIT;
fengyuwuzu
Pyrite | Level 9

Thank you. This is a good clue. 

I am thinking about left join

Sven111
Pyrite | Level 9

That would work too, it's largely a matter of preference on small datasets.  On bigger datasets there may be some performance implications for choosing one way or another, at least when passing through to database, I'm not as experienced at pure PROC SQL performance tuning as I am at pass-through stuff.

art297
Opal | Level 21

@Sven111: Just in case you hadn't noticed, neither of the proposed solutions give you exactly what you asked for. To correct for that, given the SQL solution, change the line:

AND  tbl_A.AE_Date   >= tbl_B.Exp_Date

to

AND  tbl_A.AE_Date   > tbl_B.Exp_Date

 HTH,

Art, CEO, AnalystFinder.com

 

fengyuwuzu
Pyrite | Level 9
Thank you very much, Art. Yes, you are correct, and I wanted to use ">". I did change it when applied the solution.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2181 views
  • 2 likes
  • 4 in conversation