Selecting unique records by two different variables

Reply
New Contributor
Posts: 2

Selecting unique records by two different variables

Hi , I have two data sets which have to merge by id. I want to select those records from patients who stayed at BUR hospital and had a record date  on the day of their MI/or the closest date to their  MI . I need one record per patient that is the most recent time according to the patient's event date and time at the BUR Hospital.

Data set 1                                         

ID   EVENT     EventDate                             

121  MI         12/1/2010                           

121  MI         5/3/2012                            

                                                     

123  MI         3/04/2005                            

                                                     

144   MI        5/01/2012                            

144   MI        6/15/2012                            

Data set 2

  ID  Hospital RecordDate

121 BUR  01/31/2001
121 FOLK 04/30/2004
121 FOLK 04/30/2004
121 BUR  12/31/2010
121 BUR  12/31/2010
123 BUR  5/31/2005
144 BUR  5/01/2012
144 DES  6/30/2012

144 DES      6/30/2010

144 BUR      7/31/2012

                                                  

a

Thank you for your assistance.

Grand Advisor
Posts: 9,578

Re: Selecting unique records by two different variables

What is your output ?

New Contributor
Posts: 2

Re: Selecting unique records by two different variables

HI XIA,

My output should be  the most recent visit for patients who were at BUR Hospital.

Ihsan

Grand Advisor
Posts: 9,578

Re: Selecting unique records by two different variables

Yeah. But you'd better post the result you want , that could help us to understand what you are looking for .

data Dataset1;
infile cards expandtabs;
input ID   EVENT  $   EventDate : mmddyy10.;
format EventDate  mmddyy10.;
cards;                        
121  MI         12/1/2010                           
121  MI         5/3/2012                                                                              
123  MI         3/04/2005                                                                                
144   MI        5/01/2012                            
144   MI        6/15/2012
;
run;                            
 
data Dataset2;
infile cards expandtabs;
input ID  Hospital $ RecordDate : mmddyy10.;
format RecordDate  mmddyy10.;
cards;
121 BUR      01/31/2001
121 FOLK     04/30/2004
121 FOLK     04/30/2004
121 BUR      12/31/2010
121 BUR      12/31/2010
123 BUR      5/31/2005
144 BUR      5/01/2012
144 DES      6/30/2012
144 DES      6/30/2010
144 BUR      7/31/2012
;
run;

proc sort data=dataset1;by id EventDate;run;
proc sort data=dataset2;by id RecordDate ;run;
data temp;
 set dataset1(rename=(EVENT=dummy EventDate=date) in=ina) 
     dataset2(rename=(Hospital =dummy RecordDate =date) where=(dummy='BUR') in=inb);
 by id date;
 if id=lag(id) and lag(ina) and inb  ;
 run;
 data want;
  set temp;
  by id;
  if last.id;
  run;


Xia Keshan

Frequent Contributor
Posts: 144

Re: Selecting unique records by two different variables

You could do the cartesian product of the two dataset keeping only those that have the same ID.

proc sql;

create table join_data as (

select d1.*,d2.Hospital, d2.RecordDate

from data1 d1,  data d2

on d1.ID=d2.ID

);

quit;

Doing this you have the two dataset joined.

Once you have this, calculating time between record dates diff=Recorddate-EventDate, you could sort by ID and diff, you could keep if first.ID. Doing this you keep a record per ID and the less difference time, if you only want those that have EVENT after hospital you could select those doing a where before the first.id;

proc sort data=join_data ;

     by id diff;

run;

data join_data;

set join_data;

by id diff;

where diff>=0;

if first.id;

run;

Esteemed Advisor
Esteemed Advisor
Posts: 7,217

Re: Selecting unique records by two different variables

Another alternative.  Create a column for max() below the date in question and column for min() above the date in question and then result is one of those:

proc sql;

     create table WANT as

     select     A.*,

                    B.BELOW,

                    A.ABOVE

     from          HAVE A

     left join    (select ID,max(RECORD_DATE) as BELOW from HAVE where ID=A.ID and RECORD_DATE <= A.RECORD_DATE) B

     on             A.ID=B.ID

     left join    (select ID,min(RECORD_DATE) as ABOVE from HAVE where ID=A.ID and RECORD_DATE >= A.RECORD_DATE) C

     on               A.ID=B.ID;

quit;

Ask a Question
Discussion stats
  • 5 replies
  • 230 views
  • 0 likes
  • 4 in conversation