Solved: Re: Proc sql left join no duplicates

nickspencer · Posted 02-12-2021 04:06 PM

Hi all,
I am trying to create an indicator on table1 based on information on table2.
Table1
Patient_id comp_id date
123 A111 1/2/2021
234 A124 1/3/2021
567 A234 1/4/2021

Table2
Patient_id Comp_id Date State
123 A111 1/5/2020 AZ
123 A111 1/4/2020 AZ
123 A111 2/5/2021 PA
234 A124 1/3/2021 NY
So, I want to create a column called indicator in table1 and gave indicator value of 1 if the same patient and company is present in table2 and date in table1 is less than any date for that condition in table2 and state is AZ. That means only first row in table1 will have indicator of 1. I tried few ways but got duplicate in some cases. What may be the best way to achieve this with proc sql?

Ksharp · Posted 02-13-2021 06:57 AM

data one;
  input patient_id $ comp_id $ date :mmddyy.;
  format date yymmdd10.;
cards;
123 A111 1/2/2021
234 A124 1/3/2021
567 A234 1/4/2021

data two;
  input patient_id $ comp_id $ date :mmddyy. state $ ;
  format date yymmdd10.;
cards;
123 A111 1/5/2020 AZ
123 A111 1/4/2020 AZ
123 A111 2/5/2021 PA
234 A124 1/3/2021 NY
;

proc sql ;
create table want as
  select a.*,
   exists(select * from two where patient_id=a.patient_id and comp_id=a.comp_id and state='AZ' and date<a.date) as flag
  from one a 
;
quit;

View solution in original post

mkeintz · Posted 02-12-2021 04:31 PM

I think this is pretty simple in a data step. Make sure table1 and table2 are sorted by PATIENT_ID/COMP_ID. Then:

data want;
  do until (last.comp_id);
    merge table1 table2 (rename=(date=date2));
    by patient_id comp_id;
    if state='AZ' and date2<date then indicator=1;
  end;
  drop date2 state;
run;

This assumes there is only one obs per PATIENT_ID/COMP_ID in table 1.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

nickspencer · Posted 02-12-2021 04:36 PM

@mkeintz this is great and there is only one patiend_id/comp_id. But is there a way to achieve the same in proc sql?

ChrisNZ · Posted 02-14-2021 05:48 PM

Why use SQL when a data step implements the required logic so cleanly?

High-Performance SAS Coding - Third Edition

Tom · Posted 02-12-2021 04:59 PM

What do you mean by duplicate? The way in SQL to avoid duplicate is either DISTINCT keyword or GROUP BY.

data one;
  input patient_id $ comp_id $ date :mmddyy.;
  format date yymmdd10.;
cards;
123 A111 1/2/2021
234 A124 1/3/2021
567 A234 1/4/2021

data two;
  input patient_id $ comp_id $ date :mmddyy. state $ ;
  format date yymmdd10.;
cards;
123 A111 1/5/2020 AZ
123 A111 1/4/2020 AZ
123 A111 2/5/2021 PA
234 A124 1/3/2021 NY
;

proc sql ;
create table want as
  select a.patient_id, a.comp_id
       , max(b.state='AZ' ) as any_AZ
  from one a 
  left join two b
    on a.patient_id = b.patient_id
    and a.comp_id = b.comp_id 
    and a.date > b.date
  group by 1,2
;
quit;

proc print data=want;
run;

       patient_
Obs       id       comp_id    any_AZ

 1       123        A111         1
 2       234        A124         0
 3       567        A234         0

nickspencer · Posted 02-12-2021 05:10 PM

@Tom this works great except that I have number of other columns in table1 to group by along with state column(may have different value). Is there a way to handle that?

Tom · Posted 02-12-2021 07:03 PM

I have no idea what you mean. Please show a clearer example with a least a couple of these other "columns" (did you mean input variables? output variables? levels of one or more of the input variables? something else?)

Ksharp · Posted 02-13-2021 06:57 AM

data one;
  input patient_id $ comp_id $ date :mmddyy.;
  format date yymmdd10.;
cards;
123 A111 1/2/2021
234 A124 1/3/2021
567 A234 1/4/2021

data two;
  input patient_id $ comp_id $ date :mmddyy. state $ ;
  format date yymmdd10.;
cards;
123 A111 1/5/2020 AZ
123 A111 1/4/2020 AZ
123 A111 2/5/2021 PA
234 A124 1/3/2021 NY
;

proc sql ;
create table want as
  select a.*,
   exists(select * from two where patient_id=a.patient_id and comp_id=a.comp_id and state='AZ' and date<a.date) as flag
  from one a 
;
quit;

ballardw · Posted 02-13-2021 11:32 AM

@nickspencer wrote:
Hi all,
I am trying to create an indicator on table1 based on information on table2.
Table1
Patient_id comp_id date
123 A111 1/2/2021
234 A124 1/3/2021
567 A234 1/4/2021

Table2
Patient_id Comp_id Date State
123 A111 1/5/2020 AZ
123 A111 1/4/2020 AZ
123 A111 2/5/2021 PA
234 A124 1/3/2021 NY
So, I want to create a column called indicator in table1 and gave indicator value of 1 if the same patient and company is present in table2 and date in table1 is less than any date for that condition in table2 and state is AZ. That means only first row in table1 will have indicator of 1. I tried few ways but got duplicate in some cases. What may be the best way to achieve this with proc sql?

LARGE ECONOMY SIZED HINT: If you want someone to improve on your code provide the code.

Register Today!

SAS Training: Just a Click Away