topic Re: Proc sql left join no duplicates in SAS Programming

Proc sql left join no duplicates

nickspencer — Fri, 12 Feb 2021 21:06:01 GMT

Hi all,
I am trying to create an indicator on table1 based on information on table2.
Table1
Patient_id comp_id date
123 A111 1/2/2021
234 A124 1/3/2021
567 A234 1/4/2021

Table2
Patient_id Comp_id Date State
123 A111 1/5/2020 AZ
123 A111 1/4/2020 AZ
123 A111 2/5/2021 PA
234 A124 1/3/2021 NY
So, I want to create a column called indicator in table1 and gave indicator value of 1 if the same patient and company is present in table2 and date in table1 is less than any date for that condition in table2 and state is AZ. That means only first row in table1 will have indicator of 1. I tried few ways but got duplicate in some cases. What may be the best way to achieve this with proc sql?

Re: Proc sql left join no duplicates

mkeintz — Fri, 12 Feb 2021 21:31:34 GMT

I think this is pretty simple in a data step. Make sure table1 and table2 are sorted by PATIENT_ID/COMP_ID. Then:

data want;
  do until (last.comp_id);
    merge table1 table2 (rename=(date=date2));
    by patient_id comp_id;
    if state='AZ' and date2<date then indicator=1;
  end;
  drop date2 state;
run;

This assumes there is only one obs per PATIENT_ID/COMP_ID in table 1.

Re: Proc sql left join no duplicates

nickspencer — Fri, 12 Feb 2021 21:36:24 GMT

@mkeintz this is great and there is only one patiend_id/comp_id. But is there a way to achieve the same in proc sql?

Re: Proc sql left join no duplicates

Tom — Fri, 12 Feb 2021 21:59:51 GMT

What do you mean by duplicate? The way in SQL to avoid duplicate is either DISTINCT keyword or GROUP BY.

data one;
  input patient_id $ comp_id $ date :mmddyy.;
  format date yymmdd10.;
cards;
123 A111 1/2/2021
234 A124 1/3/2021
567 A234 1/4/2021

data two;
  input patient_id $ comp_id $ date :mmddyy. state $ ;
  format date yymmdd10.;
cards;
123 A111 1/5/2020 AZ
123 A111 1/4/2020 AZ
123 A111 2/5/2021 PA
234 A124 1/3/2021 NY
;

proc sql ;
create table want as
  select a.patient_id, a.comp_id
       , max(b.state='AZ' ) as any_AZ
  from one a 
  left join two b
    on a.patient_id = b.patient_id
    and a.comp_id = b.comp_id 
    and a.date > b.date
  group by 1,2
;
quit;

proc print data=want;
run;

       patient_
Obs       id       comp_id    any_AZ

 1       123        A111         1
 2       234        A124         0
 3       567        A234         0

Re: Proc sql left join no duplicates

nickspencer — Fri, 12 Feb 2021 22:10:13 GMT

@Tom this works great except that I have number of other columns in table1 to group by along with state column(may have different value). Is there a way to handle that?

Re: Proc sql left join no duplicates

Tom — Sat, 13 Feb 2021 00:03:21 GMT

I have no idea what you mean. Please show a clearer example with a least a couple of these other "columns" (did you mean input variables? output variables? levels of one or more of the input variables? something else?)

Re: Proc sql left join no duplicates

Ksharp — Sat, 13 Feb 2021 11:57:21 GMT

data one;
  input patient_id $ comp_id $ date :mmddyy.;
  format date yymmdd10.;
cards;
123 A111 1/2/2021
234 A124 1/3/2021
567 A234 1/4/2021

data two;
  input patient_id $ comp_id $ date :mmddyy. state $ ;
  format date yymmdd10.;
cards;
123 A111 1/5/2020 AZ
123 A111 1/4/2020 AZ
123 A111 2/5/2021 PA
234 A124 1/3/2021 NY
;

proc sql ;
create table want as
  select a.*,
   exists(select * from two where patient_id=a.patient_id and comp_id=a.comp_id and state='AZ' and date<a.date) as flag
  from one a 
;
quit;

Re: Proc sql left join no duplicates

ballardw — Sat, 13 Feb 2021 16:32:32 GMT

@nickspencer wrote:
Hi all,
I am trying to create an indicator on table1 based on information on table2.
Table1
Patient_id comp_id date
123 A111 1/2/2021
234 A124 1/3/2021
567 A234 1/4/2021

Table2
Patient_id Comp_id Date State
123 A111 1/5/2020 AZ
123 A111 1/4/2020 AZ
123 A111 2/5/2021 PA
234 A124 1/3/2021 NY
So, I want to create a column called indicator in table1 and gave indicator value of 1 if the same patient and company is present in table2 and date in table1 is less than any date for that condition in table2 and state is AZ. That means only first row in table1 will have indicator of 1. I tried few ways but got duplicate in some cases. What may be the best way to achieve this with proc sql?

LARGE ECONOMY SIZED HINT: If you want someone to improve on your code provide the code.

Re: Proc sql left join no duplicates

ChrisNZ — Sun, 14 Feb 2021 22:48:33 GMT

Why use SQL when a data step implements the required logic so cleanly?