Help using Base SAS procedures

Proc SQL multiple scenario joins

Reply
Contributor
Posts: 61

Proc SQL multiple scenario joins

[ Edited ]

 

Hello-

 

I would really appreciate some tips and assistance.

I have been working on this for almost a week and cant seem to be able to write SQL code that covers all of the scenarios below.  (trying to merge between start and end dates)   I'm clueless on how to write the statement for Scenario 3 and Scenario 4.   Can anyone provide me with some pointers.   Thank you.

 

 

 

PROC SQL;

                CREATE TABLE NEW AS

                SELECT *

FROM ENR AS E LEFT JOIN PRO AS P

                ON E.IDKEY=P.IDKEY

AND (P.PGM_END BETWEEN E.ENR_START AND E.ENR_END)

OR

(P.PGM_START BETWEEN E.ENR_START AND E.ENR_END)

OR

 

 

Scenario 1:         

       |____enrollment_____|     

|___Program______|

 

 Scenario 2:

             

|____enrollment_____|     

       |___Program______|

 

 Scenario 3:

             

       |__enrollment__|    

|__________Program_________|

 

  Scenario 4:

             

|__________Enrollment_________|

   

       |__Program__|

 

 

Respected Advisor
Posts: 4,935

Re: Proc SQL multiple scenario joins

Posted in reply to Suzanne_Ed

The expression to cover all scenarios is:

 

PROC SQL;
CREATE TABLE NEW AS
SELECT *
FROM ENR AS E LEFT JOIN PRO AS P
ON E.IDKEY=P.IDKEY and
p.pgm_start <=e.enr_end and p.pgm_end >= e.enr_start;
quit;

 

PG
Contributor
Posts: 61

Re: Proc SQL multiple scenario joins

Hi PG- Thanks so much for looking at my data problem. This semi-worked. The issue is that there could be multiple rows of PGM records and the ENR record is being multiplied by the number of PGM records that qualify. In these instances, is it possible to set up a 'tie breaker'? Meaning have the ENR record only join to the most recent (descending) date of the PGM record? Thank you so much!
Respected Advisor
Posts: 4,935

Re: Proc SQL multiple scenario joins

Posted in reply to Suzanne_Ed

The left join makes this a bit tricky. Try :

 

PROC SQL;
CREATE TABLE NEW AS
SELECT *
FROM ENR AS E LEFT JOIN PRO AS P
ON E.IDKEY=P.IDKEY and
p.pgm_start <=e.enr_end and p.pgm_end >= e.ent_start
group by e.enr_start, e.enr_end
having p.pgm_start=max(p.pgm_start) or p.pgm_start is missing;
quit;
PG
Contributor
Posts: 61

Re: Proc SQL multiple scenario joins

Hi PG- 
Unfortunately it didn't work.  I was expecting one match for this particular IDKEY and the updated code resulted in zero matches whereas, your original code resulted in three matches. 

 

Here is an example with data.  All three of the records in the PGM dataset would qualify to join to the one ENR data record.  But the ENR data record I am trying to match to would be the last recrod in the PGM dataset. (the one with CERTKEY=1020)  Would you be able to incorporate the max(certkey) into the syntax to help select the most recent/decending?? (In my PGM table, the certkey is unique and the number increases as new records are added) See below:   Thank you so much!

 

ENR DATASET      
IDKEY ENR_START ENR_END DATA
50 AUG 11 2014 JUNE 4 2015 BSIELD
       
PGM DATASET      
IDKEY PGM_START PGM_END CERTKEY
50 AUG 11/2014 999999 836
50 AUG 11/2014 999999 820
50 MAY 29 2015 999999 1020
Respected Advisor
Posts: 4,173

Re: Proc SQL multiple scenario joins

Posted in reply to Suzanne_Ed

Please provide some representative sample data which covers all the cases and then tell us exactly how the result needs to look like. 

This will allow us to actually test proposed code.

Respected Advisor
Posts: 4,935

Re: Proc SQL multiple scenario joins

Posted in reply to Suzanne_Ed

My proposed code didn't work because a typo generated an ERROR. Now it works:

 


data enr;
input idkey (enr_start enr_end) (:date9.);
format enr_start enr_end yymmdd10.;
datalines;
50 11aug2014 04jun2015
;

data pro;
input idkey (pgm_start pgm_end) (:date9.);
format pgm_start pgm_end yymmdd10.;
datalines;
50 11aug2014 04jun2099
50 11aug2014 04jun2099
50 29may2015 04jun2099
;

PROC SQL;
CREATE TABLE NEW AS
SELECT e.*, p.pgm_start, pgm_end
FROM ENR AS E LEFT JOIN PRO AS P
ON E.IDKEY=P.IDKEY and
p.pgm_start <=e.enr_end and p.pgm_end >= e.enr_start
group by e.enr_start, e.enr_end
having p.pgm_start=max(p.pgm_start) or p.pgm_start is missing;
select * from new;
quit;
PG
Contributor
Posts: 61

Re: Proc SQL multiple scenario joins

Hi PG- Can this code work if I used SAS dates instead of pgm_start pgm_end enr_start enr_end? I tried but I couldn't get it to work- but it did work when working with actual dates mmddyy10. Also, do you know if the SQL code can produce an output dataset of non-matches? Or how would I go about searching for those ENR records that had zero matches in the PGM dataset?
Respected Advisor
Posts: 4,935

Re: Proc SQL multiple scenario joins

Posted in reply to Suzanne_Ed

The code is meant to work with SAS dates, whatever their format.  ENR records without matches will show up in the output dataset with missing pgm dates. That's the effect of your left join combined with the condition or p.pgm_start is missing in the HAVING clause.

PG
Contributor
Posts: 61

Re: Proc SQL multiple scenario joins

[ Edited ]

HI PG-
I used your syntax and I am not getting the results I am expecting. A no match resulted when a match should have been produced. Here is the code:
PROC SQL;
CREATE TABLE NEW AS
SELECT E.*, P.IDKEY, P.CERTKEY, P.BIRDATE, P.GNDRCODEKEY,
P.SAS_START_DATE, P.SAS_END_DATE,
FROM ENR AS E LEFT JOIN PGM AS P
ON E.IDKEY=P.IDKEY and
p.SAS_START_DATE LE e.SAS_WITHDRLDATE_DATE and p.SAS_END_DATE GE e.SAS_ENROLLDATE_DATE
group by e.SAS_ENROLLDATE_DATE, e.SAS_WITHDRLDATE_DATE
having p.SAS_START_DATE=max(p.SAS_START_DATE) or p.SAS_START_DATE is missing;
select * from new;
quit;

However, when I manually entered the records from the two table(expecting a match) into the code below- it resulted in a match.
data enr;
input idkey enr_start enr_end;
datalines;
50 19946 20243
; RUN;

data pro;
input idkey pgm_start pgm_end ;
datalines;
50 19946 9999999
50 19946 9999999
50 20237 9999999
;

PROC SQL;
CREATE TABLE NEW AS
SELECT e.*, p.pgm_start, pgm_end
FROM ENR AS E LEFT JOIN PRO AS P
ON E.IDKEY=P.IDKEY and
p.pgm_start <=e.enr_end and p.pgm_end >= e.enr_start
group by e.enr_start, e.enr_end
having p.pgm_start=max(p.pgm_start) or p.pgm_start is missing;
select * from new;
quit;

Do you know what could be wrong?

 

Also- the join didnt return all records from my original table.  Dataset 1 contains 114,000 records (which is the SOURCE) Dataset 2 contains 354,000 and join only netted 33,000 records? 

Respected Advisor
Posts: 4,935

Re: Proc SQL multiple scenario joins

[ Edited ]
Posted in reply to Suzanne_Ed

 

The join condition should be OR, not AND

 

ON E.IDKEY=P.IDKEY and
(p.SAS_START_DATE LE e.SAS_WITHDRLDATE_DATE or p.SAS_END_DATE GE e.SAS_ENROLLDATE_DATE)

PG
Respected Advisor
Posts: 4,935

Re: Proc SQL multiple scenario joins

Posted in reply to Suzanne_Ed

Just to make sure, you should add:

 

group by E.IDKEY, e.SAS_ENROLLDATE_DATE, e.SAS_WITHDRLDATE_DATE

PG
Contributor
Posts: 61

Re: Proc SQL multiple scenario joins

Thanks PG. Its been running for over almost two hours but I am hoping it will be successful.
Contributor
Posts: 61

Re: Proc SQL multiple scenario joins

PG- I get an ERROR: Sort execution failure. Would you know how I can fix this?
Respected Advisor
Posts: 4,935

Re: Proc SQL multiple scenario joins

Posted in reply to Suzanne_Ed

This is really out of my league. I would guess that there wasn't enough disk space. I suggest you post a new topic with the full query and the error message.

PG
Ask a Question
Discussion stats
  • 15 replies
  • 439 views
  • 0 likes
  • 4 in conversation