BookmarkSubscribeRSS Feed
SannaSanna
Quartz | Level 8

 

Hello-

 

I would really appreciate some tips and assistance.

I have been working on this for almost a week and cant seem to be able to write SQL code that covers all of the scenarios below.  (trying to merge between start and end dates)   I'm clueless on how to write the statement for Scenario 3 and Scenario 4.   Can anyone provide me with some pointers.   Thank you.

 

 

 

PROC SQL;

                CREATE TABLE NEW AS

                SELECT *

FROM ENR AS E LEFT JOIN PRO AS P

                ON E.IDKEY=P.IDKEY

AND (P.PGM_END BETWEEN E.ENR_START AND E.ENR_END)

OR

(P.PGM_START BETWEEN E.ENR_START AND E.ENR_END)

OR

 

 

Scenario 1:         

       |____enrollment_____|     

|___Program______|

 

 Scenario 2:

             

|____enrollment_____|     

       |___Program______|

 

 Scenario 3:

             

       |__enrollment__|    

|__________Program_________|

 

  Scenario 4:

             

|__________Enrollment_________|

   

       |__Program__|

 

 

15 REPLIES 15
PGStats
Opal | Level 21

The expression to cover all scenarios is:

 

PROC SQL;
CREATE TABLE NEW AS
SELECT *
FROM ENR AS E LEFT JOIN PRO AS P
ON E.IDKEY=P.IDKEY and
p.pgm_start <=e.enr_end and p.pgm_end >= e.enr_start;
quit;

 

PG
SannaSanna
Quartz | Level 8
Hi PG- Thanks so much for looking at my data problem. This semi-worked. The issue is that there could be multiple rows of PGM records and the ENR record is being multiplied by the number of PGM records that qualify. In these instances, is it possible to set up a 'tie breaker'? Meaning have the ENR record only join to the most recent (descending) date of the PGM record? Thank you so much!
PGStats
Opal | Level 21

The left join makes this a bit tricky. Try :

 

PROC SQL;
CREATE TABLE NEW AS
SELECT *
FROM ENR AS E LEFT JOIN PRO AS P
ON E.IDKEY=P.IDKEY and
p.pgm_start <=e.enr_end and p.pgm_end >= e.ent_start
group by e.enr_start, e.enr_end
having p.pgm_start=max(p.pgm_start) or p.pgm_start is missing;
quit;
PG
SannaSanna
Quartz | Level 8

Hi PG- 
Unfortunately it didn't work.  I was expecting one match for this particular IDKEY and the updated code resulted in zero matches whereas, your original code resulted in three matches. 

 

Here is an example with data.  All three of the records in the PGM dataset would qualify to join to the one ENR data record.  But the ENR data record I am trying to match to would be the last recrod in the PGM dataset. (the one with CERTKEY=1020)  Would you be able to incorporate the max(certkey) into the syntax to help select the most recent/decending?? (In my PGM table, the certkey is unique and the number increases as new records are added) See below:   Thank you so much!

 

ENR DATASET      
IDKEY ENR_START ENR_END DATA
50 AUG 11 2014 JUNE 4 2015 BSIELD
       
PGM DATASET      
IDKEY PGM_START PGM_END CERTKEY
50 AUG 11/2014 999999 836
50 AUG 11/2014 999999 820
50 MAY 29 2015 999999 1020
Patrick
Opal | Level 21

Please provide some representative sample data which covers all the cases and then tell us exactly how the result needs to look like. 

This will allow us to actually test proposed code.

PGStats
Opal | Level 21

My proposed code didn't work because a typo generated an ERROR. Now it works:

 


data enr;
input idkey (enr_start enr_end) (:date9.);
format enr_start enr_end yymmdd10.;
datalines;
50 11aug2014 04jun2015
;

data pro;
input idkey (pgm_start pgm_end) (:date9.);
format pgm_start pgm_end yymmdd10.;
datalines;
50 11aug2014 04jun2099
50 11aug2014 04jun2099
50 29may2015 04jun2099
;

PROC SQL;
CREATE TABLE NEW AS
SELECT e.*, p.pgm_start, pgm_end
FROM ENR AS E LEFT JOIN PRO AS P
ON E.IDKEY=P.IDKEY and
p.pgm_start <=e.enr_end and p.pgm_end >= e.enr_start
group by e.enr_start, e.enr_end
having p.pgm_start=max(p.pgm_start) or p.pgm_start is missing;
select * from new;
quit;
PG
SannaSanna
Quartz | Level 8
Hi PG- Can this code work if I used SAS dates instead of pgm_start pgm_end enr_start enr_end? I tried but I couldn't get it to work- but it did work when working with actual dates mmddyy10. Also, do you know if the SQL code can produce an output dataset of non-matches? Or how would I go about searching for those ENR records that had zero matches in the PGM dataset?
PGStats
Opal | Level 21

The code is meant to work with SAS dates, whatever their format.  ENR records without matches will show up in the output dataset with missing pgm dates. That's the effect of your left join combined with the condition or p.pgm_start is missing in the HAVING clause.

PG
SannaSanna
Quartz | Level 8

HI PG-
I used your syntax and I am not getting the results I am expecting. A no match resulted when a match should have been produced. Here is the code:
PROC SQL;
CREATE TABLE NEW AS
SELECT E.*, P.IDKEY, P.CERTKEY, P.BIRDATE, P.GNDRCODEKEY,
P.SAS_START_DATE, P.SAS_END_DATE,
FROM ENR AS E LEFT JOIN PGM AS P
ON E.IDKEY=P.IDKEY and
p.SAS_START_DATE LE e.SAS_WITHDRLDATE_DATE and p.SAS_END_DATE GE e.SAS_ENROLLDATE_DATE
group by e.SAS_ENROLLDATE_DATE, e.SAS_WITHDRLDATE_DATE
having p.SAS_START_DATE=max(p.SAS_START_DATE) or p.SAS_START_DATE is missing;
select * from new;
quit;

However, when I manually entered the records from the two table(expecting a match) into the code below- it resulted in a match.
data enr;
input idkey enr_start enr_end;
datalines;
50 19946 20243
; RUN;

data pro;
input idkey pgm_start pgm_end ;
datalines;
50 19946 9999999
50 19946 9999999
50 20237 9999999
;

PROC SQL;
CREATE TABLE NEW AS
SELECT e.*, p.pgm_start, pgm_end
FROM ENR AS E LEFT JOIN PRO AS P
ON E.IDKEY=P.IDKEY and
p.pgm_start <=e.enr_end and p.pgm_end >= e.enr_start
group by e.enr_start, e.enr_end
having p.pgm_start=max(p.pgm_start) or p.pgm_start is missing;
select * from new;
quit;

Do you know what could be wrong?

 

Also- the join didnt return all records from my original table.  Dataset 1 contains 114,000 records (which is the SOURCE) Dataset 2 contains 354,000 and join only netted 33,000 records? 

PGStats
Opal | Level 21

 

The join condition should be OR, not AND

 

ON E.IDKEY=P.IDKEY and
(p.SAS_START_DATE LE e.SAS_WITHDRLDATE_DATE or p.SAS_END_DATE GE e.SAS_ENROLLDATE_DATE)

PG
PGStats
Opal | Level 21

Just to make sure, you should add:

 

group by E.IDKEY, e.SAS_ENROLLDATE_DATE, e.SAS_WITHDRLDATE_DATE

PG
SannaSanna
Quartz | Level 8
Thanks PG. Its been running for over almost two hours but I am hoping it will be successful.
SannaSanna
Quartz | Level 8
PG- I get an ERROR: Sort execution failure. Would you know how I can fix this?
PGStats
Opal | Level 21

This is really out of my league. I would guess that there wasn't enough disk space. I suggest you post a new topic with the full query and the error message.

PG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 3367 views
  • 0 likes
  • 4 in conversation