Hello,
I have a question on how to join two tables but with conditions. I am using SAS 9.4. I have one table that has about 4400 records of clients these are duplicates Client ID with Start dates (see table 1 below). I have another table, about 233 rows, that has client_IDs begin program dates, end program dates and program names (see table 2), the client ID in table 2 are the same as in table 1 but they tell me which prgram a cient is enrolled in a certain time period. What I would like to do is to populate Table 1 with the program names where the start_date in table 1 is between or equal to the being and end program dates in table 2. So the output looks like table 3 below.
I used the following left join SQL, but I ended up with about 500 extra rows. Why was that? There are cases where items in table 2 dont match table 1, and if that is the case then I do not want that in table 3. Is that where the extra 500 cases come from?
PROC SQL;
CREATE TABLE want_table3 AS
SELECT D.*, B.Program_name, B.Begin_date, B.end_date
FROM table1 D left join table2 B
on D.client_ID = B.client_ID
and b.Begin_date <=d.start_date<= b.end_date;
RUN;
QUIT;
Table 1: Original Table | |
Client_ID | Start_Date |
1 | 1/1/2015 |
1 | 1/2/2015 |
1 | 1/3/2015 |
1 | 2/3/2015 |
1 | 2/4/2015 |
1 | 2/5/2015 |
2 | 1/2/2015 |
2 | 1/3/2015 |
2 | 1/4/2015 |
2 | 2/5/2015 |
2 | 2/6/2015 |
2 | 2/7/2015 |
3 | 3/1/2015 |
3 | 3/2/2015 |
3 | 4/5/2015 |
3 | 4/6/2015 |
Table 2: Referance Table | |||
Client | Begin_date | End_date | Program_name |
1 | 4/1/2013 | 1/31/2015 | Inpatient |
1 | 2/1/2015 | 12/31/9999 | Oupatient |
2 | 1/1/2015 | 2/28/2015 | Inpatient |
2 | 3/1/2015 | 4/1/2015 | Oupatient |
2 | 4/2/2015 | 12/31/9999 | Hospital |
3 | 5/5/2013 | 2/28/2015 | Oupatient |
Table 3: Final Table | ||
Client_ID | Start_Date | Program_name |
1 | 1/1/2015 | Inpatient |
1 | 1/2/2015 | Inpatient |
1 | 1/3/2015 | Inpatient |
1 | 2/3/2015 | Outpatient |
1 | 2/4/2015 | Outpatient |
1 | 2/5/2015 | Outpatient |
2 | 1/2/2015 | Inpatient |
2 | 1/3/2015 | Inpatient |
2 | 1/4/2015 | Inpatient |
2 | 2/5/2015 | Inpatient |
2 | 2/6/2015 | Inpatient |
2 | 2/7/2015 | Inpatient |
3 | 3/1/2015 | Outpatient |
3 | 3/2/2015 | Outpatient |
3 | 4/5/2015 | Hospital |
3 | 4/6/2015 | Hospital |
Than you for your help in advance
Thank you for your reply.
But I think I figured out what the problem was.. I did not have a "distinct" after my select statement so...
PROC SQL;
CREATE TABLE want_table3 AS
SELECT DISTINCT D.*, B.Program_name, B.Begin_date, B.end_date
FROM table1 D left join table2 B
on D.client_ID = B.client_ID
and b.Begin_date <=d.start_date<= b.end_date;
RUN;
QUIT;
by adding the distinct I got the same number of rows as I was expecting.
Thank you again
Please expand your sample data to include some of the issues mentioned. It looks like a clean sample right now.
Thank you for your reply. my updated table samples are below.. Notice Client 4 is not in table 2 and that is okay.. I would still like to bring client 4 to my final table.
Also notice I added client 5 to table 2 (and one of the cells is purposely left blank), since client 5 is not in table 1 I do not want that client info in my final table.
Table 1: Original Table | |
Client_ID | Start_Date |
1 | 1/1/2015 |
1 | 1/2/2015 |
1 | 1/3/2015 |
1 | 2/3/2015 |
1 | 2/4/2015 |
1 | 2/5/2015 |
2 | 1/2/2015 |
2 | 1/3/2015 |
2 | 1/4/2015 |
2 | 2/5/2015 |
2 | 2/6/2015 |
2 | 2/7/2015 |
3 | 3/1/2015 |
3 | 3/2/2015 |
3 | 4/5/2015 |
3 | 4/6/2015 |
4 | 5/1/2015 |
4 | 5/4/2016 |
4 | 5/6/2015 |
Client | Begin_date | End_date | Program_name |
1 | 4/1/2013 | 1/31/2015 | Inpatient |
1 | 2/1/2015 | 12/31/9999 | Oupatient |
2 | 1/1/2015 | 2/28/2015 | Inpatient |
2 | 3/1/2015 | 4/1/2015 | Oupatient |
2 | 4/2/2015 | 12/31/9999 | Hospital |
3 | 5/5/2013 | 2/28/2015 | Oupatient |
5 | 6/7/2013 | 5/5/2015 | |
5 | 5/6/2015 | 12/31/9999 | Hospital |
Client_ID | Start_Date | Program_name |
1 | 1/1/2015 | Inpatient |
1 | 1/2/2015 | Inpatient |
1 | 1/3/2015 | Inpatient |
1 | 2/3/2015 | Outpatient |
1 | 2/4/2015 | Outpatient |
1 | 2/5/2015 | Outpatient |
2 | 1/2/2015 | Inpatient |
2 | 1/3/2015 | Inpatient |
2 | 1/4/2015 | Inpatient |
2 | 2/5/2015 | Inpatient |
2 | 2/6/2015 | Inpatient |
2 | 2/7/2015 | Inpatient |
3 | 3/1/2015 | Outpatient |
3 | 3/2/2015 | Outpatient |
3 | 4/5/2015 | Hospital |
3 | 4/6/2015 | Hospital |
4 | 5/1/2015 | NoREF |
4 | 5/4/2016 | NoREF |
4 | 5/6/2015 | NoREF |
Hope this clarifies. Thank you again for your help in advance.
Appreciate it
There's nothing wrong with the code you've shown. You're getting 'extra' records either because your condition is not unique, ie the between matches multiple ranges in table 2?
You should pull up a few rows and see what they look like. Post some samples if you can.
Thank you for your reply.
But I think I figured out what the problem was.. I did not have a "distinct" after my select statement so...
PROC SQL;
CREATE TABLE want_table3 AS
SELECT DISTINCT D.*, B.Program_name, B.Begin_date, B.end_date
FROM table1 D left join table2 B
on D.client_ID = B.client_ID
and b.Begin_date <=d.start_date<= b.end_date;
RUN;
QUIT;
by adding the distinct I got the same number of rows as I was expecting.
Thank you again
You must have duplicates in one of your source tables. Distinct shouldn't be required.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.