Hi All,
I've got a bit of a strange data situation that I'm hoping someone can help with....
Basically I have 2 datasets (A and B) with different unique identifiers that I am trying to join together via a 'lookup table' (C).
The problem is that this joining this A->B->C requires a many->one->many merge that I am struggling to find my way around.
Hopefully the example below will show my problem.....
Table A | ||||
obs_no | sale_dt | comp_dt | ||
Table B | ||||
obs_no | prop_no | |||
Table C | ||||
prop_no | move_dt | |||
Obs_no | prop_no | sale_dt | comp_dt | move_dt |
1 | 12 | 01/11/2017 | 01/12/2017 | 25/11/2017 |
1 | 12 | 01/03/2018 | 01/04/2018 | 14/03/2018 |
1 | 12 | 01/11/2017 | 01/12/2017 | 14/03/2018 |
1 | 12 | 01/03/2018 | 01/04/2018 | 25/11/2017 |
This is a very simplified version of what I am dealing with, but in essence I want the maximum move date for each observation/property number combination where the move_dt is before the comp_dt
Any advice will be much appreciated.
Thank you
You should provide examples of the input data sets and then the result of using those input sets. Without the actual start condition it is hard to interpret the words you describe and the color highlighting doesn't make any sense.
You don't need to provide a lot of records in the input sets but there should be enough to demonstrates what happens with any of the types of matches you need.
It looks more like you are trying to join tables A and C via table B which has fields in common with both tables (A and C). As @ballardw said, a little example would help clarify things for us.
Maybe some similar to this:
proc sql create table want as select a.obs_no,d.prop_no,a.sales_dt, a.comp_dt,d.move_dt from tablea as a left join (select tableb.obs_no, tablec.prop_no, tablec.move_dt from tableb left join tablec on tableb.prop_no=tablec.propno) as d on a.obs_no=d.obsno ; quit;
Apologies for not making that clearing.
Table A contains Sales data and Table C contains application data.
Sales are done at account level but can have multiple dates for the subaccounts within. The unique identifier in this table is the sales number.
Applications are done at property level and can also have multiple dates for the subaccounts. The unique identifier in this table is the property number.
Table B contains information at account level and has the account number and property number.
Trying to left/inner join the tables via sales number and property number if there are 2 observations yields 4 results as it generates all possible combinations of the observations. I can't show the actual data but a worked through example is below. For each sales date, I want the move date that is between the sales and completion dates.
Table A | ||||
Sales_no | sale_dt | comp_dt | ||
1 | 01/11/2017 | 01/12/2017 | ||
1 | 01/03/2018 | 01/04/2018 | ||
Table B | ||||
Prop_no | move_dt | |||
12 | 25/11/2017 | |||
12 | 14/03/2018 | |||
Result | ||||
Sales_no | Prop_no | sale_dt | comp_dt | move_dt |
1 | 12 | 01/11/2017 | 01/12/2017 | 25/11/2017 |
1 | 12 | 01/03/2018 | 01/04/2018 | 14/03/2018 |
1 | 12 | 01/11/2017 | 01/12/2017 | 14/03/2018 |
1 | 12 | 01/03/2018 | 01/04/2018 | 25/11/2017 |
Wanted | ||||
Sales_no | Prop_no | sale_dt | comp_dt | move_dt |
1 | 12 | 01/11/2017 | 01/12/2017 | 25/11/2017 |
1 | 12 | 01/03/2018 | 01/04/2018 | 14/03/2018 |
If both tables have the same number of observations, you can do:
data wanted;
merge table_a table_b;
/* No BY-statement!! */
run;
@bethsmith wrote:
Apologies for not making that clearing.
Table A contains Sales data and Table C contains application data.
Sales are done at account level but can have multiple dates for the subaccounts within. The unique identifier in this table is the sales number.
Applications are done at property level and can also have multiple dates for the subaccounts. The unique identifier in this table is the property number.
Table B contains information at account level and has the account number and property number.
Trying to left/inner join the tables via sales number and property number if there are 2 observations yields 4 results as it generates all possible combinations of the observations. I can't show the actual data but a worked through example is below. For each sales date, I want the move date that is between the sales and completion dates.
Table A Sales_no sale_dt comp_dt 1 01/11/2017 01/12/2017 1 01/03/2018 01/04/2018 Table B Prop_no move_dt 12 25/11/2017 12 14/03/2018 Result Sales_no Prop_no sale_dt comp_dt move_dt 1 12 01/11/2017 01/12/2017 25/11/2017 1 12 01/03/2018 01/04/2018 14/03/2018 1 12 01/11/2017 01/12/2017 14/03/2018 1 12 01/03/2018 01/04/2018 25/11/2017 Wanted Sales_no Prop_no sale_dt comp_dt move_dt 1 12 01/11/2017 01/12/2017 25/11/2017 1 12 01/03/2018 01/04/2018 14/03/2018
So what is the rule that says this one goes in the output:
1 | 12 | 01/11/2017 | 01/12/2017 | 25/11/2017 |
but not this one:
1 | 12 | 01/03/2018 | 01/04/2018 | 25/11/2017 |
You need to be able to state why certain combinations are wanted in the output and others excluded.
As the moving date cannot be before a sale has been made and it needs to be before the entire process is completed.
And picking out that specific combination is what I'm really struggling with.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.