## 1:Many Data merge logic based on Date

Solved
Frequent Contributor
Posts: 81

# 1:Many Data merge logic based on Date

Hi,

I have a data set of transactions that I would like to look up against a Reference table.

Transaction data;

ID Trans_Date Prod_Group Qty

1 08Jan2014 1 5

2 08Feb2014 2 9

3 09Mar2014 3 2

...

Reference Table

Prod_Group Price Effective_Date Change_Date

1 1 \$25 01Jan2014 05Feb2014

1 \$22 05Feb2014 .

2 \$30 01Jan2014 10Jan2014

2 \$25 10Jan2014 .

3 \$25 01Jan2014 .

The reference table generates a new record once a price change occurs. The Change_Date is recorded to the original price record and the Change_Date becomes the Effective_Date of the new record and the Change_Date of the new record is Null.

I'd like to pull the Price of the transaction as of the Transaction Date.

Final Table:

ID Trans_Date Prod_Group Qty Price

1 08Jan2014 1 5 \$25

2 08Jan2014 2 9 \$30

3 09Mar2014 3 2 \$25

Thank you in advance for the help!

Accepted Solutions
Solution
‎04-14-2015 02:06 PM
Super User
Posts: 6,753

## Re: 1:Many Data merge logic based on Date

PROD_GROUP can be character ... no problem.

HLO="H" indicates that the END value should be ignored and should be treated as if the format defined the range:

"some_value" - high = "some_price"

That would be appropriate for your data, since missing end date means that the price is in effect no matter how high the date gets.

The range overlap likely stems from needing to make a decision about the overlapping dates.  For example, for PROD_GROUP 2, is 10Jan2014 exactly included in the \$25 or the \$30 category?  There is a single variable in CNTLIN= data sets that would need to be changed to handle this, although I don't recall the variable name off the top of my head.  But there is one variable that indicates whether the START point exactly should be included or included in the range definition, and another variable that indicates whether the END point should be included or excluded.  It's probably better to set both of those rather than rely on default values.

If that doesn't solve the overlapping range problems, then you have worse problems with your data ... conflicting date range definitions that have to be solved before starting to program (or even extra PROD_GROUP entries).  But the first step is to decide on whether the START and END dates are included or excluded in the range definitions.

Second thoughts ... the overlapping ranges need more work.  There can't be multiple ranges that use HLO='H'.  The format would have to be adjusted to eliminate the use of HLO and replace the logic with a range that defines up until the current date.  I can't see the variable names while I'm posting, but the idea would be along these lines:

if change_date = . then end = catx(' ', prod_group, "&sysdate9"d);

All Replies
Contributor
Posts: 39

## Re: 1:Many Data merge logic based on Date

Hello,

You can do the the group by on prod_group and with condition where change_date is missing, you will have all the recent records and then do the join.

Thanks,

Mike

Super User
Posts: 6,753

## Re: 1:Many Data merge logic based on Date

Assuming that your dates are actual SAS dates, I would create a format based on the second table:

data fmt;

set reference_table;

fmtname='\$pr_date';

start = catx(' ', prod_group, effective_date);

if change_date > . then end = catx(' ', prod_group, change_date);

else hlo='H';

label=price;

run;

proc format cntlin=fmt;

run;

It translates the combination of PROD_GROUP plus a range of dates into the price.  Then utilize the format on  your transactions:

data want;

set transaction_data;

price = put(catx(' ', prod_group, trans_date), \$pr_date.);

run;

You might want to tweak the results depending on whether PRICE should be character or numeric.

Good luck.

Frequent Contributor
Posts: 81

## Re: 1:Many Data merge logic based on Date

I've tried the code, but am getting a 'Range overlap' error. I should probably have mentioned the Prod_Group is not a numeric if that matters.

What does the hlo='H'; do?

Solution
‎04-14-2015 02:06 PM
Super User
Posts: 6,753

## Re: 1:Many Data merge logic based on Date

PROD_GROUP can be character ... no problem.

HLO="H" indicates that the END value should be ignored and should be treated as if the format defined the range:

"some_value" - high = "some_price"

That would be appropriate for your data, since missing end date means that the price is in effect no matter how high the date gets.

The range overlap likely stems from needing to make a decision about the overlapping dates.  For example, for PROD_GROUP 2, is 10Jan2014 exactly included in the \$25 or the \$30 category?  There is a single variable in CNTLIN= data sets that would need to be changed to handle this, although I don't recall the variable name off the top of my head.  But there is one variable that indicates whether the START point exactly should be included or included in the range definition, and another variable that indicates whether the END point should be included or excluded.  It's probably better to set both of those rather than rely on default values.

If that doesn't solve the overlapping range problems, then you have worse problems with your data ... conflicting date range definitions that have to be solved before starting to program (or even extra PROD_GROUP entries).  But the first step is to decide on whether the START and END dates are included or excluded in the range definitions.

Second thoughts ... the overlapping ranges need more work.  There can't be multiple ranges that use HLO='H'.  The format would have to be adjusted to eliminate the use of HLO and replace the logic with a range that defines up until the current date.  I can't see the variable names while I'm posting, but the idea would be along these lines:

if change_date = . then end = catx(' ', prod_group, "&sysdate9"d);

🔒 This topic is solved and locked.