Hi,
I have a data set of transactions that I would like to look up against a Reference table.
Transaction data;
ID Trans_Date Prod_Group Qty
1 08Jan2014 1 5
2 08Feb2014 2 9
3 09Mar2014 3 2
...
Reference Table
Prod_Group Price Effective_Date Change_Date
1 1 $25 01Jan2014 05Feb2014
1 $22 05Feb2014 .
2 $30 01Jan2014 10Jan2014
2 $25 10Jan2014 .
3 $25 01Jan2014 .
The reference table generates a new record once a price change occurs. The Change_Date is recorded to the original price record and the Change_Date becomes the Effective_Date of the new record and the Change_Date of the new record is Null.
I'd like to pull the Price of the transaction as of the Transaction Date.
Final Table:
ID Trans_Date Prod_Group Qty Price
1 08Jan2014 1 5 $25
2 08Jan2014 2 9 $30
3 09Mar2014 3 2 $25
Thank you in advance for the help!
PROD_GROUP can be character ... no problem.
HLO="H" indicates that the END value should be ignored and should be treated as if the format defined the range:
"some_value" - high = "some_price"
That would be appropriate for your data, since missing end date means that the price is in effect no matter how high the date gets.
The range overlap likely stems from needing to make a decision about the overlapping dates. For example, for PROD_GROUP 2, is 10Jan2014 exactly included in the $25 or the $30 category? There is a single variable in CNTLIN= data sets that would need to be changed to handle this, although I don't recall the variable name off the top of my head. But there is one variable that indicates whether the START point exactly should be included or included in the range definition, and another variable that indicates whether the END point should be included or excluded. It's probably better to set both of those rather than rely on default values.
If that doesn't solve the overlapping range problems, then you have worse problems with your data ... conflicting date range definitions that have to be solved before starting to program (or even extra PROD_GROUP entries). But the first step is to decide on whether the START and END dates are included or excluded in the range definitions.
Second thoughts ... the overlapping ranges need more work. There can't be multiple ranges that use HLO='H'. The format would have to be adjusted to eliminate the use of HLO and replace the logic with a range that defines up until the current date. I can't see the variable names while I'm posting, but the idea would be along these lines:
if change_date = . then end = catx(' ', prod_group, "&sysdate9"d);
Hello,
You can do the the group by on prod_group and with condition where change_date is missing, you will have all the recent records and then do the join.
Thanks,
Mike
Assuming that your dates are actual SAS dates, I would create a format based on the second table:
data fmt;
set reference_table;
fmtname='$pr_date';
start = catx(' ', prod_group, effective_date);
if change_date > . then end = catx(' ', prod_group, change_date);
else hlo='H';
label=price;
run;
proc format cntlin=fmt;
run;
It translates the combination of PROD_GROUP plus a range of dates into the price. Then utilize the format on your transactions:
data want;
set transaction_data;
price = put(catx(' ', prod_group, trans_date), $pr_date.);
run;
You might want to tweak the results depending on whether PRICE should be character or numeric.
Good luck.
Thank you for the reply.
I've tried the code, but am getting a 'Range overlap' error. I should probably have mentioned the Prod_Group is not a numeric if that matters.
What does the hlo='H'; do?
PROD_GROUP can be character ... no problem.
HLO="H" indicates that the END value should be ignored and should be treated as if the format defined the range:
"some_value" - high = "some_price"
That would be appropriate for your data, since missing end date means that the price is in effect no matter how high the date gets.
The range overlap likely stems from needing to make a decision about the overlapping dates. For example, for PROD_GROUP 2, is 10Jan2014 exactly included in the $25 or the $30 category? There is a single variable in CNTLIN= data sets that would need to be changed to handle this, although I don't recall the variable name off the top of my head. But there is one variable that indicates whether the START point exactly should be included or included in the range definition, and another variable that indicates whether the END point should be included or excluded. It's probably better to set both of those rather than rely on default values.
If that doesn't solve the overlapping range problems, then you have worse problems with your data ... conflicting date range definitions that have to be solved before starting to program (or even extra PROD_GROUP entries). But the first step is to decide on whether the START and END dates are included or excluded in the range definitions.
Second thoughts ... the overlapping ranges need more work. There can't be multiple ranges that use HLO='H'. The format would have to be adjusted to eliminate the use of HLO and replace the logic with a range that defines up until the current date. I can't see the variable names while I'm posting, but the idea would be along these lines:
if change_date = . then end = catx(' ', prod_group, "&sysdate9"d);
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.