Hi,
I am having a dataset which has a column 'RATE' with some missing values and my objective is to populate that column.
To fill the missing values, I need to extrapolate some values and interpolate some.
1. For a particular 'CURRENCY' the first missing values and last missing values will be exactly the same as the nearest starting or ending value for the same CURRENCY.
2. For middle missing values, I need to calculate 2 different weights.
Weight 1 = (Nearest higher PERIOD_YRS - PERIOD_YRS) / ((Nearest higher PERIOD - (Nearest lower PERIOD_YRS).
Weight 2 = 1 - Weight1
Expected value = (Weight 1 * NEAREST LOWER PERIOD_YRS) + (Weight 2 * NEAREST HIGHER PERIOD_YRS)
Here is the sample data
CURRENCY | SEQ | PERIOD | COUNTER_IN_GROUP | PERIOD_YRS | RATE | EXP RESULT | Weight1 | Weight2 |
AUD | 1 | 3_MONTH | 1 | 0.25 | 3.689629437 | |||
AUD | 1 | 2_YEAR | 2 | 2 | 3.689629437 | |||
AUD | 1 | 20_YEAR | 3 | 20 | 3.689629437 | 3.689629437 | ||
CAD | 2 | 6_MONTH | 1 | 0.5 | 0.65523605 | |||
CAD | 2 | 1_YEAR | 2 | 1 | 0.65523605 | 0.65523605 | ||
CAD | 2 | 3_YEAR | 3 | 3 | 1.028227829 | 0.333333333 | 0.666666667 | |
CAD | 2 | 4_YEAR | 4 | 4 | 1.214723719 | 1.214723719 | ||
CAD | 2 | 5_YEAR | 5 | 5 | 1.389134405 | 0.666666667 | 0.333333333 | |
CAD | 2 | 7_YEAR | 6 | 7 | 1.737955777 | 1.737955777 | ||
CAD | 2 | 10_YEAR | 7 | 10 | 1.737955777 |
Weight 1 = (4YR - 3YR)/(4YR - 1YR)
Weight 2 = (1 - Weight 1)
My SAS Code
proc transpose data= WORK.Sample out=trnspose prefix=_yc;
by SEQ;
id COUNTER_IN_GROUP;
var RATE WEIGHT;
RUN;
DATA WORK.CONSOL_3 (drop=_:);
set WORK.CONSOL_2 ;
if COUNTER_IN_GROUP=1 then do; /* Read and correct the transposed set of rate values */
set trnspose (keep=_yc:);
array _yc {*} _yc: ;
/* fill in leading missings */
_x=coalesce(of _yc{*}); /* Get first non-missing */
do _s=1 by 1 while (_yc{_s}=.);
_yc{_s}=_x;
end;
do _s=dim(_yc) by -1 while (_yc{_s}=.); /* Get index of last non-missing*/
end;
if _s<dim(_yc) then do _s=_s+1 to dim(_yc); /* Fill in trailing missings */
_yc{_s}=_yc{_s-1};
end;
/* Fill in interior holes - straight line interpolation */
do while (nmiss(of _yc{*})>0);
do _before_s=1 by 1 until(_yc{_before_s+1}=.); /* find SEQ before the hole*/
end;
do _after_s=_before_s+1 by 1 while (_yc{_after_s}=.); /* find SEQ after the hole */
end;
_slope= (_yc{_after_s}-_yc{_before_s}) / (_after_s-_before_s) ;
do _s=_before_s+1 to _after_s-1; /* Traverse the hole */
_yc{_s} = _yc{_before_s} + _slope * (_s - _before_s);
end;
end;
end;
if RATE=. then RATE =_yc{seq};
run;
You wrote
> Expected value = (Weight 1 * NEAREST LOWER PERIOD_YRS) + (Weight 2 * NEAREST HIGHER PERIOD_YRS)
The correct formula is
Expected value = (Weight 1 * NEAREST LOWER RATE) + (Weight 2 * NEAREST HIGHER RATE)
> the question was valid.
I told you how to ask questions using the appropriate formatting. Feel free to ignore. And to be ignored. You seem to know better.
> Try being decent with responses.
Don't worry, I'll make sure not to try helping you.
This seems to work. Please alter as needed.
proc sql;
create table WANT as
select base.*
, before.COUNTER_IN_GROUP as CB, before.PERIOD_YRS as PB, before.RATE as RB, MIN
, after .COUNTER_IN_GROUP as CA, after .PERIOD_YRS as PA, after .RATE as RA, MAX
, (PA - base.PERIOD_YRS) / (PA-PB) as W1
, 1- calculated W1 as W2
, case when base.RATE then base.RATE
when base.COUNTER_IN_GROUP<MIN then RB
when base.COUNTER_IN_GROUP>MAX then RA
else calculated W1 * RB + calculated W2 * RA end as EXP
from HAVE base
left join
(select *, min(COUNTER_IN_GROUP) as MIN from HAVE where RATE group by CURRENCY) before
on base.CURRENCY=before.CURRENCY
and ^base.RATE
and ( (before.COUNTER_IN_GROUP=base.COUNTER_IN_GROUP-1
| (before.COUNTER_IN_GROUP=MIN & base.COUNTER_IN_GROUP<MIN )) )
left join
(select *, max(COUNTER_IN_GROUP) as MAX from HAVE where RATE group by CURRENCY) after
on base.CURRENCY=after.CURRENCY
and ^base.RATE
and ( (after.COUNTER_IN_GROUP=base.COUNTER_IN_GROUP+1
| (after.COUNTER_IN_GROUP=MAX & base.COUNTER_IN_GROUP>MAX )) )
order by base.CURRENCY,base.COUNTER_IN_GROUP;
select CURRENCY, SEQ, PERIOD, COUNTER_IN_GROUP, PERIOD_YRS, RATE, W1, W2, EXP from WANT;
CURRENCY | SEQ | PERIOD | COUNTER_IN_GROUP | PERIOD_YRS | RATE | W1 | W2 | EXP |
---|---|---|---|---|---|---|---|---|
AUD | 1 | 3_MONTH | 1 | 0.25 | . | . | . | 3.689629 |
AUD | 1 | 2_YEAR | 2 | 2 | . | . | . | 3.689629 |
AUD | 1 | 20_YEAR | 3 | 20 | 3.689629 | . | . | 3.689629 |
CAD | 2 | 6_MONTH | 1 | 0.5 | . | . | . | 0.655236 |
CAD | 2 | 1_YEAR | 2 | 1 | 0.655236 | . | . | 0.655236 |
CAD | 2 | 3_YEAR | 3 | 3 | . | 0.333333 | 0.666667 | 1.028228 |
CAD | 2 | 4_YEAR | 4 | 4 | 1.214724 | . | . | 1.214724 |
CAD | 2 | 5_YEAR | 5 | 5 | . | 0.666667 | 0.333333 | 1.389134 |
CAD | 2 | 7_YEAR | 6 | 7 | 1.737956 | . | . | 1.737956 |
CAD | 2 | 10_YEAR | 7 | 10 | . | . | . | 1.737956 |
Your formula is wrong. 1) The least you can do is verify that you ask a valid question, rather than waste our time.
Also 2) please post your data in usable format (typically a data step) and 3) post your code in a legible format using the appropriate icon.
Your chances of being helped are otherwise much slimmer. 🙂
Thanks. Try being decent with responses. The formula was correct and the question was valid.
You wrote
> Expected value = (Weight 1 * NEAREST LOWER PERIOD_YRS) + (Weight 2 * NEAREST HIGHER PERIOD_YRS)
The correct formula is
Expected value = (Weight 1 * NEAREST LOWER RATE) + (Weight 2 * NEAREST HIGHER RATE)
> the question was valid.
I told you how to ask questions using the appropriate formatting. Feel free to ignore. And to be ignored. You seem to know better.
> Try being decent with responses.
Don't worry, I'll make sure not to try helping you.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.