I have the two existing datasets (Original dataset, StepRates dataset) below and I am looking to arrive at the Desired dataset. I have kept the datasets very small for illustrative purposes; in reality each dataset has many more Cusips and each Cusip contains many more records. Also, please note that I have also kept the variable names the same among all datasets, but I am in fact, dealing with three different datasets.
For each Cusip, the StepRates dataset contains the dates which represent that point in time upto (and including the date) which the corresponding AnnualCouponRates variable value is valid. Essentially, for each Cusip, the StepRates dataset (via the its CouponDates variable) dictates the AnnualCouponRates that the Original dataset ought to have. The final dataset should look like the Desired dataset.
Could anybode please help me achieve my objective?
Original Dataset StepRates Dataset Desired Dataset
Cusip CouponDates AnnualCouponRates Cusip CouponDates AnnualCouponRates Cusip CouponDates AnnualCouponRates
3136G0K91 20131011 0.5 3136G0K91 20131011 0.5 3136G0K91 20131011 0.5
3136G0K91 20140411 0.5 3136G0K91 20140411 0.625 3136G0K91 20140411 0.625
3136G0K91 20141011 0.5 3136G0K91 20141011 0.75 3136G0K91 20141011 0.75
3136G0K91 20150411 0.5 3136G0K91 20150411 0.875 3136G0K91 20150411 0.875
3136G0R52 20131024 0.5 3136G0R52 20131024 0.5 3136G0R52 20131024 0.5
3136G0R52 20140424 0.5 3136G0R52 20140424 0.625 3136G0R52 20140424 0.625
3136G0R52 20141024 0.5 3136G0R52 20141024 0.75 3136G0R52 20141024 0.75
3136G0XX4 20130828 1.0 3136G0XX4 20140828 1.0 3136G0XX4 20130828 1.0
3136G0XX4 20140228 1.0 3136G0XX4 20150828 1.25 3136G0XX4 20140228 1.0
3136G0XX4 20140828 1.0 3136G0XX4 20160828 1.5 3136G0XX4 20140828 1.0
3136G0XX4 20150228 1.0 3136G0XX4 20170828 1.75 3136G0XX4 20150228 1.25
3136G0XX4 20150828 1.0 3136G0XX4 20180828 2.0 3136G0XX4 20150828 1.25
You could use SQL as per:
PROC SORT DATA = STEPRATES OUT = STEPRATES;
BY CUSIP COUPONDATES;
RUN;
DATA STEPRATES1;
FORMAT LOWDATE DATE9.;
SET STEPRATES;
BY CUSIP COUPONDATES;
LOWDATE = LAG(COUPONDATES)+1;
IF FIRST.CUSIP THEN LOWDATE = "01JAN1800"D;
RENAME COUPONDATES = HIGHDATE;
RUN;
PROC SQL;
CREATE TABLE DESIRED AS
SELECT A.*,
B.ANNUALCOUPONRATES AS STEPANNUALCOUPONRATES
FROM ORIGINAL AS A
LEFT JOIN STEPRATES1 AS B
ON A.CUSIP = B.CUSIP
AND COUPONDATES BETWEEN LOWDATE AND HIGHDATE
;
QUIT;
recently helped me with a similar albeit more complex scenario using hash objects which might also do what you are after. It is a little long code wise, but is quite effecient for my purposes. It should be kept in mind howwever that a hash solution loads the lookup table into memory, so you need to the size of your table into consideration.
PROC SORT DATA = ORIGINAL OUT = ORIGINAL1 (RENAME = (CUSIP = ORIG_CUSIP COUPONDATES = ORIG_COUPONDATES ANNUALCOUPONRATES = ORIG_ANNUALCOUPONRATES));
BY CUSIP COUPONDATES;
RUN;
DATA DESIRED1;
IF 0 THEN SET STEPRATES;
IF _N_ = 1 THEN DO;
DECLARE HASH H(DATASET:"STEPRATES",ORDERED:"A", HASHEXP:16);
DECLARE HITER HI("H");
H.DEFINEKEY("CUSIP","COUPONDATES");
H.DEFINEDATA("CUSIP","COUPONDATES","ANNUALCOUPONRATES");
H.DEFINEDONE();
CALL MISSING(CUSIP,COUPONDATES,ANNUALCOUPONRATES);
END;
SET ORIGINAL1;
DESIRED_ANNUALCOUPONRATES = .;
IF _N_= 1 THEN RC = HI.FIRST();
DO WHILE (DESIRED_ANNUALCOUPONRATES = .);
IF ORIG_CUSIP = CUSIP THEN DO;
DO WHILE (ORIG_COUPONDATES > COUPONDATES) ;
RC = HI.NEXT();
PUT "GO FORWARD DATE" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES= COUPONDATES= RC=;
END ;
DESIRED_ANNUALCOUPONRATES = ANNUALCOUPONRATES;
END;
ELSE DO;
RC = HI.NEXT();
PUT "GO FORWARD CUSIP" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES= COUPONDATES= RC=;
END;
END;
OUTPUT;
DO WHILE (ORIG_CUSIP <= CUSIP and RC = 0);
RC = HI.PREV();
PUT "GO BACK" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES= COUPONDATES= RC=;
END;
RUN;
I think you want to repost the data as the way the forum is displaying it isn't possible to tell which elements belong to which dataset.
And are those dates actual SAS date values displayed with a yymmdd format or something else?
Thanks for reaching out; below is a repost of the data. Does this suffice? Also, the format is yymmddn8. for all dates. Please let me know if there is anything else that you need from me.
Original Dataset
Cusip CouponDates AnnualCouponRates
3136G0K91 20131011 0.5
3136G0K91 20140411 0.5
3136G0K91 20141011 0.5
3136G0K91 20150411 0.5
3136G0R52 20131024 0.5
3136G0R52 20140424 0.5
3136G0R52 20141024 0.5
3136G0XX4 20130828 1.0
3136G0XX4 20140228 1.0
3136G0XX4 20140828 1.0
3136G0XX4 20150228 1.0
3136G0XX4 20150828 1.0
StepRates Dataset
Cusip CouponDates AnnualCouponRates
3136G0K91 20131011 0.5
3136G0K91 20140411 0.625
3136G0K91 20141011 0.75
3136G0K91 20150411 0.875
3136G0R52 20131024 0.5
3136G0R52 20140424 0.625
3136G0R52 20141024 0.75
3136G0XX4 20140828 1.0
3136G0XX4 20150828 1.25
3136G0XX4 20160828 1.5
3136G0XX4 20170828 1.75
3136G0XX4 20180828 2.0
Desired Dataset
Cusip CouponDates AnnualCouponRates
3136G0K91 20131011 0.5
3136G0K91 20140411 0.625
3136G0K91 20141011 0.75
3136G0K91 20150411 0.875
3136G0R52 20131024 0.5
3136G0R52 20140424 0.625
3136G0R52 20141024 0.75
3136G0XX4 20130828 1.0
3136G0XX4 20140228 1.0
3136G0XX4 20140828 1.0
3136G0XX4 20150228 1.25
3136G0XX4 20150828 1.25
I think that the following will do what you want:
data desired;
set original;
run;
data desired;
modify desired steprates;
by cusip coupondates;
if _iorc_=0 then replace;
run;
Arthur,
Thanks for reaching out. Did you mean the following? The Desired dataset doesn't exist; it's the one that I am supposed to generate. The Original and Steprates datasets, however, both already exist.
data desired;
modify original steprates;
by cusip coupondates;
if _iorc_=0 then replace;
run;
: No, I meant what I suggested, creating a copy of original (labeled desired) and using the modify statement using desired as the master.
I never like to use the actual master when using modify as there is a chance that the file can become corrupted if the process gets interrupted.
The result of my suggested code matches all of your desired results with the exception of Feb 28, 2015 for cusip 3136G0XX. I don't know if that was an error on your part or if there is something you want to do that I didn't see specified.
You could use SQL as per:
PROC SORT DATA = STEPRATES OUT = STEPRATES;
BY CUSIP COUPONDATES;
RUN;
DATA STEPRATES1;
FORMAT LOWDATE DATE9.;
SET STEPRATES;
BY CUSIP COUPONDATES;
LOWDATE = LAG(COUPONDATES)+1;
IF FIRST.CUSIP THEN LOWDATE = "01JAN1800"D;
RENAME COUPONDATES = HIGHDATE;
RUN;
PROC SQL;
CREATE TABLE DESIRED AS
SELECT A.*,
B.ANNUALCOUPONRATES AS STEPANNUALCOUPONRATES
FROM ORIGINAL AS A
LEFT JOIN STEPRATES1 AS B
ON A.CUSIP = B.CUSIP
AND COUPONDATES BETWEEN LOWDATE AND HIGHDATE
;
QUIT;
recently helped me with a similar albeit more complex scenario using hash objects which might also do what you are after. It is a little long code wise, but is quite effecient for my purposes. It should be kept in mind howwever that a hash solution loads the lookup table into memory, so you need to the size of your table into consideration.
PROC SORT DATA = ORIGINAL OUT = ORIGINAL1 (RENAME = (CUSIP = ORIG_CUSIP COUPONDATES = ORIG_COUPONDATES ANNUALCOUPONRATES = ORIG_ANNUALCOUPONRATES));
BY CUSIP COUPONDATES;
RUN;
DATA DESIRED1;
IF 0 THEN SET STEPRATES;
IF _N_ = 1 THEN DO;
DECLARE HASH H(DATASET:"STEPRATES",ORDERED:"A", HASHEXP:16);
DECLARE HITER HI("H");
H.DEFINEKEY("CUSIP","COUPONDATES");
H.DEFINEDATA("CUSIP","COUPONDATES","ANNUALCOUPONRATES");
H.DEFINEDONE();
CALL MISSING(CUSIP,COUPONDATES,ANNUALCOUPONRATES);
END;
SET ORIGINAL1;
DESIRED_ANNUALCOUPONRATES = .;
IF _N_= 1 THEN RC = HI.FIRST();
DO WHILE (DESIRED_ANNUALCOUPONRATES = .);
IF ORIG_CUSIP = CUSIP THEN DO;
DO WHILE (ORIG_COUPONDATES > COUPONDATES) ;
RC = HI.NEXT();
PUT "GO FORWARD DATE" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES= COUPONDATES= RC=;
END ;
DESIRED_ANNUALCOUPONRATES = ANNUALCOUPONRATES;
END;
ELSE DO;
RC = HI.NEXT();
PUT "GO FORWARD CUSIP" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES= COUPONDATES= RC=;
END;
END;
OUTPUT;
DO WHILE (ORIG_CUSIP <= CUSIP and RC = 0);
RC = HI.PREV();
PUT "GO BACK" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES= COUPONDATES= RC=;
END;
RUN;
Not sure. Can't is be done ?
data desired;
merge desOriginal(in=ina) StepRates ;
by cusip coupondates;
if ina ;
run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.