BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
maroulator
Obsidian | Level 7

I have the two existing datasets (Original dataset, StepRates dataset) below and I am looking to arrive at the Desired dataset. I have kept the datasets very small for illustrative purposes; in reality each dataset has many more Cusips and each Cusip contains many more records. Also, please note that I have also kept the variable names the same among all datasets, but I am in fact, dealing with three different datasets.

For each Cusip, the StepRates dataset contains the dates which represent that point in time upto (and including the date) which the corresponding AnnualCouponRates variable value is valid. Essentially, for each Cusip, the StepRates dataset (via the its CouponDates variable) dictates the AnnualCouponRates that the Original dataset ought to have. The final dataset should look like the Desired dataset.

Could anybode please help me achieve my objective?

Original Dataset                                                          StepRates Dataset                                                       Desired Dataset

Cusip    CouponDates  AnnualCouponRates     Cusip        CouponDates  AnnualCouponRates              Cusip        CouponDates  AnnualCouponRates
3136G0K91  20131011        0.5                       3136G0K91   20131011            0.5                                3136G0K91   20131011                 0.5
3136G0K91  20140411        0.5                       3136G0K91   20140411            0.625                            3136G0K91   20140411                 0.625
3136G0K91  20141011        0.5                       3136G0K91   20141011            0.75                              3136G0K91   20141011                 0.75
3136G0K91  20150411        0.5                       3136G0K91   20150411            0.875                            3136G0K91   20150411                 0.875
3136G0R52  20131024        0.5                       3136G0R52   20131024            0.5                                3136G0R52   20131024                 0.5
3136G0R52  20140424        0.5                       3136G0R52   20140424            0.625                            3136G0R52   20140424                 0.625
3136G0R52  20141024        0.5                       3136G0R52   20141024            0.75                              3136G0R52   20141024                 0.75
3136G0XX4  20130828        1.0                       3136G0XX4   20140828            1.0                                3136G0XX4   20130828                1.0  

3136G0XX4  20140228        1.0                       3136G0XX4   20150828            1.25                              3136G0XX4   20140228                1.0        
3136G0XX4  20140828        1.0                       3136G0XX4   20160828            1.5                                3136G0XX4   20140828                1.0 

3136G0XX4  20150228        1.0                       3136G0XX4   20170828            1.75                              3136G0XX4   20150228                1.25
3136G0XX4  20150828        1.0                       3136G0XX4   20180828            2.0                                3136G0XX4   20150828                1.25

1 ACCEPTED SOLUTION

Accepted Solutions
Scott_Mitchell
Quartz | Level 8

You could use SQL as per:


PROC SORT DATA = STEPRATES OUT = STEPRATES;
   BY CUSIP COUPONDATES;
RUN;

DATA STEPRATES1;
FORMAT LOWDATE DATE9.;
SET STEPRATES;
BY CUSIP COUPONDATES;
LOWDATE = LAG(COUPONDATES)+1;
IF FIRST.CUSIP THEN LOWDATE = "01JAN1800"D;
RENAME  COUPONDATES = HIGHDATE;
RUN;

PROC SQL;
CREATE TABLE DESIRED AS
SELECT A.*,
    B.ANNUALCOUPONRATES AS STEPANNUALCOUPONRATES
FROM ORIGINAL AS A
LEFT JOIN STEPRATES1 AS B
ON A.CUSIP = B.CUSIP
AND COUPONDATES BETWEEN LOWDATE AND HIGHDATE
;
QUIT;

recently helped me with a similar albeit more complex scenario using hash objects which might also do what you are after.  It is a little long code wise, but is quite effecient for my purposes.  It should be kept in mind howwever that a hash solution loads the lookup table into memory, so you need to the size of your table into consideration.


PROC SORT DATA = ORIGINAL OUT = ORIGINAL1 (RENAME = (CUSIP = ORIG_CUSIP COUPONDATES = ORIG_COUPONDATES ANNUALCOUPONRATES = ORIG_ANNUALCOUPONRATES));
   BY CUSIP COUPONDATES;
RUN;

DATA DESIRED1;

IF 0 THEN SET STEPRATES;

IF _N_ = 1 THEN DO;
  DECLARE HASH H(DATASET:"STEPRATES",ORDERED:"A", HASHEXP:16);
  DECLARE HITER HI("H");
  H.DEFINEKEY("CUSIP","COUPONDATES");
  H.DEFINEDATA("CUSIP","COUPONDATES","ANNUALCOUPONRATES");
  H.DEFINEDONE();
  CALL MISSING(CUSIP,COUPONDATES,ANNUALCOUPONRATES);
END;

SET ORIGINAL1;

DESIRED_ANNUALCOUPONRATES = .;

IF _N_= 1 THEN RC = HI.FIRST();

DO WHILE (DESIRED_ANNUALCOUPONRATES = .);
  IF ORIG_CUSIP = CUSIP THEN DO;
   DO WHILE (ORIG_COUPONDATES > COUPONDATES) ;
    RC = HI.NEXT();
    PUT "GO FORWARD DATE" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES= COUPONDATES= RC=;
        END ;     
       DESIRED_ANNUALCOUPONRATES = ANNUALCOUPONRATES; 
  END;

  ELSE DO;
   RC = HI.NEXT();
   PUT "GO FORWARD CUSIP" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES= COUPONDATES= RC=;
  END;  
END;

  OUTPUT;
DO WHILE (ORIG_CUSIP <= CUSIP and RC = 0);
    RC = HI.PREV();
PUT "GO BACK" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES=  COUPONDATES= RC=;
END;
RUN;

View solution in original post

7 REPLIES 7
ballardw
Super User

I think you want to repost the data as the way the forum is displaying it isn't possible to tell which elements belong to which dataset.

And are those dates actual SAS date values displayed with a yymmdd format or something else?

maroulator
Obsidian | Level 7

Thanks for reaching out; below is a repost of the data. Does this suffice? Also, the format is yymmddn8. for all dates. Please let me know if there is anything else that you need from me.


Original Dataset  

                                   

Cusip              CouponDates  AnnualCouponRates 
3136G0K91          20131011        0.5         
3136G0K91          20140411        0.5         
3136G0K91          20141011        0.5        
3136G0K91          20150411        0.5         
3136G0R52          20131024        0.5         
3136G0R52          20140424        0.5         
3136G0R52          20141024        0.5         
3136G0XX4          20130828        1.0                        
3136G0XX4          20140228        1.0         
3136G0XX4          20140828        1.0    

3136G0XX4          20150228        1.0         

3136G0XX4          20150828        1.0

StepRates Dataset   

                                  

Cusip              CouponDates     AnnualCouponRates  
3136G0K91        20131011                0.5          
3136G0K91        20140411                0.625        
3136G0K91        20141011                0.75         
3136G0K91       20150411                 0.875        
3136G0R52       20131024                 0.5          
3136G0R52       20140424                 0.625        
3136G0R52       20141024                 0.75         
3136G0XX4       20140828                1.0                                                                                                                                                                     

3136G0XX4       20150828                1.25                                                                               

3136G0XX4       20160828                1.5   
3136G0XX4       20170828                1.75          
3136G0XX4       20180828                2.0

Desired Dataset   

                                  

Cusip              CouponDates     AnnualCouponRates  
3136G0K91        20131011                0.5          
3136G0K91        20140411                0.625        
3136G0K91        20141011                0.75         
3136G0K91       20150411                 0.875        
3136G0R52       20131024                 0.5          
3136G0R52       20140424                 0.625        
3136G0R52       20141024                 0.75         
3136G0XX4       20130828                1.0                                                                                                                                                                     

3136G0XX4       20140228                1.0                                                                               

3136G0XX4       20140828                1.0  
3136G0XX4       20150228                1.25          
3136G0XX4       20150828                1.25

art297
Opal | Level 21

I think that the following will do what you want:

data desired;

  set original;

run;

data desired;

   modify desired steprates;

   by cusip coupondates;

   if _iorc_=0 then replace;

run;

maroulator
Obsidian | Level 7

Arthur,

Thanks for reaching out. Did you mean the following? The Desired dataset doesn't exist; it's the one that I am supposed to generate. The Original and Steprates datasets, however, both already exist.

data desired;

   modify original steprates;

   by cusip coupondates;

   if _iorc_=0 then replace;

run;

art297
Opal | Level 21

: No, I meant what I suggested, creating a copy of original (labeled desired) and using the modify statement using desired as the master.

I never like to use the actual master when using modify as there is a chance that the file can become corrupted if the process gets interrupted.

The result of my suggested code matches all of your desired results with the exception of Feb 28, 2015 for cusip 3136G0XX. I don't know if that was an error on your part or if there is something you want to do that I didn't see specified.

Scott_Mitchell
Quartz | Level 8

You could use SQL as per:


PROC SORT DATA = STEPRATES OUT = STEPRATES;
   BY CUSIP COUPONDATES;
RUN;

DATA STEPRATES1;
FORMAT LOWDATE DATE9.;
SET STEPRATES;
BY CUSIP COUPONDATES;
LOWDATE = LAG(COUPONDATES)+1;
IF FIRST.CUSIP THEN LOWDATE = "01JAN1800"D;
RENAME  COUPONDATES = HIGHDATE;
RUN;

PROC SQL;
CREATE TABLE DESIRED AS
SELECT A.*,
    B.ANNUALCOUPONRATES AS STEPANNUALCOUPONRATES
FROM ORIGINAL AS A
LEFT JOIN STEPRATES1 AS B
ON A.CUSIP = B.CUSIP
AND COUPONDATES BETWEEN LOWDATE AND HIGHDATE
;
QUIT;

recently helped me with a similar albeit more complex scenario using hash objects which might also do what you are after.  It is a little long code wise, but is quite effecient for my purposes.  It should be kept in mind howwever that a hash solution loads the lookup table into memory, so you need to the size of your table into consideration.


PROC SORT DATA = ORIGINAL OUT = ORIGINAL1 (RENAME = (CUSIP = ORIG_CUSIP COUPONDATES = ORIG_COUPONDATES ANNUALCOUPONRATES = ORIG_ANNUALCOUPONRATES));
   BY CUSIP COUPONDATES;
RUN;

DATA DESIRED1;

IF 0 THEN SET STEPRATES;

IF _N_ = 1 THEN DO;
  DECLARE HASH H(DATASET:"STEPRATES",ORDERED:"A", HASHEXP:16);
  DECLARE HITER HI("H");
  H.DEFINEKEY("CUSIP","COUPONDATES");
  H.DEFINEDATA("CUSIP","COUPONDATES","ANNUALCOUPONRATES");
  H.DEFINEDONE();
  CALL MISSING(CUSIP,COUPONDATES,ANNUALCOUPONRATES);
END;

SET ORIGINAL1;

DESIRED_ANNUALCOUPONRATES = .;

IF _N_= 1 THEN RC = HI.FIRST();

DO WHILE (DESIRED_ANNUALCOUPONRATES = .);
  IF ORIG_CUSIP = CUSIP THEN DO;
   DO WHILE (ORIG_COUPONDATES > COUPONDATES) ;
    RC = HI.NEXT();
    PUT "GO FORWARD DATE" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES= COUPONDATES= RC=;
        END ;     
       DESIRED_ANNUALCOUPONRATES = ANNUALCOUPONRATES; 
  END;

  ELSE DO;
   RC = HI.NEXT();
   PUT "GO FORWARD CUSIP" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES= COUPONDATES= RC=;
  END;  
END;

  OUTPUT;
DO WHILE (ORIG_CUSIP <= CUSIP and RC = 0);
    RC = HI.PREV();
PUT "GO BACK" _N_= ORIG_CUSIP= CUSIP= ORIG_COUPONDATES=  COUPONDATES= RC=;
END;
RUN;

Ksharp
Super User

Not sure. Can't is be done ?

data desired;

   merge desOriginal(in=ina)  StepRates ;

   by cusip coupondates;

   if ina ;

run;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 692 views
  • 6 likes
  • 5 in conversation