I am trying to calculate different duration periods that measure how long a firm remains in a given manager's portfolio.
The data structure looks as follows:
What have I done?
What am I missing? (I want to overcome the bias of not taking into account for the gaps between reporting dates)
Here are my "have" as well as "want" datasets:
DATA have1;
INPUT Mgrno Firm_ID $ Report_Date Date9. Rate;
FORMAT Report_Date Date9.;
DATALINES;
19500 12727AZ 31MAR1980 0.04
19500 12727AZ 30JUN1980 0.05
19500 12727AZ 30SEP1980 0.08
19500 12727AZ 31DEC1980 0.07
19500 12727AZ 31MAR1981 0.09
19500 12727AZ 30JUN1981 0.02
19500 12727AZ 30SEP1981 0.08
19500 12727AZ 31DEC1981 0.05
19500 12727AZ 31MAR1982 0.07
19500 12727AZ 30JUN1982 0.10
39547 12727AZ 30JUN1980 0.02
39547 12727AZ 30SEP1980 0.01
39547 12727AZ 31DEC1980 0.06
39547 12727AZ 31MAR1981 0.06
99999 731238A 31MAR1982 0.11
99999 731238A 30JUN1982 0.12
99999 731238A 30JUN1983 0.07
;
run;
Data have2;
INPUT Mgrno Firm_ID $ Event_Date Date9. Rate Group $;
FORMAT Event_Date Date9.;
DATALINES;
19500 12727AZ 31MAR1982 0.07 AA
39547 12727AZ 31MAR1981 0.06 BB
99999 731238A 30JUN1983 0.07 BB
;
run;
/*
Holding_Duration1=Number of quarters past until the event_date since the company was included in the portfolio of the manager for the first time until the event_date
Holding_Duration1=Number of quarters past until the event_date since the company was included in the portfolio of the manager for the first time until the event_date
as well as where the rate >= 0.05
Total=Number of quarter past from the first time the company was added into the porfolio until the event_date
Event_minus_First= Event date - First date in terms of number of quarters
*/
Data want;
INPUT Mgrno Firm_ID $ Event_Date Date9. Rate Group $ Holding_Duration1 Holding_Duration2 Total Event_minus_First Last_Subsequent_Duration;
FORMAT Event_Date Date9.;
DATALINES;
19500 12727AZ 31MAR1981 0.07 AA 8 6 8 8 6
39547 12727AZ 31MAR1981 0.06 BB 4 2 4 4 6
99999 731238A 30JUN1983 0.07 BB 3 3 3 7 1
;
run;
Here is my code for the "what I have done" part:
*First I will include all the reported holding filings of the manager for the given firm;
proc sql;
create table first_method_V1 as
select *
from have1 as a
right join have2 as b
on (a.mgrno=b.mgrno) & (a.firm_id=b.firm_id);
quit;
proc summary data=work.first_method_V1 nway;
var Report_Date;
class mgrno group;
output out=work.Holding_Duration1
n=Holding_Duration1;
run;
data only_5percent;
set first_method_V1;
if rate >= 0.05 & Event_Date >= Report_Date;
run;
proc summary data=work.only_5percent nway;
var Report_Date;
class mgrno group;
output out=work.Holding_Duration2
n=Holding_Duration2;
run;
data first_method_V2(drop= _TYPE_ _FREQ_);
merge first_method_V1 Holding_Duration1 Holding_Duration2;
by mgrno group;
run;
Thanks in advance for any suggestions / help you may provide.
I see, the concept of a Do "Whitlock" loop might be something to introduce here...
proc sort data=have1 out=have1_sorted ;
by mgrno Firm_ID report_date;
quit;
data try (Keep=Mgrno Firm_ID beg_hold_period end_hold_period) ;
do until (last.Firm_ID);
set have1_sorted;
by Mgrno Firm_ID Report_date;
lag_Report_date=lag(Report_date);
format beg_hold_period end_hold_period date9.;
if first.FIRM_ID
then beg_hold_period=Report_date;
else if intck("Month3", lag_Report_date,Report_date)>1
then do;
end_hold_period=lag_Report_date;
output;
beg_hold_period=Report_date;
end;
end;
end_hold_period=Report_date;
output;
run;
not the complete solution but demostrates that a dataset can be created with one record per holding period, consisting of holdings held consecutive quarters.
use INTCK() function. For your case the interval will be "MONTH3" The interval being "Month" and the multiplier being "3".
@PhilC, thanks for the suggestion, but in the document it states the following:
"The intervals must be listed in ascending order. There cannot be gaps between intervals, and intervals cannot overlap."
The major issue I am having is that there may be some gaps between the interval and what I am trying to calculate is the most recent reporting date (before the event date) in which there are no gaps. Then, I can simply just calculate the quarters between these two dates.
Do you have any suggestions how to overcome this issue?
@PhilC with the intck() function, I can calculate the intervals but still have the gaps in between issue. To overcome that issue, I thought of making use of the lag() function and checking whether there are any gaps. What I am still missing is the following:
Any suggestions will be greatly appreciated. Here is what I have added to the above code:
proc sort data=first_method_V2;
by mgrno report_date;
quit;
data try;
set first_method_V2;
Q=intck('month3', Report_date, Event_date);
run;
data try;
set try;
gaps=Q-lag(Q);
if gaps^=-1 then flag=1; else flag=0;
run;
I see, the concept of a Do "Whitlock" loop might be something to introduce here...
proc sort data=have1 out=have1_sorted ;
by mgrno Firm_ID report_date;
quit;
data try (Keep=Mgrno Firm_ID beg_hold_period end_hold_period) ;
do until (last.Firm_ID);
set have1_sorted;
by Mgrno Firm_ID Report_date;
lag_Report_date=lag(Report_date);
format beg_hold_period end_hold_period date9.;
if first.FIRM_ID
then beg_hold_period=Report_date;
else if intck("Month3", lag_Report_date,Report_date)>1
then do;
end_hold_period=lag_Report_date;
output;
beg_hold_period=Report_date;
end;
end;
end_hold_period=Report_date;
output;
run;
not the complete solution but demostrates that a dataset can be created with one record per holding period, consisting of holdings held consecutive quarters.
Thanks @PhilC. I will give it a try and keep you updated if I come up with a few other changes.
@PhilC, I just added a few lines to your code and I think the code is now working well:
proc sort data=have1 out=have1_sorted ;
by mgrno Firm_ID report_date;
quit;
data try (Keep=Mgrno Firm_ID beg_hold_period end_hold_period) ;
do until (last.Firm_ID);
set have1_sorted;
by Mgrno Firm_ID Report_date;
lag_Report_date=lag(Report_date);
format beg_hold_period end_hold_period date9.;
if first.FIRM_ID
then beg_hold_period=Report_date;
else if intck("Month3", lag_Report_date,Report_date)>1 then do;
end_hold_period=lag_Report_date;
output;
beg_hold_period=Report_date;
end;
end;
end_hold_period=Report_date;
output;
run;
*Added the part below ;
proc sort data=try;
by mgrno firm_id descending end_hold_period;
quit;
proc sort data=try nodupkey;
by mgrno firm_id;
quit;
data merged;
merge try have1 have2;
by mgrno firm_id;
run;
data counting;
set merged;
if Event_Date >= Report_Date >= beg_hold_period;
run;
proc summary data=work.counting nway;
var Report_Date;
class mgrno group;
output out=work.duration
n=duration;
run;
glad it worked
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.