Good days to all,
I have a SAS dataset (qtrforecast). I wish to calculate the frequency of meeting analysts' earnings forecast (MEET) in the past 4 quarters.
MEET = 1 if meeting forecast; else 0.
datadate = financial year-end.
fpedats = forecast date
cusip6 = companies' identification code
For example, for the company with cusip6 of "000360", its financial year-end = 31DEC2003, then earnings forecast for the past 4 quarters will be on 31MAR2003, 30JUN2003, 30SEP2003 and 31DEC2003. As only 1x of meeting forecast.
The "FREQ" = 1 and PERCENT_MEET = 1/4 = 0.25.
I have a problem to calculate this variable as datadate is not available for quarter 1 - 3. For observations without any datadate will be deleted.
I wish to get any useful help on this problem.
Thanks in advance for helps.
Regards,
pakms
My bad, I took your data for granted, not checking the details, the month of your 'datadate' actually have been seen in every possible month of a year, from Jan to Dec, all of them. Here is another attempt, let me know how it works:
proc sql noprint;
create table want as
select a.cusip6, sum(a.meet) as FREQ,
calculated freq/count(distinct a.FPEDATS) as PERCENT_MEET format=percent6.2,
b.datadate
from (select distinct cusip6, fpedats, meet,datadate from qtrforecast) a
inner join
(select distinct cusip6, datadate from qtrforecast(where=(not missing(datadate)))) b
on a.cusip6=b.cusip6 and intnx('qtr',b.datadate,-3,'b') <= a.fpedats <= b.datadate
group by a.cusip6, b.datadate
;
quit;
Haikuo
Updated using quarter instead of year.
Hi,
I am not sure I understand your requirements. Please provide some sample output data based on the input data you have provided.
Regards,
Amir.
Hi Amir,
I would like to have an output data (with the following 4 variables) as follows:
cusip6 FREQ PERCENT_MEET DATADATE
Thank you.
Regards,
pakms
Hi,
For the example you were working through for a cusip6 of "000360", please give the values you would like to see for the four columns you have specified.
Regards,
Amir.
Hi again,
For the cusip6 of "000360", the following should be shown:
cusip6 FREQ PERCENT_MEET DATADATE
000360 1 0.25 31DEC2003
Thank you.
Regards,
Pakms
Hi again,
For the cusip6 of "000360", the following should be shown:
cusip6 FREQ PERCENT_MEET DATADATE
000360 1 0.25 31DEC2003
000360 1 0.25 31DEC2004
000360 1 0.25 31DEC2005
000360 3 0.75 31DEC2006
000360 3 0.75 31DEC2007
000360 4 1.00 31DEC2008
000360 3 0.75 31DEC2009
000360 1 0.25 31DEC2010
Any duplicate, for example the qtr4 of year ended 31 Dec 2004 and 2005 should be ignored.
This example is clearer.
Thank you.
Regards,
Pakms
Hi,
The sample output data has helped and I came up with the below code, but when I checked the results they did not all look right. Then I noticed that the input data was not entirely consistent as Art has pointed out.
data want(keep=cusip6 freq percent_meet datadate);
set tmp1.Qtrforecast;
retain freq 0
old_datadate 0
;
freq+sum(0,meet);
if datadate ne . then
if datadate=old_datadate then
do;
link reset;
delete;
end;
else
do;
percent_meet=freq/4;
output;
link reset;
end;
return;
reset:
freq=0;
old_datadate=datadate;
return;
run;
Regards,
Amir.
Thanks Amir for suggestion.
But this output did not work well as it contains the percentage of meeting earnings' forecasts > 100%.
Hi,
Yes, that was one of the issues I noticed. How about the following?
data want(keep=cusip6 freq percent_meet datadate);
set tmp1.Qtrforecast;
by cusip6;
retain freq 0
old_datadate 0
;
if datadate ne . then
if datadate=old_datadate then
do;
link reset;
delete;
end;
else
do;
freq+sum(0,meet);
percent_meet=freq/4;
output;
link reset;
end;
if last.cusip6 then
link reset;
return;
reset:
freq=0;
old_datadate=datadate;
return;
run;
I essentially moved the freq calculation.
I also added some by processing to output data for records such as cusip6=02365V, which didn't have 4 quarters of data and so would otherwise not be output. I was not sure if you wanted these out or not.
Regards,
Amir.
Pakms,
There are a couple of discrepancies that you will have to decide how you want/should approach the problem.
First, there is one duplicate cusip, namely 000000, that is shared by more than one company.
Second, there are some companies in your data that changed their fiscal year within the span of time that your data covers. How do you want to deal with those companies?
Third, there are some companies in your data that don't have any datadate for any of their records.
Art
Hi Art,
1. The duplicate cusip, with more than more than one companies' name - it assumes that the company changed its company's name. - same company assumed.
2. there are some companies in your data that changed their fiscal year within the span of time that your data covers. How do you want to deal with those companies? I have to calculate the frequency of meeting forecast within the last 4 quarters based on the datadate. There should be 4 times of estimates
3. there are some companies in your data that don't have any datadate for any of their records. - Companies without datdate will be ignored or deleted.
Thank you for your advise and concerns.
Regards,
pakms
Hi,
Like others have pointed out, your data itself needs some work. Here is a SQL approach:
proc sql noprint;
create table want as
select cusip6,
intnx('year',FPEDATS,0,'end') as datadate format=date9.,
sum(meet) as FREQ,
calculated freq/count(distinct FPEDATS) as PERCENT_MEET format=percent6.2
from (select distinct cusip6, fpedats, meet from qtrforecast)
group by cusip6, calculated datadate ;
quit;
Haikuo
Thanks Haikuo for your program. But I don't think this is the answer too.
If the datadate is missing for any firm, this firm should not be included in output. For example, if the datadate is 31MAR2005. Then the past 4 quarters forecast will be on 30JUN2004, 30SEP2004, 31DEC2004 and 31MAR2005.
cusip6 with "000000" should not be included as the datadate is not available. the "distinct" function for variables of "cusip6","fpedats" and "meet" from qtrforecas do help in data cleaning as this reduces duplication of forecast for the same period.
Thank you.
Regards,
pakms
Oops, did not notice you have that requirement. There is an easy fix on my current code, you sacrifice some performance, but it gets the job done.
proc sql noprint;
create table want as
select cusip6,
intnx('year',FPEDATS,0,'end') as datadate format=date9.,
sum(meet) as FREQ,
calculated freq/count(distinct FPEDATS) as PERCENT_MEET format=percent6.2
from (select distinct cusip6, fpedats, meet from qtrforecast)
group by cusip6, calculated datadate
having cats(cusip6,datadate) in (select cats(cusip6,datadate) from qtrforecast);
quit;
Haikuo
Hi Haikuo,
I provided the datadate in the table as follows (only cusip6 of 000360 is shown):
FPEDATS | cusip6 | datadate | MEET |
31-Mar-03 | 000360 | . | 0 |
30-Jun-03 | 000360 | . | 0 |
30-Sep-03 | 000360 | . | 0 |
31-Dec-03 | 000360 | 31-Dec-03 | 1 |
31-Mar-04 | 000360 | . | 0 |
30-Jun-04 | 000360 | . | 0 |
30-Sep-04 | 000360 | . | 0 |
31-Dec-04 | 000360 | 31-Dec-04 | 1 |
31-Dec-04 | 000360 | 31-Dec-04 | 1 |
31-Mar-05 | 000360 | . | 1 |
30-Jun-05 | 000360 | . | 0 |
30-Sep-05 | 000360 | . | 0 |
31-Dec-05 | 000360 | 31-Dec-05 | 0 |
31-Dec-05 | 000360 | 31-Dec-05 | 0 |
31-Mar-06 | 000360 | . | 1 |
30-Jun-06 | 000360 | . | 0 |
30-Sep-06 | 000360 | . | 1 |
31-Dec-06 | 000360 | 31-Dec-06 | 1 |
31-Mar-07 | 000360 | . | 1 |
30-Jun-07 | 000360 | . | 1 |
30-Sep-07 | 000360 | . | 0 |
31-Dec-07 | 000360 | 31-Dec-07 | 1 |
31-Mar-08 | 000360 | . | 1 |
30-Jun-08 | 000360 | . | 1 |
30-Sep-08 | 000360 | . | 1 |
31-Dec-08 | 00360 | 31-Dec-08 | 1 |
31-Mar-09 | 00360 | . | 1 |
30-Jun-09 | 00360 | . | 0 |
30-Sep-09 | 00360 | . | 1 |
31-Dec-09 | 00360 | 31-Dec-09 | 1 |
31-Mar-10 | 00360 | . | 0 |
30-Jun-10 | 00360 | . | 0 |
30-Sep-10 | 00360 | . | 0 |
31-Dec-10 | 00360 | 31-Dec-10 | 1 |
31-Mar-11 | 00360 | . | 0 |
30-Jun-11 | 00360 | . | 0 |
30-Sep-11 | 00360 | . | 0 |
31-Dec-11 | 00360 | . | 0 |
FPEDATS of 31MAR2003, 30JUN2003, 30SEP2003 and 31DEC2003 are 4 quarters within the same fiscal year-end of 31DEC2003 (datadate). So, the FREQ = 0+0+0+1 = 1; PERCENT_FREQ = 1/4 = 0.25.
The output for your suggested program - all the observations have the same financial year-end which is incorrect. Some firms might not have the same fiscal year-end of 31DEC.
If firms with year-end of 30SEP2004, then 31DEC2003, 31MAR2004, 30JUN2004 and 30SEP2004 should be within the same datadate of 30SEP2004.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.