Dear all,
I am cleaning the data and after merge I have found that I have two different values of the same variable over the same time period. It happened since the company switched its fiscal years. HOw can I get around this problem?
X_variable is reported from Compustat based on Fiscal Periods. DATADATE = end of fiscal period. FYR = month(DATADATE). BEGFYR = beginning of fiscal period.
As_of_Date is taken from another dataset and it seems that it falls into two different fiscal periods because this company (ID = 123456) changed its reporting date from December (FYR = 12) until september (FYR=09) and then used FYR = 09 until today. So this swtich happened once. What should I do in this case? If the company switched from 12 (up until 2009) to 9(from 2010), how can I say that for the
For 2009 we have a long year (from 2009-01-01 until 2009-12-31) So if As_of_Date falls between these two dates, we report X_variable = 472
For 2010 we will have a short year (from 2010-01-01 until 2010-09-30). So As_of_Date won't fall between these two dates, so we will omit the duplicate values. So that the fiscal year periods will look like:
2009-01-01 until 2009-12-31 (12 months)
2010-01-01 until 2010-09-30 (10 months)
2010-10-01 until 2011-09-30 (12 months)
I think I need to change smth in the code in the definition of fiscal years:
data sql.NEW;
set sql. OLD;
format begfyr endfyr yymmdd10.;
fyr = month(DATADATE);
endfyr = DATADATE;
begfyr = intnx('month', endfyr, -11, 'beg');
run;
proc sql;
create table sql.merge
as select *
from sql. loan as a, sql.compustat as b
where a.ID = b.ID and
a.As_of_Date >= b.begfyr and a.As_of_Date <= b.endfyr
;
quit;
Original DATA
ID As_of_Date DATADATE FYR BEGFYR X_variable
123456 2009-10-30 2009-12-31 12 2009-01-01 472
123456 2009-11-30 2009-12-31 12 2009-01-01 472
123456 2009-12-31 2009-12-31 12 2009-01-01 472
123456 2009-10-30 2010-09-30 09 2009-10-01 434
123456 2009-11-30 2010-09-30 09 2009-10-01 434
123456 2009-12-31 2010-09-30 09 2009-10-01 434
It is not clear to me:
1) is "Original data" your input ?
2) what output do you expect ? can you figure it?
what differences are relating to your code output ?
Thank you for your questions. Let me clarify my question.
I have to correctly define the variable "BEGFYR" = beginning of fiscal year. At the moment in the original data we have DATADATE = ENDFYR = end of fiscal year. If 12-31-2000 is the ENDFYR, then 01-01-2000 will be BEGFYR. But sometimes companies switch fiscal years. At first they may report in December but then start reporting in September or any other months. So I have to take care of this situation.
For example, in the original data I have ENDFYR and FYR (=month(DATADATE)). I have to construct BEGFYR (but need to take into account that some fiscal years may be short and some years may be missing). Also, this changes from one firm to another firm.
ID ENDFYR FYR BEGFYR
123456 12-31-2000 12 01-01-2000
123456 12-31-2001 12 01-01-2001
123456 9-30-2002 9 01-01-2002 (short year)
123456 9-30-2003 9 10-01-2002
123456 9-30-2006 9 10-01-2005
123456 7-31-2008 7 08-01-2007
654321 12-31-2000 12 01-01-2000
654321 5-31-2001 5 01-01-2001 (short year)
654321 8-31-2001 8 06-01-2001 (short year)
So how should I define BEGFYR?
Thank you for your help.
Yelena
Assuming records are sorted by ID ENDFYR and the day after is the BEGFYR of next record
then you can calculate BEGFYR fot the 2nd record on:
data want;
set have;
by id;
retain prev_endfyr;
if first.id then do;
BEGFYR = ... ; /* make the best you can to define it on first ID record */
prev_begfyr = endfyr;
end;
else begfyr = prev_begfyr +1;
drop prev_endfyr;
run;
So you are independent of FYR.
Thank you very much! Should I then define:
begfyr = intnx('month', endfyr, -fyr-1, 'beg');
The best way to check a code is to run it on few recods and check results;
data test;
set want(obs=10);
.... enter your code to test ....
run;
Hello,
I have tried to run the code but it does not work. I only get the observations for two observations:
ID ENDFYR FYR begfyr prev_begfyr
123456 2000-12-31 12 2000-01-01 2000-12-31
123456 2001-12-31 12 . .
123456 2002-09-30 9 . .
123456 2003-09-30 9 . .
123456 2006-09-30 9 . .
123456 2008-07-31 7 . .
654321 2000-12-31 12 2000-01-01 2000-12-31
654321 2001-05-31 5 . .
I am not sure how to populate missing values. Any suggestions? Thank you for your help.
Sorry, I have missed something - each line is a new period - then try:
data want;
set have;
by id;
retain prev_endfyr;
if first.id then do;
BEGFYR = ... ; /* make the best you can to define it on first ID record */
prev_begfyr = endfyr;
end;
begfyr = prev_begfyr +1; /* else - canceled */
prev_begfyr = endfyr; /* line added */
drop prev_endfyr;
run;
Hmm, thank you so much for your help. But still something is weird with the code as there are only first observations that are defined for begfyr.
I have figured out how to write the code. It works now. I'll post it later.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.