DATA Step, Macro, Functions and more

Select the right rows based on condition

Reply
Contributor
Posts: 41

Select the right rows based on condition

Dear all,

I am cleaning the data and after merge I have found that I have two different values of the same variable over the same time period. It happened since the company switched its fiscal years. HOw can I get around this problem?

 

X_variable is reported from Compustat based on Fiscal Periods. DATADATE = end of fiscal period. FYR = month(DATADATE). BEGFYR = beginning of fiscal period.

 

As_of_Date is taken from another dataset and it seems that it falls into two different fiscal periods because this company (ID = 123456) changed its reporting date from December (FYR = 12) until september (FYR=09) and then used FYR = 09 until today. So this swtich happened once. What should I do in this case? If the company switched from 12 (up until 2009) to 9(from 2010), how can I say that for the

 

For 2009 we have a long year (from 2009-01-01 until 2009-12-31) So if As_of_Date falls between these two dates, we report X_variable = 472

 

For 2010 we will have a short year (from 2010-01-01 until 2010-09-30). So As_of_Date won't fall between these two dates, so we will omit the duplicate values. So that the fiscal year periods will look like:

 

2009-01-01 until 2009-12-31 (12 months)

2010-01-01 until 2010-09-30 (10 months)

2010-10-01 until 2011-09-30 (12 months)

 

I think I need to change smth in the code in the definition of  fiscal years:

 

data sql.NEW;

set sql. OLD;

format begfyr endfyr yymmdd10.;

fyr = month(DATADATE);

endfyr = DATADATE;

begfyr = intnx('month', endfyr, -11, 'beg');

 

run;

 

proc sql;

create table sql.merge

as select *

from sql. loan as a, sql.compustat as b

where a.ID = b.ID and

a.As_of_Date >= b.begfyr and a.As_of_Date <= b.endfyr

;

quit;

 

 

Original DATA

 

ID                      As_of_Date                         DATADATE             FYR         BEGFYR                           X_variable

123456                2009-10-30                       2009-12-31              12               2009-01-01                                 472

123456               2009-11-30                         2009-12-31             12               2009-01-01                                 472

123456               2009-12-31                         2009-12-31               12            2009-01-01                                 472

123456                2009-10-30                        2010-09-30                09          2009-10-01                                    434

123456                2009-11-30                        2010-09-30                09            2009-10-01                                   434

123456                 2009-12-31                       2010-09-30                09             2009-10-01                                  434

Trusted Advisor
Posts: 1,378

Re: Select the right rows based on condition

It is not clear to me:

1) is "Original data" your input ?

2) what output do you expect ? can you figure it?
    what differences are relating to your code output ?

 

Contributor
Posts: 41

Re: Select the right rows based on condition

Thank you for your questions. Let me clarify my question. 

 

I have to correctly define the variable "BEGFYR" = beginning of fiscal year. At the moment in the original data we have DATADATE = ENDFYR = end of fiscal year. If 12-31-2000 is the ENDFYR, then 01-01-2000 will be BEGFYR. But sometimes companies switch fiscal years. At first they may report in December but then start reporting in September or any other months. So I have to take care of this situation. 

 

For example, in the original data I have ENDFYR and FYR (=month(DATADATE)). I have to construct BEGFYR (but need to take into account that some fiscal years may be short and some years may be missing). Also, this changes from one firm to another firm. 

 

ID                              ENDFYR                 FYR                   BEGFYR

 123456                   12-31-2000              12                   01-01-2000

123456                    12-31-2001              12                   01-01-2001

123456                    9-30-2002                 9                    01-01-2002 (short year)

123456                    9-30-2003                 9                    10-01-2002

123456                    9-30-2006                 9                     10-01-2005

123456                    7-31-2008                 7                     08-01-2007  

 

654321                    12-31-2000            12                      01-01-2000

654321                     5-31-2001             5                         01-01-2001 (short year)

654321                     8-31-2001             8                          06-01-2001 (short year)

 

So how should I define BEGFYR?

Thank you for your help.

Yelena                                  

Trusted Advisor
Posts: 1,378

Re: Select the right rows based on condition

Assuming records are sorted by ID ENDFYR  and the day after is the BEGFYR of next record

then you can calculate BEGFYR fot the 2nd record on:

 

data want;

 set have;

  by id;

      retain prev_endfyr;

      if first.id then do;

         BEGFYR = ... ;  /* make the best you can to define it on first ID record */

         prev_begfyr = endfyr;

     end; 

     else begfyr = prev_begfyr +1;

     drop prev_endfyr;

run;

 

So you are independent of FYR.

Contributor
Posts: 41

Re: Select the right rows based on condition

Thank you very much! Should I then define:

begfyr = intnx('month', endfyr, -fyr-1, 'beg');

 

 

Trusted Advisor
Posts: 1,378

Re: Select the right rows based on condition

The best way to check a code is to run it on few recods and check results;

  data test;

    set want(obs=10);

      .... enter your code to test ....

 run;

Trusted Advisor
Posts: 1,378

Re: Select the right rows based on condition

change to set have(obs=10) or any amount to check
Contributor
Posts: 41

Re: Select the right rows based on condition

Hello,

I have tried to run the code but it does not work. I only get the observations for two observations:

 

ID                       ENDFYR                  FYR                      begfyr                                  prev_begfyr  

123456               2000-12-31              12                             2000-01-01                          2000-12-31

123456               2001-12-31              12                                      .                                        .

123456               2002-09-30                9                                      .                                        .

123456               2003-09-30                9                                         .                                     .

123456              2006-09-30               9                                        .                                        .

123456             2008-07-31                7                                       .                                        .

654321             2000-12-31                12                              2000-01-01                           2000-12-31

654321             2001-05-31                  5                               .                                                 .

 

I am not sure how to populate missing values. Any suggestions? Thank you for your help.

Trusted Advisor
Posts: 1,378

Re: Select the right rows based on condition

Sorry, I have missed something - each line is a new period - then try:

 

data want;

 set have;

  by id;

      retain prev_endfyr;

      if first.id then do;

         BEGFYR = ... ;  /* make the best you can to define it on first ID record */

         prev_begfyr = endfyr;

     end; 

    begfyr = prev_begfyr +1;    /* else - canceled */

    prev_begfyr = endfyr;          /* line added */

     drop prev_endfyr;

run;

 

Contributor
Posts: 41

Re: Select the right rows based on condition

Hmm, thank you so much for your help. But still something is weird with the code as there are only first observations that are defined for begfyr.

 

I have figured out how to write the code. It works now. I'll post it later.

 

 

 

 

Ask a Question
Discussion stats
  • 9 replies
  • 406 views
  • 0 likes
  • 2 in conversation