Keeping only contiguous Date Data

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 101
Accepted Solution

Keeping only contiguous Date Data

Hello, it's John again.

I have a large data set with data from many different financial quarters for many different firms.  I want to keep only data for firms which have data from contiguous quarters, meaning if I have data from the second quarter of 2010 for Firm A, I only want to keep the data if I also have either 1st quarter or 3rd quarter of 2010 for Firm A as well.  One wrinkle is that the dates are WITHIN the quarter, and not always the same,  so I am going to have to assign quarter values for each date before I can do whatever sort I need to do.  The data takes the following form:

Firm                                   Date

A                              30AUG2010

B                              07SEP2010

C                              07SEP2010

D                              07SEP2010

...

...

A                             15DEC2010                                      

B                              15DEC2010

D                              15DEC2010    

...

...

A                              5JAN2011

B                              5JAN2011

C                              5JAN2011    

D                              5JAN2011

...

...

So in this case, we would want to throw out the data point for Firm C for 3rd quarter 2010 (assuming there is no 2nd quarter 2010 data) becuase there is no 4th quarter 2010 data for Firm C, thus making the datapoints not continguous.

Any help would be greatly appreciated.

Thanks,

John


Accepted Solutions
Solution
‎02-09-2014 01:27 PM
Respected Advisor
Posts: 4,641

Re: Keeping only contiguous Date Data

From a direct translation of your requirements into a SQL query:

data have;

input Firm $ Date :date9.;

format date yymmdd10.;

datalines;

A                              30AUG2010

B                              07SEP2010

C                              07SEP2010

D                              07SEP2010

A                             15DEC2010                                     

B                              15DEC2010

D                              15DEC2010   

A                              5JAN2011

B                              5JAN2011

C                              5JAN2011   

D                              5JAN2011

;

proc sql;

create table contig as

select *, intnx("QTR", date, 0) as qtr format=yyq.

from have as a

where calculated qtr in (select intnx("QTR", date, -1) from have where firm=a.firm)

       or calculated qtr in (select intnx("QTR", date, 1) from have where firm=a.firm);

quit;

PG

PG

View solution in original post


All Replies
Solution
‎02-09-2014 01:27 PM
Respected Advisor
Posts: 4,641

Re: Keeping only contiguous Date Data

From a direct translation of your requirements into a SQL query:

data have;

input Firm $ Date :date9.;

format date yymmdd10.;

datalines;

A                              30AUG2010

B                              07SEP2010

C                              07SEP2010

D                              07SEP2010

A                             15DEC2010                                     

B                              15DEC2010

D                              15DEC2010   

A                              5JAN2011

B                              5JAN2011

C                              5JAN2011   

D                              5JAN2011

;

proc sql;

create table contig as

select *, intnx("QTR", date, 0) as qtr format=yyq.

from have as a

where calculated qtr in (select intnx("QTR", date, -1) from have where firm=a.firm)

       or calculated qtr in (select intnx("QTR", date, 1) from have where firm=a.firm);

quit;

PG

PG
Frequent Contributor
Posts: 101

Re: Keeping only contiguous Date Data

,

Thank you very much for your help.  I think that we are very close to an answer.

Just for my edification, one question, what is happening in the "where" statements at the end of the PROC SQL statement?

I am assuming that INTNX("QTR", 100830, 0) will return 103?  And are where statements saying that we will include only data that corresponds to 102 or 104?

More specifically, I have never seen the "where-in" structure that is being used here.  What is that structure saying?

Once I get over that hump, I think I should be good to go.

Thanks so much for your help!

John

Respected Advisor
Posts: 4,641

Re: Keeping only contiguous Date Data

Hi John,

intnx("QTR", date, 0) returns the first date of the quarter in which date is,

similarly, intnx("QTR", date, 1) returns the first date of the next quarter and,

you guessed it, intnx("QTR", date, -1) retuns the first date of the previous quarter.

In plain language, the WHERE clause says "If the current quarter is the previous quarter for some observation of the same firm OR if the current quarter is the next quarter for some observation of the same firm."

Hth

PG

PG
Frequent Contributor
Posts: 101

Re: Keeping only contiguous Date Data

Thank you so much!!  I am very grateful for your help!

As I continue to do research I am sure to run into more issues like this, and having the help of this knowledgable forum is a life saver!

Talk to you soon,

John

PROC Star
Posts: 7,356

Re: Keeping only contiguous Date Data

Methinks that you are more than close with PGStat's code.  Yes, his calculation of quarter will result in values like 101, 102, etc.  Then, his first where statement says to only select the record if there is either:

  a record in the dataset, for that same firm, that has a value of the previous quarter (i.e., for 101, 094)

  or

  a record in the dataset, for that same firm, that has a value of the next quarter (i.e., for 101, 102)


I, personally, can't think of an easier way to accomplish the task.



PROC Star
Posts: 7,356

Re: Keeping only contiguous Date Data

: Would you contact me, off-forum, at atabachneck@gmail.com ?

I'd appreciate it!  There are some non-forum-related questions I'd like to discuss with you.

Art

Frequent Contributor
Posts: 101

Re: Keeping only contiguous Date Data

,

I was wondering if there was a way to use the LAG function to detect if there are two variables present in consecutive periods, whether that be months or quarters.  In this case, we are worried about quarters, so if we had data that took the following form:

data have;

input Firm $ TNA Date :date9.;

format date yymmdd10.;

datalines;

A           150                             30AUG2010

B            100                       07SEP2010

C               35                       07SEP2010

D                .                        07SEP2010

A                 153                 15DEC2010                                    

B                      352             15DEC2010

D                    153               15DEC2010  

A                       351            5JAN2011

B                       534            5JAN2011

C                          235         5JAN2011  

D                           234        5JAN2011

A                    .                     01MAR2011

D                    142               03MAR2011

;


Here, there is either entire quarters missing, or there are situations where there is an entry for a firm but there is no data for the variable TNA in that row.  I want to use the lag function to keep only data in contiguous quarters that includes both date and TNA information.


I can't quite figure out how to use the if/then statements with the lag functions to make this work.  The difficulty that I am having is when the years change, the qtr goes from 1 to 4, but I want to keep that data.  I have started with something like this, but it isn't working:

proc sort data = have;

by firm;

run;

data have2;

set have;

qtr = qtr(date);

lagqtr = lag(qtr);

lagtna = lag(tna);

if qtr - lagqtr ne 1 then delete;

if lagtna = . then delete;

run;

proc print data = have2;

run;


, please feel free to chime in as well!!!


Thanks so much!


John

PROC Star
Posts: 7,356

Re: Keeping only contiguous Date Data

I still think it will be easiest to use PG's sql approach and just add the extra condition is the where clause.  e.g.:

proc sql;

  create table contig as

    select *, intnx("QTR", date, 0) as qtr format=yyq.

      from have as a

        where calculated qtr in (select intnx("QTR", date, -1) from have where firm=a.firm

         and not missing(tna) and not missing(a.tna))

         or calculated qtr in (select intnx("QTR", date, 1) from have where firm=a.firm

         and not missing(tna) and not missing(a.tna))

           order by firm

  ;

quit;

You can't just use lag in a data step as a record has to be compared with both earlier and later records.

Respected Advisor
Posts: 4,641

Re: Keeping only contiguous Date Data

If you insist on using a datastep, you could use this clunky method:

proc sort data=have; by firm date; run;

data seq(keep=firm keepDate rename=keepDate=date);

retain prevDate;

set have; by firm;

where TNA is not missing;

prevDate = lag(date);

if not first.firm then

  if intnx("QTR", date, 0) = intnx("QTR", prevDate, 1) then do;

  keepDate = prevDate;

  output;

  keepDate = date;

  output;

  end;

run;

data want;

merge have(in=inHave) seq(in=inSeq);

by firm date;

if first.date and inHave and inSeq;

run;

PG

PG
Frequent Contributor
Posts: 101

Re: Keeping only contiguous Date Data

and ,

Thank you, as always, for your help.  The reason that I was trying to get this to work using the LAG function was because I am worried about survivorship bias in my dataset, meaning that I will be deciding what data to keep by looking forward and then keeping that data contingent on knowing that it is still there next quarter.   But I think that any of these ways get around that. 

Thanks so much, I am really learning a lot.

John

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 521 views
  • 4 likes
  • 3 in conversation