DATA Step, Macro, Functions and more

Select most recent row with value

Accepted Solution Solved
Reply
Regular Contributor
Posts: 162
Accepted Solution

Select most recent row with value

Dear all,

 

I have a dataset with following variables:

 

Gvkey = company identification code

fyearq= fiscal year

fqtr=fiscal quarter

sale_dec= indicator value = 1 if sales decrease; else it is equals to 0

saledec= value of decrease

saleinc = value of increase

 

I would like to find:

1. the most recent quarter with a sales decrease and its valueof decrease

2. the most recent quarter with a sales increase and its value of increase

 

for each fiscal year. The selection of the most recent quarter is based on the last 7 quarters (q-7) plus the current quarter (q).

 

Thank you and hope to receive a solution soon.

 

mspak

Attachment

Accepted Solutions
Solution
‎03-26-2017 10:03 AM
Super User
Posts: 10,023

Re: Select most recent row with value

OK. Mspak.

So you want a rolling window which'length is eight quarter and output one obs for each id and year ?

Assuming there is no gap(missing quarter in a year) in your data. Otherwise you need some more code to handle this.

 




data want;
  set test;
  by GVKEY  FYEARQ;
  array d{8} _temporary_;
  array i{8} _temporary_;
  if first.gvkey then call missing(of d{*},of i{*});
  
  do k=8 to 2 by -1;
   d{k}=d{k-1};
   i{k}=i{k-1};
  end;
  
  d{1}=SALEDEC;
  i{1}=SALEINC;

  if last.fyearq then do; 
    last_decrease=coalesce(of d{*});
    last_increase=coalesce(of i{*});
    output;
  end;
 keep GVKEY  FYEARQ last_: ;
run;

 

 

View solution in original post


All Replies
Respected Advisor
Posts: 4,173

Re: Select most recent row with value

Below should point you into the right direction.

libname temp 'C:\temp';

proc sort data=temp.test out=inter;
  where sale_dec=1;
  by gvkey fyearq fqtr;
run;

/*the most recent quarter with a sales decrease and its valueof decrease per company identification code*/
data want1;
  set inter;
  by gvkey fyearq fqtr;
  if last.gvkey;
run;
Super User
Posts: 10,023

Re: Select most recent row with value

@mspak ,

plz post your data at this forum, not XLS, ZIP file.

No one would like to download these files from website due to virus risk.

And do't forget to post the output you want see too.

 

It is hard to understand what you mean by most recent quarter.

Assuming I understand what you mean.

 

 

data last_decrease last_increase;
 do i=1 by 1 until(last.gvkey);
  set test;
  by gvkey;
  if sale_dec then idx_dec=i;
  if not missing(saleinc) then idx_inc=i;
 end;
 
 do j=1 by 1 until(last.gvkey);
  set test;
  by gvkey;
  if j=idx_dec then output last_decrease;
  if j=idx_inc then output last_increase;
 end;
 drop i j;
run;
Regular Contributor
Posts: 162

Re: Select most recent row with value

Hi,

 

The dataset is as follows:

GVKEY  FYEARQ   FQTR       DATACQTR    SALE_DEC     SALEDEC                   SALEINC

001004  2011          1              2011Q3                1               4.0787026566                    .
001004  2011          2              2011Q4                1               .                                          .
001004  2011         3               2012Q1                0               .                                  -0.017301069
001004  2011         4               2012Q2                0               .                                  0.2664100989
001004  2012         1               2012Q3               1                                                  1.8067205327 .
001004  2012         2               2012Q4               1                .                                     .
001004  2012         3               2013Q1               0                .                                  1.9264163544
001004  2012         4               2013Q2               0                .                                  -0.188650692

 

With the above dataset, I would like to find the most recent quarter's sales decrease and most recent quarter's sales increase. 

 

 

For example, the most recent quarter with a sales decrease for year 2011 and 2012 would be 4.01787...as this is the only sales decrease over the last 8 quarters. 

 

The most recent sales increase for year 2011 would be 0.2664100989 as this is the latest value for sales increase over the last 8 quarters for year 2011. The value for the latest sales increase for 2012 would be -0.18865. 

 

The most recent "sales decrease" event not neccessarily happen in the quarter 4 as the firm might experience the last "sales decrease" event during quarter 1 of previous year. 

 

I wish the output as follows:

GVKEY  FYEARQ     SALEDEC            SALEINC

001004   2011          4.0787026566       0.2664100989

001004   2012          4.0787026566        -0.188650692

 

I hope that this make it clearer. 

 

Thank you for any suggestions. 

 

Regards,

MSPAK

Solution
‎03-26-2017 10:03 AM
Super User
Posts: 10,023

Re: Select most recent row with value

OK. Mspak.

So you want a rolling window which'length is eight quarter and output one obs for each id and year ?

Assuming there is no gap(missing quarter in a year) in your data. Otherwise you need some more code to handle this.

 




data want;
  set test;
  by GVKEY  FYEARQ;
  array d{8} _temporary_;
  array i{8} _temporary_;
  if first.gvkey then call missing(of d{*},of i{*});
  
  do k=8 to 2 by -1;
   d{k}=d{k-1};
   i{k}=i{k-1};
  end;
  
  d{1}=SALEDEC;
  i{1}=SALEINC;

  if last.fyearq then do; 
    last_decrease=coalesce(of d{*});
    last_increase=coalesce(of i{*});
    output;
  end;
 keep GVKEY  FYEARQ last_: ;
run;

 

 

Regular Contributor
Posts: 162

Re: Select most recent row with value

Thank you Ksharp for helps. This code is really useful. Have a nice day!

Super Contributor
Posts: 441

Re: Select most recent row with value

Hi Ksharp,

 

I have some questions regarding the understanding of your code.

 

when you write:

 

 if first.gvkey then call missing(of d{*},of i{*});
  
  do k=8 to 2 by -1;
   d{k}=d{k-1};
   i{k}=i{k-1};
  end;

From what I understand, whenever we encounter the first observation of a certain gvkey, you make all the values of the arrays to be missing, and then you make them all be the same - but they are already all missing in the first place?

Super User
Posts: 10,023

Re: Select most recent row with value

Yes. they are all missing.
It doesn't matter. 
it is just one more useless operator for the first.gvkey.
it wouldn't change anything about result .


Super Contributor
Posts: 441

Re: Select most recent row with value

Ok, but now I have another question!Smiley Very HappySmiley Tongue

 

I realize that you are doing rolling windows, but can't understand it from the code.

 

So for a given observation, d{1} is the saledec for that period, and d{2} is the saledec for the previous perid etc.

 

But how do you put the saledec of the previous period into d{2}? You don't do something like d{2} = lag{saledec}, so how is it done?

 

Thanks!

Super User
Posts: 10,023

Re: Select most recent row with value

[ Edited ]

Yes. I do.

The array D is temporary, which means it is retained during all the data step time.

so when do 

 

 do k=8 to 2 by -1;
   d{k}=d{k-1};
   i{k}=i{k-1};
  end;

assume  d{8}=1 2 3 4 5 6 7 8

then the first time d{8}=current_obs 1 2 3 4 5 6 7

 NOTE, 8 is removed from d{*}.

 

next time when run

d{8}=next_obs current_obs 1 2 3 4 5 6

 NOTE, 7 is removed from d{*}.

Super Contributor
Posts: 441

Re: Select most recent row with value

Hi mspak,

 

Very nice question, I am also working a lot with financial data.

 

 

I looked at your data and noticed the following:

 

1) saleinc has both positive and negative values - shouldn't they all be of the same sign?

2) there are observations when the sale_dec is either 0 or 1, but there is no value for the saledec or saleinc for these observations -is it because this information is missing in the original database? 

 

Thanks

Regular Contributor
Posts: 162

Re: Select most recent row with value

The sale_dec is an indicator or dummy value, therefore it is either 1 0r 0.
The saledec or saleinc are logarithms transformation value, therefore, they
include both positive and negative values.

##- Please type your reply above this line. Simple formatting, no
attachments. -##
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 278 views
  • 4 likes
  • 4 in conversation