Help using Base SAS procedures

Only keep first observation of quarter

Reply
Occasional Contributor
Posts: 7

Only keep first observation of quarter

Hi,

I'm having troubles finding a solution to my problem. I'm having a dataset with earnings forecasts from a lot of financial analysts for different companies in different quarters. I should be able to filter revisions that are made during one quarter and thus only keep their first forecast.

What I already did is creating two new variables (quarter and year). But I don't know how to implement the filtering.

What I know:
It should delete all observations with forecast date after the first forecast within a given quarter and year. This is the reason I created the two new variables.
This filter should be applied to every company and every analyst.

I know it's something like this, but I don't know how to translate it into SAS.
I thought about creating a counter that would count whether it was the first, second or third forecast in a given quarter and then use a delete statement: if .. > 1 then delete.

Your help would be much appreciated!
Thanks in advance!
SAS Super FREQ
Posts: 8,868

Re: Only keep first observation of quarter

Hi:
I always find it easier to visualize the data and express the question and the solution in terms of something specific. So, for this specific data:
[pre]
forecast_ forecast_
company year quarter analyst date forecast
XXX 2009 1 Anna 01/01/2009 67.9765
XXX 2009 1 Anna 02/01/2009 14.1351
XXX 2009 1 Anna 02/15/2009 14.8816
XXX 2009 1 Anna 03/01/2009 76.3922
XXX 2009 1 Bill 01/15/2009 43.1884
XXX 2009 1 Bill 01/29/2009 55.3036
XXX 2009 1 Bill 02/15/2009 69.1096
XXX 2009 1 Bill 03/29/2009 10.7228
XXX 2009 1 Bill 04/12/2009 22.8308
[/pre]

for Company XXX in 2009, Quarter 1, Anna has made 4 forecasts and Bill has made 5 forecasts. Of these forecasts, which ones do you want to KEEP??? From your description, I think you want to keep Anna's 01/01/2009 forecast of 67.9765 and Bill's 01/15/2009 forecast of 43.1884...but I'm not sure.

Can you elaborate using the above data as an example?? If your data does not look like the above data, how does the structure of your data differ from the sample data???

cynthia
Occasional Contributor
Posts: 7

Re: Only keep first observation of quarter

Posted in reply to Cynthia_sas
Thanks!

My data indeed looks like your example, the only difference is that there is also a company YYY, ZZZ etc.
And indeed, i want to keep anna's 01/01 and bill's 01/15 forecasts.
Sorry if my question wasn't clear but english is not my native language.
SAS Super FREQ
Posts: 8,868

Re: Only keep first observation of quarter

Hi:
So if your data were like this (called QTR1_2009):
[pre]
forecast_
company year quarter analyst forecast_date forecast
XXX 2009 1 Anna 01/01/2009 67.9765 <---want
XXX 2009 1 Anna 02/01/2009 14.1351
XXX 2009 1 Anna 02/15/2009 14.8816
XXX 2009 1 Anna 03/01/2009 76.3922
XXX 2009 1 Bill 01/15/2009 43.1884 <---want
XXX 2009 1 Bill 01/29/2009 55.3036
XXX 2009 1 Bill 02/15/2009 69.1096
XXX 2009 1 Bill 03/29/2009 10.7228
XXX 2009 1 Bill 04/12/2009 22.8308
YYY 2009 1 Anna 01/01/2009 42.0122 <---want
YYY 2009 1 Anna 01/15/2009 38.5071
YYY 2009 1 Anna 02/01/2009 41.7154
YYY 2009 1 Anna 02/15/2009 47.9263
YYY 2009 1 Anna 03/15/2009 54.1295
YYY 2009 1 Bill 01/15/2009 49.7825 <---want
YYY 2009 1 Bill 01/29/2009 54.8730
YYY 2009 1 Bill 03/01/2009 33.8824
YYY 2009 1 Bill 03/15/2009 47.3228
ZZZ 2009 1 Anna 01/15/2009 65.1236 <---want
ZZZ 2009 1 Anna 02/15/2009 45.2000
ZZZ 2009 1 Anna 03/01/2009 56.9731
ZZZ 2009 1 Anna 03/15/2009 68.1335
ZZZ 2009 1 Bill 02/01/2009 24.5254 <---want
ZZZ 2009 1 Bill 03/15/2009 42.1966
ZZZ 2009 1 Bill 03/29/2009 32.3020
ZZZ 2009 1 Bill 04/12/2009 39.5386
[/pre]

Then you would only want to keep the 6 highlighted observations, which happen to occur FIRST when the data are sorted by COMPANY, YEAR, QUARTER and ANALYST (and also probably DATE).

This is a job for BY variables, BY-group processing and the fact that a DATA step program can detect whether an observation is the first in a group (by creating some automatic variables) and so, your program can test the automatic variables.

So, for example, for the above data, for 3 companies, for 1 quarter, for 2 analysts, look at the values for the variables on the right, which have values of 0 or 1:
[pre]
first_ first_ first_ first_
forecast_ forecast_ byvar_ byvar_ byvar_ byvar_
company year quarter analyst date forecast company year qtr analyst
XXX 2009 1 Anna 01/01/2009 67.9765 1 1 1 1
XXX 2009 1 Anna 02/01/2009 14.1351 0 0 0 0
XXX 2009 1 Anna 02/15/2009 14.8816 0 0 0 0
XXX 2009 1 Anna 03/01/2009 76.3922 0 0 0 0
XXX 2009 1 Bill 01/15/2009 43.1884 0 0 0 1
XXX 2009 1 Bill 01/29/2009 55.3036 0 0 0 0
XXX 2009 1 Bill 02/15/2009 69.1096 0 0 0 0
XXX 2009 1 Bill 03/29/2009 10.7228 0 0 0 0
XXX 2009 1 Bill 04/12/2009 22.8308 0 0 0 0
YYY 2009 1 Anna 01/01/2009 42.0122 1 1 1 1
YYY 2009 1 Anna 01/15/2009 38.5071 0 0 0 0
YYY 2009 1 Anna 02/01/2009 41.7154 0 0 0 0
YYY 2009 1 Anna 02/15/2009 47.9263 0 0 0 0
YYY 2009 1 Anna 03/15/2009 54.1295 0 0 0 0
YYY 2009 1 Bill 01/15/2009 49.7825 0 0 0 1
YYY 2009 1 Bill 01/29/2009 54.8730 0 0 0 0
YYY 2009 1 Bill 03/01/2009 33.8824 0 0 0 0
YYY 2009 1 Bill 03/15/2009 47.3228 0 0 0 0
ZZZ 2009 1 Anna 01/15/2009 65.1236 1 1 1 1
ZZZ 2009 1 Anna 02/15/2009 45.2000 0 0 0 0
ZZZ 2009 1 Anna 03/01/2009 56.9731 0 0 0 0
ZZZ 2009 1 Anna 03/15/2009 68.1335 0 0 0 0
ZZZ 2009 1 Bill 02/01/2009 24.5254 0 0 0 1
ZZZ 2009 1 Bill 03/15/2009 42.1966 0 0 0 0
ZZZ 2009 1 Bill 03/29/2009 32.3020 0 0 0 0
ZZZ 2009 1 Bill 04/12/2009 39.5386 0 0 0 0
[/pre]
The variables were created by capturing the "first.byvar" values into a variable that could be displayed in PROC PRINT -- in the following manner:
first_byvar_company was created from first.company
first_byvar_year was created from first.year
first_byvar_qtr was created from first.forecast_quarter
first_byvar_analyst was created from first.analyst

The first observation for each analyst -- the ones you want -- will be able to be discovered by testing whether FIRST.ANALYST is equal to 1 and, if it is, then written to a new dataset with a program like:
[pre]
proc sort data=qtr1_2009 out=qtr1_2009;
by company year forecast_quarter analyst forecast_date;
run;

data keepfirst;
set qtr1_2009;
by company year forecast_quarter analyst;
if first.analyst then output keepfirst;
run;
[/pre]

(when the data are sorted and the correct by variables are specified). Because of the way the BY variables were sorted, every time COMPANY changes, all the nested FIRST.byvar values are reset -- because if it is the FIRST.COMPANY, it is also the FIRST.YEAR and FIRST.ANALYST etc for that company.

For a more thorough explanation, you will want to read through the ENTIRE topic entitled "BY-Group Processing in the DATA Step"
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001283274.htm

The program that produced the outputs above are shown below.

cynthia
[pre]
** make some data;
data qtr1_2009;
infile datalines dlm=',' dsd;
input company $ year forecast_quarter analyst $ forecast_date : mmddyy10. forecast;
return;
datalines;
XXX,2009,1,Anna,01/01/2009,67.9765
XXX,2009,1,Anna,02/01/2009,14.1351
XXX,2009,1,Anna,02/15/2009,14.8816
XXX,2009,1,Anna,03/01/2009,76.3922
XXX,2009,1,Bill,01/15/2009,43.1884
XXX,2009,1,Bill,01/29/2009,55.3036
XXX,2009,1,Bill,02/15/2009,69.1096
XXX,2009,1,Bill,03/29/2009,10.7228
XXX,2009,1,Bill,04/12/2009,22.8308
YYY,2009,1,Anna,01/01/2009,42.0122
YYY,2009,1,Anna,01/15/2009,38.5071
YYY,2009,1,Anna,02/01/2009,41.7154
YYY,2009,1,Anna,02/15/2009,47.9263
YYY,2009,1,Anna,03/15/2009,54.1295
YYY,2009,1,Bill,01/15/2009,49.7825
YYY,2009,1,Bill,01/29/2009,54.8730
YYY,2009,1,Bill,03/01/2009,33.8824
YYY,2009,1,Bill,03/15/2009,47.3228
ZZZ,2009,1,Anna,01/15/2009,65.1236
ZZZ,2009,1,Anna,02/15/2009,45.2000
ZZZ,2009,1,Anna,03/01/2009,56.9731
ZZZ,2009,1,Anna,03/15/2009,68.1335
ZZZ,2009,1,Bill,02/01/2009,24.5254
ZZZ,2009,1,Bill,03/15/2009,42.1966
ZZZ,2009,1,Bill,03/29/2009,32.3020
ZZZ,2009,1,Bill,04/12/2009,39.5386
;
run;

** Sort the data just to be sure that it is in the correct order;
proc sort data=qtr1_2009 out=qtr1_2009;
by company year forecast_quarter analyst forecast_date;
run;

** This dataset is just to show how FIRST.byvar values are created automatically;
** And their values are captured into variables for display by PROC PRINT;
data showall;
set qtr1_2009;
by company year forecast_quarter analyst;
first_byvar_company = first.company;
first_byvar_year = first.year;
first_byvar_qtr = first.forecast_quarter;
first_byvar_analyst = first.analyst;
output showall;
run;

** Show values of all BY vars automatic variables, as captured for PRINT.;
proc print data=showall noobs;
title 'showall -- look at values created by using FIRST.byvar';
var company year forecast_quarter analyst forecast_date forecast
first_byvar_company first_byvar_year first_byvar_qtr first_byvar_analyst ;
format forecast_date mmddyy10.;
run;

** Now get only the observations of interest using FIRST.ANALYST to control output.;
data keepfirst;
set qtr1_2009;
by company year forecast_quarter analyst;
if first.analyst then output keepfirst;
run;

** Display the observations in the new dataset. QTR1_2009 still has the original group of observations.;
** But WORK.KEEPFIRST is what you would use going forward for more analysis.;
proc print data=keepfirst noobs;
title 'keepfirst';
var company year forecast_quarter analyst forecast_date forecast ;
format forecast_date mmddyy10.;
run;
[/pre]
Regular Contributor
Posts: 229

Re: Only keep first observation of quarter

Posted in reply to Cynthia_sas
proc sort data=a;
by analyst company quarter;
run;


data a2;
set a;
if first.company then a=1;
if first.quarter then b=1;
if first.analyst then c=1;
by analyst company quarter;
if a=1 and b=1 then output;
run;
Occasional Contributor
Posts: 7

Re: Only keep first observation of quarter

Thanks a lot!
Ask a Question
Discussion stats
  • 5 replies
  • 3201 views
  • 0 likes
  • 3 in conversation