Hi:
So if your data were like this (called QTR1_2009):
[pre]
company year forecast_quarter analyst forecast_date forecast
XXX 2009 1 Anna 01/01/2009 67.9765 <---want
XXX 2009 1 Anna 02/01/2009 14.1351
XXX 2009 1 Anna 02/15/2009 14.8816
XXX 2009 1 Anna 03/01/2009 76.3922
XXX 2009 1 Bill 01/15/2009 43.1884 <---want
XXX 2009 1 Bill 01/29/2009 55.3036
XXX 2009 1 Bill 02/15/2009 69.1096
XXX 2009 1 Bill 03/29/2009 10.7228
XXX 2009 1 Bill 04/12/2009 22.8308
YYY 2009 1 Anna 01/01/2009 42.0122 <---want
YYY 2009 1 Anna 01/15/2009 38.5071
YYY 2009 1 Anna 02/01/2009 41.7154
YYY 2009 1 Anna 02/15/2009 47.9263
YYY 2009 1 Anna 03/15/2009 54.1295
YYY 2009 1 Bill 01/15/2009 49.7825 <---want
YYY 2009 1 Bill 01/29/2009 54.8730
YYY 2009 1 Bill 03/01/2009 33.8824
YYY 2009 1 Bill 03/15/2009 47.3228
ZZZ 2009 1 Anna 01/15/2009 65.1236 <---want
ZZZ 2009 1 Anna 02/15/2009 45.2000
ZZZ 2009 1 Anna 03/01/2009 56.9731
ZZZ 2009 1 Anna 03/15/2009 68.1335
ZZZ 2009 1 Bill 02/01/2009 24.5254 <---want
ZZZ 2009 1 Bill 03/15/2009 42.1966
ZZZ 2009 1 Bill 03/29/2009 32.3020
ZZZ 2009 1 Bill 04/12/2009 39.5386
[/pre]
Then you would want to keep only the 6 observations marked "want". Each one is the FIRST observation for its analyst when the data are sorted by COMPANY, YEAR, FORECAST_QUARTER and ANALYST (and then by FORECAST_DATE, so that the earliest forecast comes first).
This is a job for BY-group processing. When a DATA step has a BY statement, SAS creates automatic variables (FIRST.byvar and LAST.byvar) for each BY variable, so your program can test whether an observation is the first in a group.
So, for example, for the above data (3 companies, 1 quarter, 2 analysts), look at the 4 variables on the right, which have values of 0 or 1:
[pre]
company year forecast_quarter analyst forecast_date forecast first_byvar_company first_byvar_year first_byvar_qtr first_byvar_analyst
XXX 2009 1 Anna 01/01/2009 67.9765 1 1 1 1
XXX 2009 1 Anna 02/01/2009 14.1351 0 0 0 0
XXX 2009 1 Anna 02/15/2009 14.8816 0 0 0 0
XXX 2009 1 Anna 03/01/2009 76.3922 0 0 0 0
XXX 2009 1 Bill 01/15/2009 43.1884 0 0 0 1
XXX 2009 1 Bill 01/29/2009 55.3036 0 0 0 0
XXX 2009 1 Bill 02/15/2009 69.1096 0 0 0 0
XXX 2009 1 Bill 03/29/2009 10.7228 0 0 0 0
XXX 2009 1 Bill 04/12/2009 22.8308 0 0 0 0
YYY 2009 1 Anna 01/01/2009 42.0122 1 1 1 1
YYY 2009 1 Anna 01/15/2009 38.5071 0 0 0 0
YYY 2009 1 Anna 02/01/2009 41.7154 0 0 0 0
YYY 2009 1 Anna 02/15/2009 47.9263 0 0 0 0
YYY 2009 1 Anna 03/15/2009 54.1295 0 0 0 0
YYY 2009 1 Bill 01/15/2009 49.7825 0 0 0 1
YYY 2009 1 Bill 01/29/2009 54.8730 0 0 0 0
YYY 2009 1 Bill 03/01/2009 33.8824 0 0 0 0
YYY 2009 1 Bill 03/15/2009 47.3228 0 0 0 0
ZZZ 2009 1 Anna 01/15/2009 65.1236 1 1 1 1
ZZZ 2009 1 Anna 02/15/2009 45.2000 0 0 0 0
ZZZ 2009 1 Anna 03/01/2009 56.9731 0 0 0 0
ZZZ 2009 1 Anna 03/15/2009 68.1335 0 0 0 0
ZZZ 2009 1 Bill 02/01/2009 24.5254 0 0 0 1
ZZZ 2009 1 Bill 03/15/2009 42.1966 0 0 0 0
ZZZ 2009 1 Bill 03/29/2009 32.3020 0 0 0 0
ZZZ 2009 1 Bill 04/12/2009 39.5386 0 0 0 0
[/pre]
The FIRST.byvar automatic variables are not written to the output dataset, so their values were copied into regular variables that PROC PRINT could display, in the following manner:
first_byvar_company was created from first.company
first_byvar_year was created from first.year
first_byvar_qtr was created from first.forecast_quarter
first_byvar_analyst was created from first.analyst
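If you only want a quick look at the automatic variables without creating extra variables, a PUT _ALL_ statement writes them to the SAS log. This is just a sketch: it assumes QTR1_2009 is already sorted by the BY variables, and the limit of 10 observations is arbitrary, only there to keep the log short.

[pre]
data _null_;
  set qtr1_2009;
  by company year forecast_quarter analyst;
  /* PUT _ALL_ lists every variable in the log, including the       */
  /* automatic FIRST.byvar and LAST.byvar variables and _N_.        */
  if _n_ le 10 then put _all_;
run;
[/pre]

Each log line should show the data values followed by pairs such as FIRST.analyst= and LAST.analyst= for every BY variable.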
The first observation for each analyst (the ones you want) can be found by testing whether FIRST.ANALYST is equal to 1. If it is, the observation is written to a new dataset with a program like:
[pre]
proc sort data=qtr1_2009 out=qtr1_2009;
  by company year forecast_quarter analyst forecast_date;
run;

data keepfirst;
  set qtr1_2009;
  by company year forecast_quarter analyst;
  if first.analyst then output keepfirst;
run;
[/pre]
This works only when the data are sorted and the correct BY variables are specified. Because the BY variables are nested, every time COMPANY changes, the FIRST.byvar values for all the BY variables listed after it are set to 1 as well: if an observation is FIRST.COMPANY, it is also FIRST.YEAR, FIRST.FORECAST_QUARTER and FIRST.ANALYST for that company.
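If you only need the subset and not the FIRST.byvar flags themselves, PROC SORT with the NODUPKEY option should give the same 6 observations. This is a different technique from the DATA step approach, offered only as a sketch. The dataset names SORTED and KEEPFIRST_ALT are made up for illustration, and it relies on the default EQUALS behavior of PROC SORT, which keeps observations within a BY group in their incoming order.

[pre]
/* Step 1: order the data so the earliest forecast comes first within each analyst. */
proc sort data=qtr1_2009 out=sorted;
  by company year forecast_quarter analyst forecast_date;
run;

/* Step 2: NODUPKEY keeps the first observation for each combination of the BY   */
/* variables. FORECAST_DATE is left out of this BY statement on purpose.         */
proc sort data=sorted out=keepfirst_alt nodupkey;
  by company year forecast_quarter analyst;
run;
[/pre]

The DATA step version is easier to extend (for example, to test LAST.ANALYST as well), so treat this one as a shortcut for the simple case.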
For a more thorough explanation, you will want to read through the ENTIRE topic entitled "BY-Group Processing in the DATA Step"
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001283274.htm
The program that produced the output above is shown below.
cynthia
[pre]
** make some data;
data qtr1_2009;
infile datalines dlm=',' dsd;
input company $ year forecast_quarter analyst $ forecast_date : mmddyy10. forecast;
return;
datalines;
XXX,2009,1,Anna,01/01/2009,67.9765
XXX,2009,1,Anna,02/01/2009,14.1351
XXX,2009,1,Anna,02/15/2009,14.8816
XXX,2009,1,Anna,03/01/2009,76.3922
XXX,2009,1,Bill,01/15/2009,43.1884
XXX,2009,1,Bill,01/29/2009,55.3036
XXX,2009,1,Bill,02/15/2009,69.1096
XXX,2009,1,Bill,03/29/2009,10.7228
XXX,2009,1,Bill,04/12/2009,22.8308
YYY,2009,1,Anna,01/01/2009,42.0122
YYY,2009,1,Anna,01/15/2009,38.5071
YYY,2009,1,Anna,02/01/2009,41.7154
YYY,2009,1,Anna,02/15/2009,47.9263
YYY,2009,1,Anna,03/15/2009,54.1295
YYY,2009,1,Bill,01/15/2009,49.7825
YYY,2009,1,Bill,01/29/2009,54.8730
YYY,2009,1,Bill,03/01/2009,33.8824
YYY,2009,1,Bill,03/15/2009,47.3228
ZZZ,2009,1,Anna,01/15/2009,65.1236
ZZZ,2009,1,Anna,02/15/2009,45.2000
ZZZ,2009,1,Anna,03/01/2009,56.9731
ZZZ,2009,1,Anna,03/15/2009,68.1335
ZZZ,2009,1,Bill,02/01/2009,24.5254
ZZZ,2009,1,Bill,03/15/2009,42.1966
ZZZ,2009,1,Bill,03/29/2009,32.3020
ZZZ,2009,1,Bill,04/12/2009,39.5386
;
run;
** Sort the data just to be sure that it is in the correct order;
proc sort data=qtr1_2009 out=qtr1_2009;
by company year forecast_quarter analyst forecast_date;
run;
** This dataset is just to show how FIRST.byvar values are created automatically;
** And their values are captured into variables for display by PROC PRINT;
data showall;
set qtr1_2009;
by company year forecast_quarter analyst;
first_byvar_company = first.company;
first_byvar_year = first.year;
first_byvar_qtr = first.forecast_quarter;
first_byvar_analyst = first.analyst;
output showall;
run;
** Show the FIRST.byvar values for all BY variables, as captured for PROC PRINT;
proc print data=showall noobs;
title 'showall -- look at values created by using FIRST.byvar';
var company year forecast_quarter analyst forecast_date forecast
first_byvar_company first_byvar_year first_byvar_qtr first_byvar_analyst ;
format forecast_date mmddyy10.;
run;
** Now get only the observations of interest using FIRST.ANALYST to control output.;
data keepfirst;
set qtr1_2009;
by company year forecast_quarter analyst;
if first.analyst then output keepfirst;
run;
** Display the observations in the new dataset. QTR1_2009 still has the original group of observations.;
** But WORK.KEEPFIRST is what you would use going forward for more analysis.;
proc print data=keepfirst noobs;
title 'keepfirst';
var company year forecast_quarter analyst forecast_date forecast ;
format forecast_date mmddyy10.;
run;
[/pre]
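As a cross-check, the earliest forecast date per analyst can also be computed with PROC SQL and compared by eye against the KEEPFIRST listing. This is only a sketch and not part of the program above; the column name EARLIEST_DATE is made up for illustration.

[pre]
proc sql;
  title 'earliest forecast date per company and analyst';
  /* MIN() of the SAS date value finds the earliest forecast in each group. */
  select company, year, forecast_quarter, analyst,
         min(forecast_date) as earliest_date format=mmddyy10.
    from qtr1_2009
    group by company, year, forecast_quarter, analyst;
quit;
[/pre]

The dates listed here should match the FORECAST_DATE values in WORK.KEEPFIRST, one row per company and analyst.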