Hi:
It seems that someone has told you about FIRST. and LAST. variables in DATA step processing, and that would be one way to get the output dataset you need. In my opinion, though, a DATA step program is the hardest way to get what you want when there are multiple procedure methods that accomplish the same thing.
So, before you go down the DATA step road, there are 4 other ways to get what you need without writing a DATA step program. As Scott said, you can use PROC MEANS/SUMMARY, but there are other methods as well:
1) PROC MEANS (SUMMARY) method
2) PROC REPORT method
3) PROC TABULATE method
4) PROC SQL method
5) Data step program using FIRST. and LAST. (requires a PROC SORT step)
The methods are not listed in any particular order, except all the procedure methods are listed before the DATA step method.
The PROC MEANS/SUMMARY method is one way to go. It will require some options and extra statements to create an output dataset. PROC REPORT can also create an output dataset, and its syntax might be a bit more straightforward than PROC MEANS. PROC TABULATE can create an output dataset as well. Last, but not least in the procedure department, PROC SQL can create an output dataset too.
There may be a reason you want to use a DATA step program. This approach teaches some basic concepts that are very useful for the beginning programmer to know: the BY statement, the use of FIRST.byvar and LAST.byvar, the IF statement, the RETAIN statement, the OUTPUT statement, as well as the KEEP= dataset option.
Along the way, you'll also have to understand that SAS date variables are NUMERIC variables. For example, 1/1/2008 or 11/15/1950 are date values that you can read, but inside a SAS dataset those values are stored as 17532 and -3334 respectively (because in the SAS world, dates are stored as an offset in days from 1/1/1960). So 1/1/2008 is 17532 days AFTER 1/1/1960 and 11/15/1950 is 3334 days BEFORE 1/1/1960 (hence the - sign in the -3334 value). This means that you will want to use a SAS pre-defined date format to display your dates and your MIN/MAX values -- or else you'll see numbers that won't make much sense to you. The way you apply formats to variables is with a FORMAT statement or, in some procedures, with the F= option.
The approach you use depends on the type of output you need and whether you want a "report" along the way or just a dataset to pass to subsequent processes. As I said, all 5 methods will give you an output dataset. They are all shown in the program below. The first part is where I make some fake data -- you would not need the step that creates WORK.MYDATA -- that's just a name I made up. In each of the 5 steps, you would have to plug in the name of -YOUR- dataset after the DATA= option or in the SET statement.
The procedure methods all have their own way of naming the min and the max -- although PROC MEANS and PROC SQL give you control over the name you use for the calculated variables. You'd have to use a RENAME= option with PROC REPORT and PROC TABULATE to get different names. Also, some of the procedures create "extra" variables when they make output datasets. For example, PROC MEANS will create _TYPE_ and _FREQ_ variables; PROC REPORT will create a _BREAK_ variable; PROC TABULATE will create _TABLE_, _TYPE_ and _PAGE_ variables. These extra variables are part of how the output datasets are built by these procedures. You can read about them and decide if you want them by looking at the documentation for each procedure/method.
cynthia
[pre]
** make some data to use;
** if your data are in SAS format already, you will not need this step;
data work.mydata;
infile datalines dlm=',' dsd;
input idval sdate : mmddyy10. edate : mmddyy10.;
datalines;
12345, 1/1/2008, 8/31/2008
12345, 9/1/2008, 9/1/2009
23456, 11/15/1950, 12/31/1951
23456, 6/18/1953, 11/30/1953
;
run;
proc print data=work.mydata;
title 'A) What the data looks like when stored internally';
title2 'Without any formats used for the display of dates';
run;
proc print data=work.mydata;
title 'B) What the data looks like WITH formats used for display of dates';
format sdate edate mmddyy10.;
run;
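** Sketch, not one of the 5 methods: date literals show the same idea in the log;
** The first PUT should write the stored numbers 17532 and -3334 quoted above;
** The second PUT writes the same values with the MMDDYY10. format applied;
data _null_;
d1 = '01JAN2008'd;
d2 = '15NOV1950'd;
put d1= d2=;
put d1= mmddyy10. d2= mmddyy10.;
run;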
**1) PROC MEANS approach;
proc means data=work.mydata min max nway;
title '1) Create an OUTPUT dataset using PROC MEANS';
class idval;
var sdate edate;
output out=work.means_minmax min(sdate)=min_sdate max(edate)=max_edate;
run;
proc print data=work.means_minmax;
title '1) This is the new dataset from PROC MEANS';
format min_sdate max_edate mmddyy10.;
run;
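** Optional variation, shown only as a sketch of the MEANS approach;
** DROP= removes the automatic _TYPE_ and _FREQ_ variables from the output dataset;
** NOPRINT skips the printed report when all you want is the dataset;
** WORK.MEANS_SLIM is just a name made up for this example;
proc means data=work.mydata nway noprint;
class idval;
var sdate edate;
output out=work.means_slim(drop=_type_ _freq_)
min(sdate)=min_sdate max(edate)=max_edate;
run;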
** 2) PROC REPORT approach -- can get REPORT and dataset;
** with one pass through the data;
proc report data=work.mydata nowd
out=work.rep_minmax;
title '2) Using PROC REPORT';
column idval sdate edate;
define idval / group;
define sdate / min f=mmddyy10.;
define edate / max f=mmddyy10.;
run;
proc print data=work.rep_minmax;
title '2) Data created in REPORT step';
format sdate edate mmddyy10.;
run;
** 3) PROC TABULATE approach can get REPORT and dataset;
** with one pass through the data;
proc tabulate data=work.mydata f=mmddyy10.
out=work.tab_minmax;
title '3) Using PROC TABULATE for REPORT and data';
class idval;
var sdate edate;
table idval,
min*sdate max*edate;
run;
proc print data=tab_minmax;
title '3) Data created in TABULATE step';
format sdate_min edate_max mmddyy10.;
run;
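** Optional variation, shown only as a sketch of the RENAME= idea;
** RENAME= gives the TABULATE output the same variable names as the MEANS and SQL datasets;
** DROP= removes the extra _TYPE_ _PAGE_ and _TABLE_ variables;
** WORK.TAB_RENAMED is just a name made up for this example;
proc tabulate data=work.mydata f=mmddyy10.
out=work.tab_renamed(drop=_type_ _page_ _table_
rename=(sdate_Min=min_sdate edate_Max=max_edate));
class idval;
var sdate edate;
table idval,
min*sdate max*edate;
run;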
** 4) Proc SQL approach;
proc sql;
create table work.sql_minmax as
select idval, min(sdate) as min_sdate, max(edate) as max_edate
from work.mydata
group by idval
order by idval;
quit;
proc print data=work.sql_minmax;
title '4) Data Created With PROC SQL';
format min_sdate max_edate mmddyy10.;
run;
** 5) Data Step approach;
proc sort data=work.mydata;
by idval;
run;
data ds_minmax(keep=idval min_sdate max_edate);
set work.mydata;
by idval;
retain min_sdate max_edate;
** clear min/max holding variables;
if first.idval then do;
min_sdate = 999999;
max_edate = .;
end;
** check every observation for a new min/max;
if sdate lt min_sdate then min_sdate = sdate;
if edate gt max_edate then max_edate = edate;
** on last observation for idval, output;
if last.idval then output;
format min_sdate max_edate mmddyy10.;
run;
proc print data=ds_minmax;
title '5) Data created in Data Step program';
format min_sdate max_edate mmddyy10.;
run;
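** Alternative sketch of method 5, assuming WORK.MYDATA is still sorted by IDVAL;
** The MIN and MAX functions ignore missing arguments, so both holding variables;
** can start as missing and a missing SDATE or EDATE will not become the result;
** WORK.DS_MINMAX2 is just a name made up for this example;
data work.ds_minmax2(keep=idval min_sdate max_edate);
set work.mydata;
by idval;
retain min_sdate max_edate;
if first.idval then call missing(min_sdate, max_edate);
min_sdate = min(min_sdate, sdate);
max_edate = max(max_edate, edate);
if last.idval then output;
format min_sdate max_edate mmddyy10.;
run;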
[/pre]