HI all,
I am trying to get to know the min and max value of date in a variable in data step . I ll share a sample data set and appreciate if some one can help me out.
I have applied min max function but it is giving same value to every observation. I am wishing to create two new variable with min(startdate) and max(end date) corresponding to their SUBJID . Thanks!
Sample -
SUBJID Visitdate
101-02 15AUG17
101-02 14AUG18
101-02 24Sep17
101-02
101-04 10JAN18
101-04 14DEC18
101-04 07DEC17
101-04
101-07 13MAR17
101-07 28JUL17
101-07 22APR18
101-07 18JAN19
101-07
you can do something like. min and max function in datastep work on rows not columns, probably you are having issue there
Proc sql;
select subjid,
min(date) as min_date format=date9.,
max(date) as max_date format=date9.
from have
group by subjid;
you can do something like. min and max function in datastep work on rows not columns, probably you are having issue there
Proc sql;
select subjid,
min(date) as min_date format=date9.,
max(date) as max_date format=date9.
from have
group by subjid;
I think its too bad that you have to do this in a data step. It's very simple in PROC SQL, it's very simple in PROC MEANS, and so I'd urge you to do this calculation in one of those SAS procedures. The other benefit, of course, is that these procedures have been tested by SAS and tested in 14 bazillion real world applications, so you can have confidence in the results. If you write your own code ... well, you run the risk of mistakes creeping into your calculations, and you (not SAS) have to verify you have done it properly.
@PaigeMiller wrote:
I think its too bad that you have to do this in a data step.
But it's a good learning exercise to understand how a data step works, particularly BY and FIRST/LAST, though contrived of course.
@novinosrin uses DoW loops. If this is for homework this is a dead give away that its not your own work.
You're most likely needing to use RETAIN (implicit or explicit) and FIRST/LAST processing.
proc sort data=sashelp.class out=class;
by sex;
data want;
set class;
*by group specified;
by sex;
*keep across rows;
retain max_age;
*if first of each Sex group;
if first.sex then max_age = age;
*other records;
else max_age = max(age, max_age);
*if last of group;
if last.sex then output;
*keep only relevant variables;
keep sex age;
run;
data have;
input SUBJID $ Visitdate :date7.;
format visitdate date7.;
cards;
101-02 15AUG17
101-02 14AUG18
101-02 24Sep17
101-02 .
101-04 10JAN18
101-04 14DEC18
101-04 07DEC17
101-04 .
101-07 13MAR17
101-07 28JUL17
101-07 22APR18
101-07 18JAN19
101-07 .
;
data want;
do until(last.SUBJID);
set have;
by SUBJID;
mindate=min(Visitdate,mindate);
maxdate=max(Visitdate,maxdate);
end;
drop visitdate;
format mindate maxdate date7.;
run;
In my humble opinion, Mr Quant genuis @PaigeMiller 's recommendation is the best i.e proc means/summary. Just loop a HAVE dataset
to 50e6 and output. (50 million records)
See the performance difference between sql, datastep and proc means class/by statement. You will decide whether Starbucks or somewhere else
I totally agree with you and Mr.Paige, but my focus here to see the use of BY group processing and I ll understand and learn this Proc sql solution in depth to use further.
This is the SUMMARY solution:
proc summary data=have;
class subjid;
/* if that explodes your memory, sort first and use BY */
/* if dataset is already sorted, use BY anyway */
/* if you use CLASS with more than one variable, add NWAY to the PROC SUMMARY statement */
var date;
output
out=want (drop=_type_ _freq_)
min()=min_date
max()=max_date
;
run;
Thanks a lot to everyone, I got to know multiple ways of doing one problem and that is beauty of SAS Community.
This seems the most straightforward answer so far:
proc summary data=sample;
var date;
output out=date_range (drop=_type_ _freq_) min=min_date max=max_date;
run;
Thanks
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.