BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shanky_44
Obsidian | Level 7

HI all,

 

I am trying to get to know the min and max value of date in a variable in data step . I ll share a sample data set and appreciate if some one can help me out. 

I have applied min max function but it is giving same value to every observation.  I am wishing to create two new variable with min(startdate) and max(end date) corresponding to their SUBJID . Thanks!

 

Sample - 

 

SUBJID     Visitdate

101-02       15AUG17

101-02       14AUG18

101-02       24Sep17

101-02       

101-04       10JAN18

101-04       14DEC18 

101-04       07DEC17

101-04

101-07       13MAR17

101-07       28JUL17

101-07       22APR18

101-07       18JAN19

101-07

1 ACCEPTED SOLUTION

Accepted Solutions
kiranv_
Rhodochrosite | Level 12

you can do something like. min and max function in datastep work on rows not columns, probably you are having issue there

 

 

Proc sql;
select subjid,
 min(date) as min_date format=date9.,
 max(date) as max_date format=date9.
from have
group by subjid;



 

View solution in original post

13 REPLIES 13
kiranv_
Rhodochrosite | Level 12

you can do something like. min and max function in datastep work on rows not columns, probably you are having issue there

 

 

Proc sql;
select subjid,
 min(date) as min_date format=date9.,
 max(date) as max_date format=date9.
from have
group by subjid;



 

shanky_44
Obsidian | Level 7
Hi Kiranv

Thanks for your reply, I have to do this in data step, sry that was nt mentioned in problem statment. I think first. last . statment can help in this case. Isn't it ?
PaigeMiller
Diamond | Level 26

I think its too bad that you have to do this in a data step. It's very simple in PROC SQL, it's very simple in PROC MEANS, and so I'd urge you to do this calculation in one of those SAS procedures. The other benefit, of course, is that these procedures have been tested by SAS and tested in 14 bazillion real world applications, so you can have confidence in the results. If you write your own code ... well, you run the risk of mistakes creeping into your calculations, and you (not SAS) have to verify you have done it properly.

--
Paige Miller
Reeza
Super User

@PaigeMiller wrote:

I think its too bad that you have to do this in a data step. 

 

But it's a good learning exercise to understand how a data step works, particularly BY and FIRST/LAST, though contrived of course.


 

shanky_44
Obsidian | Level 7
I got your point Paige, really appreciate your input. I also need to understand how Data step can me help me out in similar scenarios. Thanks for you help!
Reeza
Super User

@novinosrin uses DoW loops. If this is for homework this is a dead give away that its not your own work. 

 

You're most likely needing to use RETAIN (implicit or explicit) and FIRST/LAST processing. 

 

 

 

proc sort data=sashelp.class out=class;
by sex;

data want;
set class;

*by group specified;
by sex;

*keep across rows;
retain max_age;

*if first of each Sex group;
if first.sex then max_age = age;
*other records;
else max_age = max(age, max_age);

*if last of group;
if last.sex then output;

*keep only relevant variables;
keep sex age;
run;
novinosrin
Tourmaline | Level 20

data have;
input SUBJID $    Visitdate :date7.;
format visitdate date7.;
cards;
101-02       15AUG17
101-02       14AUG18
101-02       24Sep17
101-02       .
101-04       10JAN18
101-04       14DEC18 
101-04       07DEC17
101-04			.
101-07       13MAR17
101-07       28JUL17
101-07       22APR18
101-07       18JAN19
101-07			.
;


data want;
do  until(last.SUBJID);
set have;
by SUBJID;
mindate=min(Visitdate,mindate);
maxdate=max(Visitdate,maxdate);
end;
drop visitdate;
format mindate maxdate date7.;
run;
shanky_44
Obsidian | Level 7
Hi novinosrin,

Thanks, This worked perfectly.
novinosrin
Tourmaline | Level 20

In my humble opinion, Mr Quant genuis @PaigeMiller  's recommendation is the best i.e proc means/summary. Just loop a HAVE dataset

to 50e6 and output. (50 million records)

 

See the performance difference between sql, datastep and proc means class/by statement.  You will decide whether Starbucks or somewhere else

shanky_44
Obsidian | Level 7

I totally agree with you and Mr.Paige, but my focus here to see the  use of BY group processing and I ll understand and learn this Proc sql solution in depth to use further. 

 

 

Kurt_Bremser
Super User

This is the SUMMARY solution:

proc summary data=have;
class subjid;
/* if that explodes your memory, sort first and use BY */
/* if dataset is already sorted, use BY anyway */
/* if you use CLASS with more than one variable, add NWAY to the PROC SUMMARY statement */
var date;
output
  out=want (drop=_type_ _freq_)
  min()=min_date
  max()=max_date
;
run;
shanky_44
Obsidian | Level 7

Thanks a lot to everyone, I got to know multiple ways of doing one problem and that is beauty of SAS Community. 

BiFrost
Calcite | Level 5

This seems the most straightforward answer so far:

 

proc summary data=sample;
var date;
output out=date_range (drop=_type_ _freq_) min=min_date max=max_date;
run;

 

Thanks

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 64317 views
  • 11 likes
  • 7 in conversation