Solved
Contributor
Posts: 38

# Finding the minimum and maximum of dates group by ID

[ Edited ]

I have a large data (sorted by recent date) set like this :

ID             NAME             BEGIN              END               TOP

191080    John             25/04/2016    01/05/2016        NO
191080    John           08/06/2016    10/06/2016         NO
191080    John           21/06/2016    24/06/2016         Yes

191080    John           21/07/2016    24/09/2016         Yes
191141    Alex           16/01/2017    17/01/2017          NO

191141    Alex           22/02/2017    29/04/2017          NO

If "Yes" and "NO" occurs in TOP for an observation, then the most recent dates is choosen

But if only "NO" occurs in TOP for an observation, then the minimum of BEGIN and maximum of END are choosen

I mean how get this :

ID             NAME             BEGIN              END

191080    John           21/07/2016    24/09/2016
191141    Alex           16/01/2017    29/04/2017

Accepted Solutions
Solution
‎05-03-2017 03:53 AM
Posts: 1,837

## Re: Finding the minimum and maximum of dates group by ID

You asked for: "If "Yes" and "NO" occurs in TOP for an observation, then the most recent dates is choosen".

I assume if an ID has all and only "YES" you still want the recent dates.

If max(END) may be less then recent END date than retain max_end too, and care to update it.

data want;

set have;

by ID;   /* assuming data sorted by ID BEGIN */

retain yes_flag min_begin;

if first.ID then do; yes_flag=0; min_begin=begin; end;

if upcase(top) = 'YES' then yes_flag=1;

if last.ID then do;

if yes_flag=0 then begin=min_begin;

output;

end;

run;

All Replies
Solution
‎05-03-2017 03:53 AM
Posts: 1,837

## Re: Finding the minimum and maximum of dates group by ID

You asked for: "If "Yes" and "NO" occurs in TOP for an observation, then the most recent dates is choosen".

I assume if an ID has all and only "YES" you still want the recent dates.

If max(END) may be less then recent END date than retain max_end too, and care to update it.

data want;

set have;

by ID;   /* assuming data sorted by ID BEGIN */

retain yes_flag min_begin;

if first.ID then do; yes_flag=0; min_begin=begin; end;

if upcase(top) = 'YES' then yes_flag=1;

if last.ID then do;

if yes_flag=0 then begin=min_begin;

output;

end;

run;

Contributor
Posts: 38

## Re: Finding the minimum and maximum of dates group by ID

date value are numeric, how put them as SAS DATE format ?

Posts: 1,837

## Re: Finding the minimum and maximum of dates group by ID

John4 wrote:

date value are numeric, how put them as SAS DATE format ?

What do you mean by "numeric" ? When you browse your data what do see in the date columns?

If date is July 1st, 2016 do you see 01072016 or other number?

if positive then you can convert it to date formet by:

date = input(put(date,z8.),ddmmyy8.);

Beyond, consider having TOP values per ID, whose dates to output:

1) NO, YES            (mixed ended by YES)

2) NO, YES, NO    (mixed ended by NO)

3) YES, NO           (mixed ended by NO)

4) NO, NO             (only NO)

5) YES, YES          (only YES)

Posts: 1,337

## Re: Finding the minimum and maximum of dates group by ID

[ Edited ]

A variant on @Shmuel's suggestion:

``````data want;
retain minbeg;

merge have (keep=id top where=(top="YES"))
have;
by ID;

if first.id then minbeg=(ifn(top="YES",begin,.);
if last.id;
begin=coalesce(minbeg,begin);
run;
``````

If there are zero YES values, then it takes the most recent date range (i.e. last in a BY group), otherwise begin is set to its first value, but END is unmodified.

The MERGE statement allows you to determine, at the beginning of each BY group, whether any YESes will be present, but it does not change the sequence of BEGIN/END.  If no YESes are present then minbeg is set to missing.

The subsetting if ("if last.id;") can be your friend, as in this case it avoids the need for an explicit output statement - just a syle preference really.

Frequent Contributor
Posts: 89

## Re: Finding the minimum and maximum of dates group by ID

It appears that you need to capture the first (least) BEGIN and note whether TOP = "Yes" occurs.  I assume that if TOP = "Yes" occurs, but is followed by an observation with TOP = "NO", then we are interested in the observation with TOP = "Yes".  Here first., last., and retain are useful:

data top

( drop = __:

top

)

;

set top ;

by id ;

retain __begin

__end .

__top "0"

;

if first.id

then

do ;

__begin = begin ;

__top = "0" ;

end ;

if top = "Yes"

then

do ;

__top = "1" ;

__begin = begin ;

__end = end ;

end ;

if last.id

then

do ;

if __top = "1"

then

do ;

begin = __begin ;

end = __end ;

output ;

end ;

else

do ;

begin = __begin ;

output ;

end ;

end ;

run ;

HTH,

Kevin

☑ This topic is solved.