Finding the minimum and maximum of dates group by ID

Accepted Solution Solved
Reply
Contributor
Posts: 36
Accepted Solution

Finding the minimum and maximum of dates group by ID

[ Edited ]

I have a large data (sorted by recent date) set like this :

 

ID             NAME             BEGIN              END               TOP

191080    John             25/04/2016    01/05/2016        NO
191080    John           08/06/2016    10/06/2016         NO
191080    John           21/06/2016    24/06/2016         Yes

191080    John           21/07/2016    24/09/2016         Yes
191141    Alex           16/01/2017    17/01/2017          NO

191141    Alex           22/02/2017    29/04/2017          NO

 

If "Yes" and "NO" occurs in TOP for an observation, then the most recent dates is choosen

But if only "NO" occurs in TOP for an observation, then the minimum of BEGIN and maximum of END are choosen

I mean how get this :

 

ID             NAME             BEGIN              END             

191080    John           21/07/2016    24/09/2016    
191141    Alex           16/01/2017    29/04/2017      


Accepted Solutions
Solution
‎05-03-2017 03:53 AM
Trusted Advisor
Posts: 1,367

Re: Finding the minimum and maximum of dates group by ID

You asked for: "If "Yes" and "NO" occurs in TOP for an observation, then the most recent dates is choosen".

I assume if an ID has all and only "YES" you still want the recent dates.

If max(END) may be less then recent END date than retain max_end too, and care to update it.

 

data want;

  set have;

   by ID;   /* assuming data sorted by ID BEGIN */

        retain yes_flag min_begin;

        if first.ID then do; yes_flag=0; min_begin=begin; end;

        if upcase(top) = 'YES' then yes_flag=1;

        if last.ID then do;

           if yes_flag=0 then begin=min_begin;

           output;

       end;

run;

 

 

 

 

View solution in original post


All Replies
Solution
‎05-03-2017 03:53 AM
Trusted Advisor
Posts: 1,367

Re: Finding the minimum and maximum of dates group by ID

You asked for: "If "Yes" and "NO" occurs in TOP for an observation, then the most recent dates is choosen".

I assume if an ID has all and only "YES" you still want the recent dates.

If max(END) may be less then recent END date than retain max_end too, and care to update it.

 

data want;

  set have;

   by ID;   /* assuming data sorted by ID BEGIN */

        retain yes_flag min_begin;

        if first.ID then do; yes_flag=0; min_begin=begin; end;

        if upcase(top) = 'YES' then yes_flag=1;

        if last.ID then do;

           if yes_flag=0 then begin=min_begin;

           output;

       end;

run;

 

 

 

 

Contributor
Posts: 36

Re: Finding the minimum and maximum of dates group by ID

date value are numeric, how put them as SAS DATE format ?

Trusted Advisor
Posts: 1,367

Re: Finding the minimum and maximum of dates group by ID


John4 wrote:

date value are numeric, how put them as SAS DATE format ?


What do you mean by "numeric" ? When you browse your data what do see in the date columns?

If date is July 1st, 2016 do you see 01072016 or other number?

if positive then you can convert it to date formet by:

    date = input(put(date,z8.),ddmmyy8.);

 

Beyond, consider having TOP values per ID, whose dates to output:

1) NO, YES            (mixed ended by YES)

2) NO, YES, NO    (mixed ended by NO)

3) YES, NO           (mixed ended by NO) 

4) NO, NO             (only NO)

5) YES, YES          (only YES)

Valued Guide
Posts: 797

Re: Finding the minimum and maximum of dates group by ID

[ Edited ]

A variant on @Shmuel's suggestion:

 

data want;
  retain minbeg;

  merge have (keep=id top where=(top="YES"))
        have;
  by ID;

  if first.id then minbeg=(ifn(top="YES",begin,.);
  if last.id;
  begin=coalesce(minbeg,begin);
run;

 

If there are zero YES values, then it takes the most recent date range (i.e. last in a BY group), otherwise begin is set to its first value, but END is unmodified.  

 

The MERGE statement allows you to determine, at the beginning of each BY group, whether any YESes will be present, but it does not change the sequence of BEGIN/END.  If no YESes are present then minbeg is set to missing.

 

The subsetting if ("if last.id;") can be your friend, as in this case it avoids the need for an explicit output statement - just a syle preference really.

Frequent Contributor
Posts: 79

Re: Finding the minimum and maximum of dates group by ID

It appears that you need to capture the first (least) BEGIN and note whether TOP = "Yes" occurs.  I assume that if TOP = "Yes" occurs, but is followed by an observation with TOP = "NO", then we are interested in the observation with TOP = "Yes".  Here first., last., and retain are useful:

 

 

data top

        ( drop = __:

                 top

        )

     ;

  set top ;

  by id ;

  retain __begin

         __end .

         __top "0"

         ;

  if first.id

  then

    do ;

       __begin = begin ;

       __top = "0" ;

    end ;

  if top = "Yes"

  then

    do ;

       __top = "1" ;

       __begin = begin ;

       __end = end ;

    end ;

 

  if last.id

  then

    do ;

       if __top = "1"

       then

         do ;

            begin = __begin ;

            end = __end ;

            output ;

         end ;

        else

          do ;

             begin = __begin ;

             output ;

          end ;

    end ;

run ;

 

HTH,

 

Kevin

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 149 views
  • 4 likes
  • 4 in conversation