DATA Step, Macro, Functions and more

mark consecutive years' value with different result

Reply
Occasional Contributor
Posts: 12

mark consecutive years' value with different result

[ Edited ]

Hello All, Could you do me a great favor to mark differnent results for each group? If count >=1 for at least 4 consecutive years, then result= ‘pass’ for these years, else result= ‘fail’. Thanks very much in advance.

 

 

 

data have;
input  group $ year count;
datalines;
A 2002 8.6
A 2003 6.3
A 2004 2.3
A 2005 4.6
A 2006 2.3
A 2007 4.6
B 2000 0.4
B 2001 1.9
B 2002 1.1
B 2003 1.0
B 2004 1.6
B 2005 1.5
B 2006 0.7
C 2004 4.8
C 2005 4.0
C 2006 4.5
C 2007 3.7
D 2001 4.0
D 2002 6.1
D 2003 3.5
E 2001 1.5
E 2002 0.6
F 2004 0.8
;
run;

 

The output should be:

 

group year count result
A 2002 8.6 pass
A 2003 6.3 pass
A 2004 2.3 pass
A 2005 4.6 pass
A 2006 2.3 pass
A 2007 4.6 pass
B 2000 0.4 fail
B 2001 1.9 pass
B 2002 1.1 pass
B 2003 1.0 pass
B 2004 1.6 pass
B 2005 1.5 pass
B 2006 0.7 fail
C 2004 4.8 pass
C 2005 4.0 pass
C 2006 4.5 pass
C 2007 3.7 pass
D 2001 4.0 fail
D 2002 6.1 fail
D 2003 3.5 fail
E 2001 1.5 fail
E 2002 0.6 fail
F 2004 0.8 fail
Super User
Posts: 5,071

Re: mark consecutive years' value with different result

Can we assume a maximum of 8 observations per GROUP?  It would certainly make the programming easier.

Valued Guide
Posts: 797

Re: mark consecutive years' value with different result

First, thank you for providing data in a sas program form.

 

The technique I would recomend is to

 

  1. Maintain an array, with one element per year, which will get a 1 if the corresponding year is part of 4 consecutinve passes.Loop
    1. initialize to missing
    2. read through all the data for a group, generating an individual 'P' or 'F" for each year
    3. If the current P/F is 'P' and is preceded by 3 P's, then set the corresponding 4 elements of the array to a 1
  2. At the end of the data pass-through above, reread each year
    1. If the year has a "1" in the corresponding array element set result to "PASS".
data want (drop=I pf list4 Y rslt:);

  array rslt{2000:2008};  /* use min and max expected year for array bounds */

  /* Read a group and inspect its sequence of counts*/
  do I=1 by 1 until (last.group);
    set have;
    by group;
    pf= ifc(count>=1,'P','F');
    list4=cats(lag3(PF),lag2(PF),lag(PF),PF);
    if I>=4 and list4='PPPP' then do Y=year-3 to year;
      rslt{Y}=1;
    end;
  end;

  do until (last.group);
    set have;
    by group;
    result=ifc(rslt{Year}=1,'PASS','FAIL');
    output;
  end;
run;
Super User
Posts: 9,662

Re: mark consecutive years' value with different result

data have;
input  group $ year count;
flag=count ge 1;
datalines;
A 2002 8.6
A 2003 6.3
A 2004 2.3
A 2005 4.6
A 2006 2.3
A 2007 4.6
B 2000 0.4
B 2001 1.9
B 2002 1.1
B 2003 1.0
B 2004 1.6
B 2005 1.5
B 2006 0.7
C 2004 4.8
C 2005 4.0
C 2006 4.5
C 2007 3.7
D 2001 4.0
D 2002 6.1
D 2003 3.5
E 2001 1.5
E 2002 0.6
F 2004 0.8
;
run;
data want;
 do i=1 by 1 until(last.flag);
  set have;
  by group flag notsorted;
 end;
 do until(last.flag);
  set have;
  by group flag notsorted;
  if i ge 4 and flag=1 then result='pass ';
   else result='fail';
  output;
 end;
drop i flag;
run;
 
PROC Star
Posts: 162

Re: mark consecutive years' value with different result

And another NOT so smart method, but just for fun;

 

data want;

     if _N_ = 1 then

           do;

                length  _flag_min _flag_max 8;

                declare hash myhash(multidata:'YES' );

                myhash.defineKey('group');

                myhash.defineData('group','_flag_min','_flag_max');

                myhash.defineDone( );

                call missing(_flag_min, _flag_max);

           end;

     do n=1 by 1 until(last.group);

           set have end=last;

           by group;

       if first.group then _flag=0;

       if count>=1 then _flag+1;

           else _flag=0;

       if _flag=1 then    _flag_min=n;

           else if _flag=4 then

                do;

                     _flag_max=n;

                     myhash.add();

                end;

           else if _flag>4 then

                do;

                     _flag_max=n;

                     myhash.add();

                end;

     end;

     do n=1 by 1 until(last.group);

           set have end=last;

           by group;

           result='FAIL';

           rc = myhash.find();

     if (rc = 0) then do;

     if _flag_min<=n<=_flag_max then result='PASS';

     rc = myhash.find_next();

     do while(rc = 0);

     if _flag_min<=n<=_flag_max then result='PASS';

     rc = myhash.find_next();

     end;

   end;

     output;

   end;

drop n _: rc;

run;

 

 

Ask a Question
Discussion stats
  • 4 replies
  • 121 views
  • 0 likes
  • 5 in conversation