<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Finding the largest &amp;quot;chunk&amp;quot; of consecutive year for each ID in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713241#M219996</link>
    <description>&lt;P&gt;In a single data step:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data want;
/*
    ccs : current chunk start
    cce : current chunk end
    lcs : largest chunk start
    lce : largest chunk end
    */
do until(last.id);
    set have; by id;
    if year-1 &amp;gt; cce then ccs = year;        
    cce = year;
    if range(cce, lce) &amp;gt;= range (ccs, lcs) then do;
        lce = cce;
        lcs = ccs;
        end;
    end;
do until(last.id);
    set have; by id;
    inChunk = lcs &amp;lt;= year &amp;lt;= lce; 
    midChunk = year = round(mean(lcs, lce));
    output;
    end;
drop ccs cce lcs lce;    
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="PGStats_0-1611268143131.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/53780i0643F147EF14FFDF/image-size/medium?v=v2&amp;amp;px=400" role="button" title="PGStats_0-1611268143131.png" alt="PGStats_0-1611268143131.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 21 Jan 2021 22:29:13 GMT</pubDate>
    <dc:creator>PGStats</dc:creator>
    <dc:date>2021-01-21T22:29:13Z</dc:date>
    <item>
      <title>Finding the largest "chunk" of consecutive year for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713203#M219983</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a property-year level dataset that has inconsistent reporting between 2000 and 2018.&amp;nbsp; My goal is to look at each property (ID), and find the largest "chunk" of consecutive years, as well as mark the middle year of that chunk.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So far, I have created a variable that is 1 for consecutive years and 0 for non-consecutive years.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have;
	by ID year;
run;

data have;
	set have;
	if lag_ID=ID then do;
		if lag_year=year-1 then cons=1;
			else cons=0; end;
		else if lag_ID^=ID then do;
			cons=1; end;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;My thought is to create another variable, named "chunk", which is 1 for all of the first chunk of an ID, then 2 for the next chunk, and so on.&amp;nbsp; Once I have that, I can sum "cons" by ID and chunk to find the largest chunk for each ID.&amp;nbsp; Creating the "chunk" variable is where I'm having trouble.&amp;nbsp; I'm open to other ideas of how to find the largest chunk.&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Here is a sample of the data:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
	informat ID 1. year 4.;
	input ID year;
cards;
1 2000
1 2001
1 2002
1 2004
1 2005
1 2006
1 2007
1 2008
1 2010
1 2011
2 2000
2 2001
2 2002
2 2003
2 2004
2 2005
2 2006
2 2007
2 2008
2 2010
2 2017
3 2001
3 2002
3 2003
3 2016
3 2017
3 2018
;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 21 Jan 2021 19:51:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713203#M219983</guid>
      <dc:creator>jss539</dc:creator>
      <dc:date>2021-01-21T19:51:07Z</dc:date>
    </item>
    <item>
      <title>Re: Finding the largest "chunk" of consecutive year for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713215#M219986</link>
      <description>&lt;P&gt;HI&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/340135"&gt;@jss539&lt;/a&gt;&amp;nbsp; Assuming I understand your requirement, the following should give you the sequence to measure the chunk count. If this is correct, then next step is just to take the &lt;STRONG&gt;max by group&lt;/STRONG&gt;.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data have;
	informat ID 1. year 4.;
	input ID year;
cards;
1 2000
1 2001
1 2002
1 2004
1 2005
1 2006
1 2007
1 2008
1 2010
1 2011
2 2000
2 2001
2 2002
2 2003
2 2004
2 2005
2 2006
2 2007
2 2008
2 2010
2 2017
3 2001
3 2002
3 2003
3 2016
3 2017
3 2018
;
data want;
 set have;
 by id;
 k=dif(year);
 if first.id then grp=0;
 if first.id or k ne 1 then c=1;
 else if k=1 then c+1;
 if c=1 then grp+1;
 drop k;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 21 Jan 2021 20:43:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713215#M219986</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2021-01-21T20:43:06Z</dc:date>
    </item>
    <item>
      <title>Re: Finding the largest "chunk" of consecutive year for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713217#M219987</link>
      <description>&lt;P&gt;Something like-&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data have;
	informat ID 1. year 4.;
	input ID year;
cards;
1 2000
1 2001
1 2002
1 2004
1 2005
1 2006
1 2007
1 2008
1 2010
1 2011
2 2000
2 2001
2 2002
2 2003
2 2004
2 2005
2 2006
2 2007
2 2008
2 2010
2 2017
3 2001
3 2002
3 2003
3 2016
3 2017
3 2018
;
data want;
 set have;
 by id;
 k=dif(year);
 if first.id then grp=0;
 if first.id or k ne 1 then c=1;
 else if k=1 then c+1;
 if c=1 then grp+1;
 drop k c;
run;

proc sql;
 create table final_want(drop=n) as
 select *
 from ( select *,count(grp) as n from want group by id, grp)
 group by id
 having max(n)=n;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="branch"&gt;
&lt;DIV align="center"&gt;
&lt;DIV class="branch"&gt;
&lt;DIV align="center"&gt;
&lt;DIV class="branch"&gt;
&lt;DIV&gt;
&lt;DIV align="center"&gt;
&lt;TABLE class="table" summary="Procedure Print: Data Set WORK.FINAL_WANT" frame="box" rules="all" cellspacing="0" cellpadding="5"&gt;&lt;COLGROUP&gt; &lt;COL /&gt; &lt;COL /&gt;&lt;/COLGROUP&gt;
&lt;THEAD&gt;
&lt;TR&gt;
&lt;TH class="r header" scope="col"&gt;ID&lt;/TH&gt;
&lt;TH class="r header" scope="col"&gt;year&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;TD class="r data"&gt;2004&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;TD class="r data"&gt;2005&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;TD class="r data"&gt;2006&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;TD class="r data"&gt;2007&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;TD class="r data"&gt;2008&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;2000&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;2001&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;2002&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;2003&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;2004&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;2005&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;2006&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;2007&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;2&lt;/TD&gt;
&lt;TD class="r data"&gt;2008&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;3&lt;/TD&gt;
&lt;TD class="r data"&gt;2001&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;3&lt;/TD&gt;
&lt;TD class="r data"&gt;2002&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;3&lt;/TD&gt;
&lt;TD class="r data"&gt;2003&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;3&lt;/TD&gt;
&lt;TD class="r data"&gt;2016&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;3&lt;/TD&gt;
&lt;TD class="r data"&gt;2017&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;3&lt;/TD&gt;
&lt;TD class="r data"&gt;2018&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 21 Jan 2021 20:52:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713217#M219987</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2021-01-21T20:52:09Z</dc:date>
    </item>
    <item>
      <title>Re: Finding the largest "chunk" of consecutive year for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713218#M219988</link>
      <description>&lt;P&gt;This will generate a record for each ID indicating the max streak, the year the streak ends and the size of the streak - so you can easily do the math to figure out the year. This uses basic SAS programming, nothing fancy here so it should be clear but if anything isn't feel free to ask.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have;
	by ID year;
run;

data want;
	set have;
	by id;
	
	retain max_streak max_year;

    prev_year = lag(year);
    
    if first.id then call missing(prev_year, max_streak, max_year, middle_year);
    
    if year - 1 = lag(year) then streak+1;
    else streak = 0;
    
    if streak &amp;gt;= max_streak then do;
           max_streak = streak;
           max_year = year;
    end;
    
    

     if last.id then do;
         middle_year = max_year - max_streak/2;
         output;
     end;
     
     keep id max_streak max_year middle_year;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 21 Jan 2021 20:50:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713218#M219988</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-01-21T20:50:52Z</dc:date>
    </item>
    <item>
      <title>Re: Finding the largest "chunk" of consecutive year for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713241#M219996</link>
      <description>&lt;P&gt;In a single data step:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data want;
/*
    ccs : current chunk start
    cce : current chunk end
    lcs : largest chunk start
    lce : largest chunk end
    */
do until(last.id);
    set have; by id;
    if year-1 &amp;gt; cce then ccs = year;        
    cce = year;
    if range(cce, lce) &amp;gt;= range (ccs, lcs) then do;
        lce = cce;
        lcs = ccs;
        end;
    end;
do until(last.id);
    set have; by id;
    inChunk = lcs &amp;lt;= year &amp;lt;= lce; 
    midChunk = year = round(mean(lcs, lce));
    output;
    end;
drop ccs cce lcs lce;    
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="PGStats_0-1611268143131.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/53780i0643F147EF14FFDF/image-size/medium?v=v2&amp;amp;px=400" role="button" title="PGStats_0-1611268143131.png" alt="PGStats_0-1611268143131.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jan 2021 22:29:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713241#M219996</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2021-01-21T22:29:13Z</dc:date>
    </item>
    <item>
      <title>Re: Finding the largest "chunk" of consecutive year for each ID</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713265#M220011</link>
      <description>&lt;P&gt;Because you know the time domain of possible years (2000:2018) you can use an array indexed by year, with each element of the array being the number of consecutive years up to and including the element.&amp;nbsp; After reading all the obs for an id, find the maximum size in the array, determine its element as the end year of the maximum size, and calculate the corresponding beginning year.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The reread all the years for the same id, keeping only those between maxsizbeg and maxsizend:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
    informat ID 1. year 4.;
	input ID year;
cards;
1 2000
1 2001
1 2002
1 2004
1 2005
1 2006
1 2007
1 2008
1 2010
1 2011
2 2000
2 2001
2 2002
2 2003
2 2004
2 2005
2 2006
2 2007
2 2008
2 2010
2 2017
3 2001
3 2002
3 2003
3 2016
3 2017
3 2018
;

  
data want;
  array siz {1999:2018} _temporary_; 

  set have;
  by id;
  if first.id then call missing(of siz{*});

  siz{year} = sum(siz{year-1},1);

  if last.id;
  maxsiz=max(of siz{*});
  maxsizbeg = lbound(siz) + whichn(maxsiz,of siz{*}) - maxsiz ;
  maxsizend = maxsizbeg + maxsiz - 1;

  do until (last.id); /* Reread and filter this id*/
    set have;
    by id;
    if maxsizbeg&amp;lt;=year&amp;lt;=maxsizend then output;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The "trick" here is to create an array with a lower bound of one year prior to your earliest data (i.e. 1999) and an upper bound of the last year in your data (2018).&amp;nbsp; (You could have an even smaller lower-bound and higher upper bound with no harm).&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The statement :&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;siz{year} = sum(siz{year-1},1);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;assigns a size value for the current year equal to one greater than the prior year's size value.&amp;nbsp; But if the prior year is never encountered, then its size value is missing.&amp;nbsp; Since the sum function of 1 plus missing is 1, it means the current size is 1 - i.e. the start of a new time span.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;At the end of an id, get the maximum size, find out where it is in the array&amp;nbsp;&lt;EM&gt;&lt;STRONG&gt;[whichn(maxsize,of siz{*})]&lt;/STRONG&gt;&lt;/EM&gt;, then determine the corresponding maxsizbeg year and maxsizend year.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jan 2021 01:20:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-the-largest-quot-chunk-quot-of-consecutive-year-for-each/m-p/713265#M220011</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2021-01-22T01:20:28Z</dc:date>
    </item>
  </channel>
</rss>

