<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Group / clustering observations by date in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Group-clustering-observations-by-date/m-p/650053#M194916</link>
    <description>&lt;P&gt;This is a common request.&amp;nbsp; You want to increment the cluster number whenever&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;You begin a new id&lt;/LI&gt;
&lt;LI&gt;You encounter a date more than 7 days after the starting date of the previous cluster.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;To do this in a sas DATA step, you have to keep (i.e. "retain") the starting date of the current cluster, to be compared to the incoming date:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (drop=startdate);
  set have; 
  by id;
  retain startdate;
  if first.id=1 or date-7 &amp;gt; startdate then do;
    cluster+1;
    startdate=date;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sat, 23 May 2020 00:49:17 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2020-05-23T00:49:17Z</dc:date>
    <item>
      <title>Group / clustering observations by date</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Group-clustering-observations-by-date/m-p/649994#M194901</link>
      <description>&lt;P&gt;I have a large data set with this structure:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV class="branch"&gt;&lt;DIV class="branch"&gt;&lt;DIV&gt;&lt;DIV align="center"&gt;&lt;TABLE cellspacing="0" cellpadding="5"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;ID&lt;/TD&gt;&lt;TD&gt;DATE&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;16602&lt;/TD&gt;&lt;TD&gt;07/20/2015&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;16602&lt;/TD&gt;&lt;TD&gt;07/25/2015&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;16602&lt;/TD&gt;&lt;TD&gt;07/28/2015&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;20302&lt;/TD&gt;&lt;TD&gt;03/16/2016&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;20302&lt;/TD&gt;&lt;TD&gt;03/18/2016&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;20302&lt;/TD&gt;&lt;TD&gt;03/25/2016&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;20302&lt;/TD&gt;&lt;TD&gt;02/18/2015&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class="branch"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="branch"&gt;I would like to define a new "clustering" variable, such that observations with the same &lt;STRONG&gt;ID&lt;/STRONG&gt; value and which occur within 0 to 7 days of each other have the same value of the &lt;STRONG&gt;CLUSTER&lt;/STRONG&gt; variable.&amp;nbsp; If the difference is greater than 7 days, then the &lt;STRONG&gt;CLUSTER&lt;/STRONG&gt; variable should increase by 1, a new cluster created, and the day count resets.&amp;nbsp; The result I want for the above would look like this:&lt;/DIV&gt;&lt;DIV class="branch"&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;DIV align="center"&gt;&lt;TABLE cellspacing="0" cellpadding="5"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;ID&lt;/TD&gt;&lt;TD&gt;DATE&lt;/TD&gt;&lt;TD&gt;CLUSTER&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;16602&lt;/TD&gt;&lt;TD&gt;07/20/2015&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;16602&lt;/TD&gt;&lt;TD&gt;07/25/2015&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;16602&lt;/TD&gt;&lt;TD&gt;07/28/2015&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;20302&lt;/TD&gt;&lt;TD&gt;03/16/2016&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;20302&lt;/TD&gt;&lt;TD&gt;03/18/2016&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;20302&lt;/TD&gt;&lt;TD&gt;03/25/2016&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;20302&lt;/TD&gt;&lt;TD&gt;02/18/2017&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV class="branch"&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class="branch"&gt;Any help would be greatly appreciated.&amp;nbsp; Thanks.&lt;/DIV&gt;</description>
      <pubDate>Fri, 22 May 2020 19:28:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Group-clustering-observations-by-date/m-p/649994#M194901</guid>
      <dc:creator>cchubbard1963</dc:creator>
      <dc:date>2020-05-22T19:28:41Z</dc:date>
    </item>
    <item>
      <title>Re: Group / clustering observations by date</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Group-clustering-observations-by-date/m-p/650053#M194916</link>
      <description>&lt;P&gt;This is a common request.&amp;nbsp; You want to increment the cluster number whenever&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;You begin a new id&lt;/LI&gt;
&lt;LI&gt;You encounter a date more than 7 days after the starting date of the previous cluster.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;To do this in a sas DATA step, you have to keep (i.e. "retain") the starting date of the current cluster, to be compared to the incoming date:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (drop=startdate);
  set have; 
  by id;
  retain startdate;
  if first.id=1 or date-7 &amp;gt; startdate then do;
    cluster+1;
    startdate=date;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 23 May 2020 00:49:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Group-clustering-observations-by-date/m-p/650053#M194916</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2020-05-23T00:49:17Z</dc:date>
    </item>
    <item>
      <title>Re: Group / clustering observations by date</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Group-clustering-observations-by-date/m-p/650067#M194924</link>
      <description>&lt;P&gt;I would do it similar to what &lt;A class="trigger-hovercard" style="color: #007dc3;" href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461" target="_blank" rel="noopener"&gt;mkeintz&lt;/A&gt; suggested with a couple small but important enhancements. You need your data set to be sorted by ID and DATE, and also have &lt;STRONG&gt;by ID DATE;&lt;/STRONG&gt; in the data step:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data HAVE;
   input ID $1-5 @7 DATE mmddyy10.;
   format DATE mmddyy10.;
   lines;
16602	07/20/2015
16602	07/25/2015
16602	07/28/2015
20302	03/16/2016
20302	03/18/2016
20302	03/25/2016
20302	02/18/2015
;

proc sort data=HAVE;
   by ID DATE;
run;

data WANT (drop=FIRSTDATE);
   set HAVE;
   by ID DATE;
   retain FIRSTDATE;
   if first.ID or DATE-FIRSTDATE&amp;gt;7 then
   do;
      FIRSTDATE = DATE;
      CLUSTER+1;
   end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Hope this helps.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 23 May 2020 02:43:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Group-clustering-observations-by-date/m-p/650067#M194924</guid>
      <dc:creator>LeonidBatkhan</dc:creator>
      <dc:date>2020-05-23T02:43:57Z</dc:date>
    </item>
  </channel>
</rss>

