<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using SAS to find a streak of occurrences within any seven day window in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/694678#M211865</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/341412"&gt;@wylie_ma&lt;/a&gt;&amp;nbsp;Does this answer your question?&lt;/P&gt;</description>
    <pubDate>Tue, 27 Oct 2020 20:39:19 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2020-10-27T20:39:19Z</dc:date>
    <item>
      <title>Using SAS to find a streak of occurrences within any seven day window</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/693230#M211347</link>
      <description>&lt;P&gt;I have posted a similar query in a different thread and have received some great responses. My requirements are now slightly different than what I initially had.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a sample dataset (below) that links an IP to an Account login attempt. I want to identify any IP (alongside the Account and Date) that has attempted to log into three or more accounts within&amp;nbsp;&lt;EM&gt;any&lt;/EM&gt; 7 day window.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data have;&lt;BR /&gt;infile cards dsd ;&lt;BR /&gt;input IP :$30. Account $ Date :datetime20.;&lt;BR /&gt;format date datetime20.;&lt;BR /&gt;cards;&lt;BR /&gt;108.950.45, 12345, 01MAY2020:11:39:55&lt;BR /&gt;108.950.45, 54321, 09MAY2020:12:55:22&lt;BR /&gt;108.950.45, 67890, 10MAY2020:03:00:02&lt;BR /&gt;108.950.45, 09876, 11MAY2020:06:14:03&lt;BR /&gt;108.950.45, 45879, 11MAY2020:12:37:35&lt;BR /&gt;108.950.45, 78562, 13MAY2020:13:13:09&lt;BR /&gt;108.950.45, 54321, 17MAY2020:07:12:03&lt;BR /&gt;108.950.45, 54686, 30MAY2020:10:00:02&lt;BR /&gt;220.540.40, 45879, 05MAY2020:09:09:15&lt;BR /&gt;220.540.40, 94532, 11MAY2020:03:41:16&lt;BR /&gt;220.540.40, 45879, 11MAY2020:14:00:58&lt;BR /&gt;110.640.35, 54873, 05MAY2020:17:22:01&lt;BR /&gt;110.640.35, 56454, 05MAY2020:19:22:05&lt;BR /&gt;110.640.35, 87654, 20MAY2020:05:14:12&lt;BR /&gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My output would hence ideally read:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;108.950.45, 54321, 09MAY2020:12:55:22&lt;BR /&gt;108.950.45, 54321, 10MAY2020:03:00:02&lt;BR /&gt;108.950.45, 09876, 11MAY2020:06:14:03&lt;BR /&gt;108.950.45, 45879, 11MAY2020:12:37:35&lt;BR /&gt;108.950.45, 78562, 13MAY2020:13:13:09&lt;BR /&gt;108.950.45, 67890, 17MAY2020:07:12:03&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the above output the following criteria is satisfied: (i) three or more unique Account IDs to the IP; (ii) login attempts occurred within a 7 day window. Even though rows with dates 09MAY2020 and 17MAY2020 are more than seven days a part, they are still associated to &lt;EM&gt;a&amp;nbsp;&lt;/EM&gt;7 day streak.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Wed, 21 Oct 2020 16:00:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/693230#M211347</guid>
      <dc:creator>wylie_ma</dc:creator>
      <dc:date>2020-10-21T16:00:23Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS to find a streak of occurrences within any seven day window</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/693388#M211403</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/341412"&gt;@wylie_ma&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I've coded up a little something; see code below.&amp;nbsp; Here is what it ultimately produces in the last Data step:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jimbarbour_0-1603343150247.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/50896iEEC33909FD770C62/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jimbarbour_0-1603343150247.png" alt="jimbarbour_0-1603343150247.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The code is below.&amp;nbsp; It's getting late, and I'm tired, so I won't try to walk you through the code right now.&amp;nbsp; Please take a look at the code, and if you have any questions or encounter any problems, just post a response here in this thread, and I will reply tomorrow.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Jim&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA	Have;
	INFILE	Cards		DSD	DLM=',';
	FORMAT	IP			$30.;
	FORMAT	_DateTm		Datetime20.;

	INPUT	IP		:	$30. 
			_Account	$
			_DateTm	:	Datetime20.
			;

Cards;
108.950.45, 12345, 01MAY2020:11:39:55
108.950.45, 54321, 09MAY2020:12:55:22
108.950.45, 67890, 10MAY2020:03:00:02
108.950.45, 09876, 11MAY2020:06:14:03
108.950.45, 45879, 11MAY2020:12:37:35
108.950.45, 78562, 13MAY2020:13:13:09
108.950.45, 54321, 17MAY2020:07:12:03
108.950.45, 54686, 30MAY2020:10:00:02
220.540.40, 45879, 05MAY2020:09:09:15
220.540.40, 94532, 11MAY2020:03:41:16
220.540.40, 45879, 11MAY2020:14:00:58
110.640.35, 54873, 05MAY2020:17:22:01
110.640.35, 56454, 05MAY2020:19:22:05
110.640.35, 87654, 20MAY2020:05:14:12
;
RUN;

**------------------------------------------------------------------------------**;

PROC	SORT	DATA=	Have
				OUT	=	Have_Sort
				;
	BY	IP	_DateTm;
RUN;

**------------------------------------------------------------------------------**;

DATA	Identified_IPs;
	DROP	_:;

	LENGTH	IP				$30;

	RETAIN	_Work_Count;
	RETAIN	Access_Count;
	RETAIN	_Save_Date;
	RETAIN	_Prior_Date;
	RETAIN	First_Date;

	FORMAT	_Save_Date		YYMMDDD10.;
	FORMAT	_Prior_Date		YYMMDDD10.;
	FORMAT	_Current_Date	YYMMDDD10.;
	FORMAT	First_Date		YYMMDDD10.;

	SET	Have_Sort;
		BY	IP;

	_Current_Date						=	DATEPART(_DateTm);

	IF	_N_								=	1						THEN
		DO;
			DECLARE		Hash		Hsh_Accounts();
				Hsh_Accounts.DEFINEKEY('_Account');
				Hsh_Accounts.DEFINEDONE();
		END;

	IF	FIRST.IP													THEN
		DO;
			_Work_Count					=	1;
			Access_Count				=	0;
			_RC							=	Hsh_Accounts.CLEAR();
			_RC							=	Hsh_Accounts.ADD();
			_3_Or_More					=	0;
			_Prior_Date					=	_Current_Date;
			_Save_Date					=	_Current_Date;
		END;
	ELSE
		DO;
			_Days_Apart					=	(_Current_Date	-	_Prior_Date);
			IF	_Days_Apart				&amp;lt;=	7						THEN
				DO;
					_RC					=	Hsh_Accounts.ADD();
					IF	_RC				=	0						THEN
						_Work_Count		+	1;
				END;
			ELSE
				DO;
					_Work_Count			=	1;
					_Save_Date			=	_Current_Date;
				END;
			_Prior_Date					=	_Current_Date;
		END;

	IF	_Work_Count						&amp;gt;=	3						THEN
		DO;
			_3_Or_More					=	1;
			First_Date					=	_Save_Date;
			IF	_Work_Count				&amp;gt;	Access_Count			THEN
				Access_Count			=	_Work_Count;
		END;

	IF	LAST.IP														AND
		Access_Count					&amp;gt;=	3						THEN
		DO;
			OUTPUT;
		END;
	ELSE
		DO;
			DELETE;
		END;
RUN;

**------------------------------------------------------------------------------**;

DATA	Identified_Records;
	DROP	_:;
	DROP	First_Date;
	DROP	Access_Count;

	LENGTH	IP				$30;

	RETAIN	_Prior_Date;

	FORMAT	First_Date		YYMMDDD10.;
	FORMAT	_Prior_Date		YYMMDDD10.;
	FORMAT	_Current_Date	YYMMDDD10.;

	IF	_N_								=	1						THEN
		DO;
			IF	0	THEN
				SET	WORK.Identified_IPs;
			DECLARE		Hash		Hsh_IP_Look_Up(dataset: 'WORK.Identified_IPs');
				Hsh_IP_Look_Up.DEFINEKEY('IP');
				Hsh_IP_Look_Up.DEFINEDATA('First_Date');
				Hsh_IP_Look_Up.DEFINEDONE();
		END;

	SET	Have_Sort(RENAME=(_Account=Account _DateTm=DateTm));
		BY	IP;

	_Current_Date						=	DATEPART(DateTm);

	IF	FIRST.IP													THEN
		DO;
			_RC							=	Hsh_IP_Look_Up.FIND();
			IF	_RC						=	0						THEN
				IF	_Current_Date		=	First_Date				THEN
					DO;
						OUTPUT;
					END;
				ELSE
					DO;
					END;
			ELSE
				DO;
				END;
			_Prior_Date					=	_Current_Date;
		END;
	ELSE
		DO;
			_RC							=	Hsh_IP_Look_Up.FIND();
			IF	_RC						=	0						THEN
				IF	_Current_Date		=	First_Date				THEN
					DO;
						OUTPUT;
					END;
				ELSE
					DO;
						_Days_Apart		=	(_Current_Date	-	_Prior_Date);
						IF	_Days_Apart	&amp;lt;=	7						THEN
							DO;
								_RC		=	Hsh_IP_Look_Up.FIND();
								IF	_RC	=	0						THEN
									DO;
										OUTPUT;
									END;
								ELSE
									DO;
									END;
							END;
						ELSE
							DO;
							END;
					END;
			ELSE
				DO;
				END;
			_Prior_Date					=	_Current_Date;
		END;

	DELETE;
RUN;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 22 Oct 2020 05:09:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/693388#M211403</guid>
      <dc:creator>jimbarbour</dc:creator>
      <dc:date>2020-10-22T05:09:02Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS to find a streak of occurrences within any seven day window</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/693416#M211420</link>
      <description>&lt;P&gt;Something like this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
infile cards dsd ;
input IP :$30. Account $ Date :datetime20.;
format date datetime20.;
cards;
108.950.45, 12345, 01MAY2020:11:39:55
108.950.45, 54321, 09MAY2020:12:55:22
108.950.45, 67890, 10MAY2020:03:00:02
108.950.45, 09876, 11MAY2020:06:14:03
108.950.45, 45879, 11MAY2020:12:37:35
108.950.45, 78562, 13MAY2020:13:13:09
108.950.45, 54321, 27MAY2020:07:12:03
108.950.45, 54686, 30MAY2020:10:00:02
220.540.40, 45879, 05MAY2020:09:09:15
220.540.40, 94532, 11MAY2020:03:41:16
220.540.40, 45879, 11MAY2020:14:00:58
110.640.35, 54873, 05MAY2020:17:22:01
110.640.35, 56454, 05MAY2020:19:22:05
110.640.35, 87654, 20MAY2020:05:14:12
run;

proc sql;
  create table SEVEN_DAY_FILTER as
  select unique a.*   
  from HAVE a, HAVE b  
  where a.IP=b.IP
    and a.DATE ne b.DATE
    and -8 &amp;lt; intck('dtday',a.DATE,b.DATE) &amp;lt; 8
  order by a.IP, a.DATE;
quit;

data DATE_GROUPS;
  set SEVEN_DAY_FILTER;
  by IP;
  if intck('dtday',lag(DATE),DATE) &amp;gt; 7 then GROUP+1;
  else if first.IP then GROUP+1;
run;

proc sql;
  create table THREE_ACCT_FILTER as
  select *
  from DATE_GROUPS
  group by GROUP
  having count(distinct ACCOUNT) &amp;gt; 2
  order by IP, DATE;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="branch"&gt;&lt;BR /&gt;
&lt;DIV&gt;
&lt;DIV align="left"&gt;
&lt;TABLE class="table" summary="Procedure SQL: Query Results" width="313px" frame="box" rules="all" cellspacing="0" cellpadding="5"&gt;&lt;COLGROUP&gt; &lt;COL /&gt; &lt;COL /&gt; &lt;COL /&gt; &lt;COL /&gt;&lt;/COLGROUP&gt;
&lt;THEAD&gt;
&lt;TR&gt;
&lt;TH class="l b header" scope="col" width="92px"&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;IP&lt;/FONT&gt;&lt;/TH&gt;
&lt;TH class="l b header" scope="col" width="56px"&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;Account&lt;/FONT&gt;&lt;/TH&gt;
&lt;TH class="r b header" scope="col" width="165px"&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;Date&lt;/FONT&gt;&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="92px" class="l data"&gt;108.950.45&lt;/TD&gt;
&lt;TD width="56px" class="l data"&gt;54321&lt;/TD&gt;
&lt;TD width="165px" class="r data"&gt;09MAY2020:12:55:22&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="92px" class="l data"&gt;108.950.45&lt;/TD&gt;
&lt;TD width="56px" class="l data"&gt;67890&lt;/TD&gt;
&lt;TD width="165px" class="r data"&gt;10MAY2020:03:00:02&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="92px" class="l data"&gt;108.950.45&lt;/TD&gt;
&lt;TD width="56px" class="l data"&gt;09876&lt;/TD&gt;
&lt;TD width="165px" class="r data"&gt;11MAY2020:06:14:03&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="92px" class="l data"&gt;108.950.45&lt;/TD&gt;
&lt;TD width="56px" class="l data"&gt;45879&lt;/TD&gt;
&lt;TD width="165px" class="r data"&gt;11MAY2020:12:37:35&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="92px" class="l data"&gt;108.950.45&lt;/TD&gt;
&lt;TD width="56px" class="l data"&gt;78562&lt;/TD&gt;
&lt;TD width="165px" class="r data"&gt;13MAY2020:13:13:09&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P&gt;Note: I changed one date to validate the steps.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 08:06:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/693416#M211420</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-10-22T08:06:39Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS to find a streak of occurrences within any seven day window</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/693429#M211429</link>
      <description>&lt;P&gt;I was thinking the same approach, but you beat me to it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We often spend so much effort preventing an unwanted Cartesian product in a multi-table join, that we forget that a Cartesian product, usually with a table self-join, can be our friend.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Having every row within a key combined with every other row within that key, then filtering as required, can be very powerful.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Edit:&lt;/STRONG&gt;&amp;nbsp; as long as the table isn't, say, 100M rows &lt;span class="lia-unicode-emoji" title=":winking_face:"&gt;😉&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 08:51:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/693429#M211429</guid>
      <dc:creator>ScottBass</dc:creator>
      <dc:date>2020-10-22T08:51:09Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS to find a streak of occurrences within any seven day window</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/693646#M211545</link>
      <description>&lt;P&gt;Another (probably better) way:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data GROUPS;
 do GROUP='01jan2020'd to '24dec2020'd;
   START=dhms(GROUP,0,0,0);
   END  =dhms(GROUP+7,0,0,0)-1;
   output;
 end;
 format START END datetime.;
run;

proc sql;
  create table WANT as
  select unique HAVE.*
  from HAVE
     , GROUPS
  where DATE between START and END 
  group by IP, GROUP
  having count(distinct ACCOUNT) &amp;gt; 2
  order by IP, DATE;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 26 Oct 2020 22:10:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/693646#M211545</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-10-26T22:10:33Z</dc:date>
    </item>
    <item>
      <title>Re: Using SAS to find a streak of occurrences within any seven day window</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/694678#M211865</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/341412"&gt;@wylie_ma&lt;/a&gt;&amp;nbsp;Does this answer your question?&lt;/P&gt;</description>
      <pubDate>Tue, 27 Oct 2020 20:39:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-SAS-to-find-a-streak-of-occurrences-within-any-seven-day/m-p/694678#M211865</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-10-27T20:39:19Z</dc:date>
    </item>
  </channel>
</rss>

