I would like to count the occurrences of an IP address over a rolling three day time frame. Below is sample data.
ID_Number | IP_Address | Logon_Time |
1 | 99.98.127.208 | 06DEC2017:10:55:01.855000 |
2 | 99.98.145.45 | 12DEC2013:00:32:57.354000 |
3 | 99.98.145.45 | 03JAN2014:13:53:06.453000 |
4 | 99.98.145.45 | 03JAN2014:14:10:51.439000 |
5 | 99.98.145.45 | 19JAN2014:04:44:20.043000 |
6 | 99.98.148.185 | 15NOV2017:23:42:09.202000 |
7 | 99.98.148.185 | 04DEC2017:22:52:47.562000 |
8 | 99.98.148.185 | 11JAN2018:15:12:00.020000 |
9 | 99.98.148.185 | 11JAN2018:16:13:25.536000 |
10 | 99.98.148.185 | 12JAN2018:16:13:25.536000 |
11 | 99.98.148.185 | 13JAN2018:16:13:25.536000 |
12 | 99.98.148.185 | 14JAN2018:16:13:25.536000 |
13 | 99.98.148.185 | 13MAR2018:20:50:20.437000 |
14 | 99.98.171.56 | 14JUN2016:22:30:45.895000 |
15 | 99.98.171.56 | 07JUL2016:20:17:28.182000 |
16 | 99.98.171.56 | 07JUL2016:21:02:56.512000 |
17 | 99.98.237.214 | 16JUL2012:18:27:58.504000 |
18 | 99.98.237.214 | 17JUL2012:19:50:17.975000 |
The output should look like the following:
ID_Number | IP_Address | Logon_Time | Count |
1 | 99.98.127.208 | 06DEC2017:10:55:01.855000 | 1 |
2 | 99.98.145.45 | 12DEC2013:00:32:57.354000 | 1 |
3 | 99.98.145.45 | 03JAN2014:13:53:06.453000 | 2 |
4 | 99.98.145.45 | 03JAN2014:14:10:51.439000 | 2 |
5 | 99.98.145.45 | 19JAN2014:04:44:20.043000 | 1 |
6 | 99.98.148.185 | 15NOV2017:23:42:09.202000 | 1 |
7 | 99.98.148.185 | 04DEC2017:22:52:47.562000 | 1 |
8 | 99.98.148.185 | 11JAN2018:15:12:00.020000 | 4 |
9 | 99.98.148.185 | 11JAN2018:16:13:25.536000 | 4 |
10 | 99.98.148.185 | 12JAN2018:16:13:25.536000 | 5 |
11 | 99.98.148.185 | 13JAN2018:16:13:25.536000 | 5 |
12 | 99.98.148.185 | 14JAN2018:16:13:25.536000 | 3 |
13 | 99.98.148.185 | 13MAR2018:20:50:20.437000 | 1 |
14 | 99.98.171.56 | 14JUN2016:22:30:45.895000 | 1 |
15 | 99.98.171.56 | 07JUL2016:20:17:28.182000 | 2 |
16 | 99.98.171.56 | 07JUL2016:21:02:56.512000 | 2 |
17 | 99.98.237.214 | 16JUL2012:18:27:58.504000 | 2 |
18 | 99.98.237.214 | 17JUL2012:19:50:17.975000 | 2 |
The code should be able to do a count of both 3 days ahead and 3 days behind. See ID_Numbers 8-12 above.
data have;
infile datalines truncover;
input (ID_Number IP_Address)(:$50.) Logon_Time : datetime30.;
format Logon_Time datetime30.;
datalines;
1 99.98.127.208 06DEC2017:10:55:01.855000
2 99.98.145.45 12DEC2013:00:32:57.354000
3 99.98.145.45 03JAN2014:13:53:06.453000
4 99.98.145.45 03JAN2014:14:10:51.439000
5 99.98.145.45 19JAN2014:04:44:20.043000
6 99.98.148.185 15NOV2017:23:42:09.202000
7 99.98.148.185 04DEC2017:22:52:47.562000
8 99.98.148.185 11JAN2018:15:12:00.020000
9 99.98.148.185 11JAN2018:16:13:25.536000
10 99.98.148.185 12JAN2018:16:13:25.536000
11 99.98.148.185 13JAN2018:16:13:25.536000
12 99.98.148.185 14JAN2018:16:13:25.536000
13 99.98.148.185 13MAR2018:20:50:20.437000
14 99.98.171.56 14JUN2016:22:30:45.895000
15 99.98.171.56 07JUL2016:20:17:28.182000
16 99.98.171.56 07JUL2016:21:02:56.512000
17 99.98.237.214 16JUL2012:18:27:58.504000
18 99.98.237.214 17JUL2012:19:50:17.975000
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H (dataset:'have',multidata:'y') ;
h.definekey ("IP_Address") ;
h.definedata ("Logon_Time") ;
h.definedone () ;
end;
set have(rename=(Logon_Time=_Logon_Time));
count=0;
rc = h.find();
do while(rc = 0);
if intnx('days',datepart(_Logon_Time),-3+1)<=datepart(Logon_Time)<=intnx('days',datepart(_Logon_Time),+3-1) then count+1;
rc = h.find_next();
end;
drop rc Logon_Time;
run;
Is your time frame three days ahead then? Three days back or a plus-minus 3 day window on either side?
For a record today, would the interval be from April 17 to April 23 inclusive or would it be April 18 to April 22 inclusive?
Do you have a SAS /ETS license?
@Sth19 wrote:
I would like to count the occurrences of an IP address over a rolling three day time frame. Below is sample data.
ID_Number IP_Address Logon_Time 1 99.98.127.208 06DEC2017:10:55:01.855000 2 99.98.145.45 12DEC2013:00:32:57.354000 3 99.98.145.45 03JAN2014:13:53:06.453000 4 99.98.145.45 03JAN2014:14:10:51.439000 5 99.98.145.45 19JAN2014:04:44:20.043000 6 99.98.148.185 15NOV2017:23:42:09.202000 7 99.98.148.185 04DEC2017:22:52:47.562000 8 99.98.148.185 11JAN2018:15:12:00.020000 9 99.98.148.185 11JAN2018:16:13:25.536000 10 99.98.148.185 12JAN2018:16:13:25.536000 11 99.98.148.185 13JAN2018:16:13:25.536000 12 99.98.148.185 14JAN2018:16:13:25.536000 13 99.98.148.185 13MAR2018:20:50:20.437000 14 99.98.171.56 14JUN2016:22:30:45.895000 15 99.98.171.56 07JUL2016:20:17:28.182000 16 99.98.171.56 07JUL2016:21:02:56.512000 17 99.98.237.214 16JUL2012:18:27:58.504000 18 99.98.237.214 17JUL2012:19:50:17.975000
The output should look like the following:
ID_Number IP_Address Logon_Time Count 1 99.98.127.208 06DEC2017:10:55:01.855000 1 2 99.98.145.45 12DEC2013:00:32:57.354000 1 3 99.98.145.45 03JAN2014:13:53:06.453000 2 4 99.98.145.45 03JAN2014:14:10:51.439000 2 5 99.98.145.45 19JAN2014:04:44:20.043000 1 6 99.98.148.185 15NOV2017:23:42:09.202000 1 7 99.98.148.185 04DEC2017:22:52:47.562000 1 8 99.98.148.185 11JAN2018:15:12:00.020000 4 9 99.98.148.185 11JAN2018:16:13:25.536000 4 10 99.98.148.185 12JAN2018:16:13:25.536000 5 11 99.98.148.185 13JAN2018:16:13:25.536000 5 12 99.98.148.185 14JAN2018:16:13:25.536000 3 13 99.98.148.185 13MAR2018:20:50:20.437000 1 14 99.98.171.56 14JUN2016:22:30:45.895000 1 15 99.98.171.56 07JUL2016:20:17:28.182000 2 16 99.98.171.56 07JUL2016:21:02:56.512000 2 17 99.98.237.214 16JUL2012:18:27:58.504000 2 18 99.98.237.214 17JUL2012:19:50:17.975000 2
The code should be able to do a count of both 3 days ahead and 3 days behind. See ID_Numbers 8-12 above.
data have;
infile datalines truncover;
input (ID_Number IP_Address)(:$50.) Logon_Time : datetime30.;
format Logon_Time datetime30.;
datalines;
1 99.98.127.208 06DEC2017:10:55:01.855000
2 99.98.145.45 12DEC2013:00:32:57.354000
3 99.98.145.45 03JAN2014:13:53:06.453000
4 99.98.145.45 03JAN2014:14:10:51.439000
5 99.98.145.45 19JAN2014:04:44:20.043000
6 99.98.148.185 15NOV2017:23:42:09.202000
7 99.98.148.185 04DEC2017:22:52:47.562000
8 99.98.148.185 11JAN2018:15:12:00.020000
9 99.98.148.185 11JAN2018:16:13:25.536000
10 99.98.148.185 12JAN2018:16:13:25.536000
11 99.98.148.185 13JAN2018:16:13:25.536000
12 99.98.148.185 14JAN2018:16:13:25.536000
13 99.98.148.185 13MAR2018:20:50:20.437000
14 99.98.171.56 14JUN2016:22:30:45.895000
15 99.98.171.56 07JUL2016:20:17:28.182000
16 99.98.171.56 07JUL2016:21:02:56.512000
17 99.98.237.214 16JUL2012:18:27:58.504000
18 99.98.237.214 17JUL2012:19:50:17.975000
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H (dataset:'have',multidata:'y') ;
h.definekey ("IP_Address") ;
h.definedata ("Logon_Time") ;
h.definedone () ;
end;
set have(rename=(Logon_Time=_Logon_Time));
count=0;
rc = h.find();
do while(rc = 0);
if intnx('days',datepart(_Logon_Time),-3+1)<=datepart(Logon_Time)<=intnx('days',datepart(_Logon_Time),+3-1) then count+1;
rc = h.find_next();
end;
drop rc Logon_Time;
run;
For record 12, you show a count of 3 in the desired results.
Should that be 4? I think records 9, 10, 11, and 12 all fall within the 3 day time frame, but I may be wrong. Record 8 is too far back and Record 13 is too far forward. It may be an < vs <= thing.
data have;
infile datalines truncover;
input (ID_Number IP_Address)(:$50.) Logon_Time : datetime30.;
format Logon_Time datetime30.;
datalines;
1 99.98.127.208 06DEC2017:10:55:01.855000
2 99.98.145.45 12DEC2013:00:32:57.354000
3 99.98.145.45 03JAN2014:13:53:06.453000
4 99.98.145.45 03JAN2014:14:10:51.439000
5 99.98.145.45 19JAN2014:04:44:20.043000
6 99.98.148.185 15NOV2017:23:42:09.202000
7 99.98.148.185 04DEC2017:22:52:47.562000
8 99.98.148.185 11JAN2018:15:12:00.020000
9 99.98.148.185 11JAN2018:16:13:25.536000
10 99.98.148.185 12JAN2018:16:13:25.536000
11 99.98.148.185 13JAN2018:16:13:25.536000
12 99.98.148.185 14JAN2018:16:13:25.536000
13 99.98.148.185 13MAR2018:20:50:20.437000
14 99.98.171.56 14JUN2016:22:30:45.895000
15 99.98.171.56 07JUL2016:20:17:28.182000
16 99.98.171.56 07JUL2016:21:02:56.512000
17 99.98.237.214 16JUL2012:18:27:58.504000
18 99.98.237.214 17JUL2012:19:50:17.975000
;
proc sql;
create table want as
select *,(select count(IP_Address) from have where Logon_Time
between intnx('dtday',a.Logon_Time,-3) and intnx('dtday',a.Logon_Time,3)) as count
from have as a;
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.