I would like to count the occurrences of an IP address over a rolling three day time frame. Below is sample data.
ID_Number | IP_Address | Logon_Time |
1 | 99.98.127.208 | 06DEC2017:10:55:01.855000 |
2 | 99.98.145.45 | 12DEC2013:00:32:57.354000 |
3 | 99.98.145.45 | 03JAN2014:13:53:06.453000 |
4 | 99.98.145.45 | 03JAN2014:14:10:51.439000 |
5 | 99.98.145.45 | 19JAN2014:04:44:20.043000 |
6 | 99.98.148.185 | 15NOV2017:23:42:09.202000 |
7 | 99.98.148.185 | 04DEC2017:22:52:47.562000 |
8 | 99.98.148.185 | 11JAN2018:15:12:00.020000 |
9 | 99.98.148.185 | 11JAN2018:16:13:25.536000 |
10 | 99.98.148.185 | 12JAN2018:16:13:25.536000 |
11 | 99.98.148.185 | 13JAN2018:16:13:25.536000 |
12 | 99.98.148.185 | 14JAN2018:16:13:25.536000 |
13 | 99.98.148.185 | 13MAR2018:20:50:20.437000 |
14 | 99.98.171.56 | 14JUN2016:22:30:45.895000 |
15 | 99.98.171.56 | 07JUL2016:20:17:28.182000 |
16 | 99.98.171.56 | 07JUL2016:21:02:56.512000 |
17 | 99.98.237.214 | 16JUL2012:18:27:58.504000 |
18 | 99.98.237.214 | 17JUL2012:19:50:17.975000 |
The output should look like the following:
ID_Number | IP_Address | Logon_Time | Count |
1 | 99.98.127.208 | 06DEC2017:10:55:01.855000 | 1 |
2 | 99.98.145.45 | 12DEC2013:00:32:57.354000 | 1 |
3 | 99.98.145.45 | 03JAN2014:13:53:06.453000 | 2 |
4 | 99.98.145.45 | 03JAN2014:14:10:51.439000 | 2 |
5 | 99.98.145.45 | 19JAN2014:04:44:20.043000 | 1 |
6 | 99.98.148.185 | 15NOV2017:23:42:09.202000 | 1 |
7 | 99.98.148.185 | 04DEC2017:22:52:47.562000 | 1 |
8 | 99.98.148.185 | 11JAN2018:15:12:00.020000 | 4 |
9 | 99.98.148.185 | 11JAN2018:16:13:25.536000 | 4 |
10 | 99.98.148.185 | 12JAN2018:16:13:25.536000 | 5 |
11 | 99.98.148.185 | 13JAN2018:16:13:25.536000 | 5 |
12 | 99.98.148.185 | 14JAN2018:16:13:25.536000 | 3 |
13 | 99.98.148.185 | 13MAR2018:20:50:20.437000 | 1 |
14 | 99.98.171.56 | 14JUN2016:22:30:45.895000 | 1 |
15 | 99.98.171.56 | 07JUL2016:20:17:28.182000 | 2 |
16 | 99.98.171.56 | 07JUL2016:21:02:56.512000 | 2 |
17 | 99.98.237.214 | 16JUL2012:18:27:58.504000 | 2 |
18 | 99.98.237.214 | 17JUL2012:19:50:17.975000 | 2 |
The code should be able to do a count of both 3 days ahead and 3 days behind. See ID_Numbers 8-12 above.
data have;
infile datalines truncover;
input (ID_Number IP_Address)(:$50.) Logon_Time : datetime30.;
format Logon_Time datetime30.;
datalines;
1 99.98.127.208 06DEC2017:10:55:01.855000
2 99.98.145.45 12DEC2013:00:32:57.354000
3 99.98.145.45 03JAN2014:13:53:06.453000
4 99.98.145.45 03JAN2014:14:10:51.439000
5 99.98.145.45 19JAN2014:04:44:20.043000
6 99.98.148.185 15NOV2017:23:42:09.202000
7 99.98.148.185 04DEC2017:22:52:47.562000
8 99.98.148.185 11JAN2018:15:12:00.020000
9 99.98.148.185 11JAN2018:16:13:25.536000
10 99.98.148.185 12JAN2018:16:13:25.536000
11 99.98.148.185 13JAN2018:16:13:25.536000
12 99.98.148.185 14JAN2018:16:13:25.536000
13 99.98.148.185 13MAR2018:20:50:20.437000
14 99.98.171.56 14JUN2016:22:30:45.895000
15 99.98.171.56 07JUL2016:20:17:28.182000
16 99.98.171.56 07JUL2016:21:02:56.512000
17 99.98.237.214 16JUL2012:18:27:58.504000
18 99.98.237.214 17JUL2012:19:50:17.975000
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H (dataset:'have',multidata:'y') ;
h.definekey ("IP_Address") ;
h.definedata ("Logon_Time") ;
h.definedone () ;
end;
set have(rename=(Logon_Time=_Logon_Time));
count=0;
rc = h.find();
do while(rc = 0);
if intnx('days',datepart(_Logon_Time),-3+1)<=datepart(Logon_Time)<=intnx('days',datepart(_Logon_Time),+3-1) then count+1;
rc = h.find_next();
end;
drop rc Logon_Time;
run;
Is your time frame three days ahead then? Three days back or a plus-minus 3 day window on either side?
For a record today, would the interval be from April 17 to April 23 inclusive or would it be April 18 to April 22 inclusive?
Do you have a SAS /ETS license?
@Sth19 wrote:
I would like to count the occurrences of an IP address over a rolling three day time frame. Below is sample data.
ID_Number IP_Address Logon_Time 1 99.98.127.208 06DEC2017:10:55:01.855000 2 99.98.145.45 12DEC2013:00:32:57.354000 3 99.98.145.45 03JAN2014:13:53:06.453000 4 99.98.145.45 03JAN2014:14:10:51.439000 5 99.98.145.45 19JAN2014:04:44:20.043000 6 99.98.148.185 15NOV2017:23:42:09.202000 7 99.98.148.185 04DEC2017:22:52:47.562000 8 99.98.148.185 11JAN2018:15:12:00.020000 9 99.98.148.185 11JAN2018:16:13:25.536000 10 99.98.148.185 12JAN2018:16:13:25.536000 11 99.98.148.185 13JAN2018:16:13:25.536000 12 99.98.148.185 14JAN2018:16:13:25.536000 13 99.98.148.185 13MAR2018:20:50:20.437000 14 99.98.171.56 14JUN2016:22:30:45.895000 15 99.98.171.56 07JUL2016:20:17:28.182000 16 99.98.171.56 07JUL2016:21:02:56.512000 17 99.98.237.214 16JUL2012:18:27:58.504000 18 99.98.237.214 17JUL2012:19:50:17.975000
The output should look like the following:
ID_Number IP_Address Logon_Time Count 1 99.98.127.208 06DEC2017:10:55:01.855000 1 2 99.98.145.45 12DEC2013:00:32:57.354000 1 3 99.98.145.45 03JAN2014:13:53:06.453000 2 4 99.98.145.45 03JAN2014:14:10:51.439000 2 5 99.98.145.45 19JAN2014:04:44:20.043000 1 6 99.98.148.185 15NOV2017:23:42:09.202000 1 7 99.98.148.185 04DEC2017:22:52:47.562000 1 8 99.98.148.185 11JAN2018:15:12:00.020000 4 9 99.98.148.185 11JAN2018:16:13:25.536000 4 10 99.98.148.185 12JAN2018:16:13:25.536000 5 11 99.98.148.185 13JAN2018:16:13:25.536000 5 12 99.98.148.185 14JAN2018:16:13:25.536000 3 13 99.98.148.185 13MAR2018:20:50:20.437000 1 14 99.98.171.56 14JUN2016:22:30:45.895000 1 15 99.98.171.56 07JUL2016:20:17:28.182000 2 16 99.98.171.56 07JUL2016:21:02:56.512000 2 17 99.98.237.214 16JUL2012:18:27:58.504000 2 18 99.98.237.214 17JUL2012:19:50:17.975000 2
The code should be able to do a count of both 3 days ahead and 3 days behind. See ID_Numbers 8-12 above.
data have;
infile datalines truncover;
input (ID_Number IP_Address)(:$50.) Logon_Time : datetime30.;
format Logon_Time datetime30.;
datalines;
1 99.98.127.208 06DEC2017:10:55:01.855000
2 99.98.145.45 12DEC2013:00:32:57.354000
3 99.98.145.45 03JAN2014:13:53:06.453000
4 99.98.145.45 03JAN2014:14:10:51.439000
5 99.98.145.45 19JAN2014:04:44:20.043000
6 99.98.148.185 15NOV2017:23:42:09.202000
7 99.98.148.185 04DEC2017:22:52:47.562000
8 99.98.148.185 11JAN2018:15:12:00.020000
9 99.98.148.185 11JAN2018:16:13:25.536000
10 99.98.148.185 12JAN2018:16:13:25.536000
11 99.98.148.185 13JAN2018:16:13:25.536000
12 99.98.148.185 14JAN2018:16:13:25.536000
13 99.98.148.185 13MAR2018:20:50:20.437000
14 99.98.171.56 14JUN2016:22:30:45.895000
15 99.98.171.56 07JUL2016:20:17:28.182000
16 99.98.171.56 07JUL2016:21:02:56.512000
17 99.98.237.214 16JUL2012:18:27:58.504000
18 99.98.237.214 17JUL2012:19:50:17.975000
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H (dataset:'have',multidata:'y') ;
h.definekey ("IP_Address") ;
h.definedata ("Logon_Time") ;
h.definedone () ;
end;
set have(rename=(Logon_Time=_Logon_Time));
count=0;
rc = h.find();
do while(rc = 0);
if intnx('days',datepart(_Logon_Time),-3+1)<=datepart(Logon_Time)<=intnx('days',datepart(_Logon_Time),+3-1) then count+1;
rc = h.find_next();
end;
drop rc Logon_Time;
run;
For record 12, you show a count of 3 in the desired results.
Should that be 4? I think records 9, 10, 11, and 12 all fall within the 3 day time frame, but I may be wrong. Record 8 is too far back and Record 13 is too far forward. It may be an < vs <= thing.
data have;
infile datalines truncover;
input (ID_Number IP_Address)(:$50.) Logon_Time : datetime30.;
format Logon_Time datetime30.;
datalines;
1 99.98.127.208 06DEC2017:10:55:01.855000
2 99.98.145.45 12DEC2013:00:32:57.354000
3 99.98.145.45 03JAN2014:13:53:06.453000
4 99.98.145.45 03JAN2014:14:10:51.439000
5 99.98.145.45 19JAN2014:04:44:20.043000
6 99.98.148.185 15NOV2017:23:42:09.202000
7 99.98.148.185 04DEC2017:22:52:47.562000
8 99.98.148.185 11JAN2018:15:12:00.020000
9 99.98.148.185 11JAN2018:16:13:25.536000
10 99.98.148.185 12JAN2018:16:13:25.536000
11 99.98.148.185 13JAN2018:16:13:25.536000
12 99.98.148.185 14JAN2018:16:13:25.536000
13 99.98.148.185 13MAR2018:20:50:20.437000
14 99.98.171.56 14JUN2016:22:30:45.895000
15 99.98.171.56 07JUL2016:20:17:28.182000
16 99.98.171.56 07JUL2016:21:02:56.512000
17 99.98.237.214 16JUL2012:18:27:58.504000
18 99.98.237.214 17JUL2012:19:50:17.975000
;
proc sql;
create table want as
select *,(select count(IP_Address) from have where Logon_Time
between intnx('dtday',a.Logon_Time,-3) and intnx('dtday',a.Logon_Time,3)) as count
from have as a;
quit;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.