DATA Step, Macro, Functions and more

Count distinct hours in past 3hr

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

Count distinct hours in past 3hr

Hi everyone, I have one more problem. 

I can't get the count of unique(distinct) values to use in hash table calculation.

data have;
input id  opt  $   t_purchase   : datetime32. t_spent :datetime32.  usage ;
format  t_purchase  t_spent datetime32. ;
cards; 
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00    3  
1    a1  10NOV2017:12:02:00  10NOV2017:15:20:33    0 
1    a1  10NOV2017:12:02:00  10NOV2017:18:24:50    6  
1    a1  10NOV2017:12:02:00  10NOV2017:14:02:58    3 
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   12  
1    a1  10NOV2017:12:02:00  10NOV2017:13:57:12    7  
2 b1 10NOV2017:12:55:24 10NOV2017:13:01:42 14
2 b1 10NOV2017:12:55:24 10NOV2017:14:42:25 2
2 b1 10NOV2017:12:55:24 10NOV2017:16:55:32 5
;
run;

 I need to find for every id, opt group (combination) in how many unique hours (column 't_spent') has the record (in past 3 hours). For example, if it's record from 10NOV2017:21:04:33, I need to count a distinct number of hours from 10NOV2017:18:04:33 to 10NOV2017:21:04:33 if id, opt combination has the record. Min what we can get is 1, max is 3. 

I have this code (mostly part is useful and helpful and it's written by @Ksharp, but the part I've changed is not working very well) where I tried to count distinct of column  'hourIy_time' additionally created  hourly_time = t_spent and formatted format hourly_time datetime11. so I could maybe use it to count distinct but this approach is not the best because it can easily catch 4 different hours. Also, this way I discovered that data step does not support DISTINCT function. 

 

data want;
 if _n_=1 then do;
  if 0 then set have(rename=(t_spent=_t_spent));
  declare hash h(dataset:'have(rename=(t_spent=_t_spent))',hashexp:20);
  h.definekey('id','opt', 't_spent');
  h.definedata('_usage');
  h.definedone();
 end;
set have;
count_hours=0;
do i=t_spent-3*3600 to t_spent ;
 if h.find(key:id,key:opt,key:i)=0 then count_hours+_t_spent;
end;
drop _usage i;
run;

Desired output with the desired column:

id   opt     t_purchase           t_spent        usage  count_hours
1 a1 10NOV2017:12:02:00 10NOV2017:14:05:00 3 1 1 a1 10NOV2017:12:02:00 10NOV2017:15:20:33 0 2 1 a1 10NOV2017:12:02:00 10NOV2017:18:24:50 6 1 1 a1 10NOV2017:12:02:00 10NOV2017:14:02:58 3 2 1 a1 10NOV2017:12:02:00 10NOV2017:20:22:07 12 2 1 a1 10NOV2017:12:02:00 10NOV2017:13:57:12 7 1
2 b1 10NOV2017:12:55:24 10NOV2017:13:01:42 14 1
2 b1 10NOV2017:12:55:24 10NOV2017:14:42:25 2 2
2 b1 10NOV2017:12:55:24 10NOV2017:16:55:32 5 3
2 b1 10NOV2017:12:55:24 10NOV2017:15:24:07 3 2

 

 Thanks in advance.


Accepted Solutions
Solution
‎12-14-2017 07:30 AM
Super User
Posts: 2,512

Re: Count distinct hours in past 3hr

Something like this (can't test atm) should be close, please run and alter as needed


proc sql;                
  create table WANT as
  select unique det.*
  , sum(sum.T_SPENT*(3600*0<DIFF<=3600*1))>0 as INTERVAL1H
  , sum(sum.T_SPENT*(3600*1<DIFF<=3600*2))>0 as INTERVAL2H
  , sum(sum.T_SPENT*(3600*2<DIFF<=3600*3))>0 as INTERVAL3H
  , sum(calculated INTERVAL1H,calculated INTERVAL2H,calculated INTERVAL3H) as COUNT_THAT_YOU_WANT
   from HAVE det
       ,DIFF dif
       ,HAVE sum
   where det.ID          =sum.ID 
     and det.OPT         =sum.OPT
     and det.T_SPENT-DIFF=sum.T_SPENT
  group by det.ID , det.OPT, det.T_SPENT  
  order by det.ID , det.OPT, det.T_SPENT ; 
quit;

 

 

View solution in original post


All Replies
Super User
Posts: 10,574

Re: Count distinct hours in past 3hr

Wouldn' that simply be

proc sql;
create table want as
select id, opt, count(distinct(hour(t_spent))) as hour_count
from have
group by id, opt;
quit;

?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 10,850

Re: Count distinct hours in past 3hr

OK. Try this one.

 


data have;
input id  opt  $   t_purchase   : datetime32. t_spent :datetime32.  usage ;
format  t_purchase  t_spent datetime32. ;
cards; 
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00    3  
1    a1  10NOV2017:12:02:00  10NOV2017:15:20:33    0 
1    a1  10NOV2017:12:02:00  10NOV2017:18:24:50    6  
1    a1  10NOV2017:12:02:00  10NOV2017:14:02:58    3 
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   12  
1    a1  10NOV2017:12:02:00  10NOV2017:13:57:12    7  
2    b1  10NOV2017:12:55:24  10NOV2017:13:01:42   14  
2    b1  10NOV2017:12:55:24  10NOV2017:14:42:25    2 
2    b1  10NOV2017:12:55:24  10NOV2017:16:55:32    5 
;
run;

data want;
 if _n_=1 then do;
  if 0 then set have;
  declare hash h(dataset:'have',hashexp:20);
  h.definekey('id','opt', 't_spent');
  h.definedone();

  declare hash h_hour();
  h_hour.definekey('hour');
  h_hour.definedone();
 end;
set have;
do i=t_spent-3*3600 to t_spent ;
 if h.check(key:id,key:opt,key:i)=0 then do;hour=hour(i);h_hour.ref();end;
end;
count_hours=h_hour.num_items;
h_hour.clear();
drop hour i;
run;
Contributor
Posts: 23

Re: Count distinct hours in past 3hr

Thank you very much. This works just as I needed.

I tasted only on small data, when I tried on 5.5m rows, it's working by now about 6 hours. Is it expected? I thought about splitting into more tables, but that would mean that I need to join them after and that's pretty much the same. 

Super User
Posts: 2,512

Re: Count distinct hours in past 3hr

[ Edited ]

This runs in 30 seconds for 10 million records.

 

 


data HAVE;
  input ID  OPT  $   T_PURCHASE   : datetime32. T_SPENT :datetime32.  USAGE ;
  format  T_PURCHASE  T_SPENT datetime32. ;
  do ID=1 to 1e6; output; end;
cards; 
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00    3  
1    a1  10NOV2017:12:02:00  10NOV2017:15:20:33    0 
1    a1  10NOV2017:12:02:00  10NOV2017:18:24:50    6  
1    a1  10NOV2017:12:02:00  10NOV2017:14:02:58    3 
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   12  
1    a1  10NOV2017:12:02:00  10NOV2017:13:57:12    7  
2    b1  10NOV2017:12:55:24  10NOV2017:13:01:42   14  
2    b1  10NOV2017:12:55:24  10NOV2017:14:42:25    2 
2    b1  10NOV2017:12:55:24  10NOV2017:16:55:32    5 
run;

data DIFF;
  do DIFF=0 to 3600*3;
    output; 
  end; 
run;

proc sql;                
  create table WANT as
  select unique det.*, count(sum.T_SPENT) as COUNT
   from HAVE det
       ,DIFF dif
       ,HAVE sum
   where det.ID          =sum.ID 
     and det.OPT         =sum.OPT
     and det.T_SPENT-DIFF=sum.T_SPENT
  group by det.ID , det.OPT, det.T_SPENT  
  order by det.ID , det.OPT, det.T_SPENT ; 
quit;

  

 ID  OPT  T_SPENT  COUNT
1 a1 10NOV2017:13:57:12 1
1 a1 10NOV2017:14:02:58 2
1 a1 10NOV2017:14:05:00 3
1 a1 10NOV2017:15:20:33 4
1 a1 10NOV2017:18:24:50 1
1 a1 10NOV2017:20:22:07 2
1 b1 10NOV2017:13:01:42 1

  

The key to speed is to use equijoins, hence the intermediate table.

 

 

 

Super User
Posts: 10,850

Re: Count distinct hours in past 3hr

You could fast it if sorting the table before .

 


data have;
input id  opt  $   t_purchase   : datetime32. t_spent :datetime32.  usage ;
format  t_purchase  t_spent datetime32. ;
cards; 
1    a1  10NOV2017:12:02:00  10NOV2017:14:05:00    3  
1    a1  10NOV2017:12:02:00  10NOV2017:15:20:33    0 
1    a1  10NOV2017:12:02:00  10NOV2017:18:24:50    6  
1    a1  10NOV2017:12:02:00  10NOV2017:14:02:58    3 
1    a1  10NOV2017:12:02:00  10NOV2017:20:22:07   12  
1    a1  10NOV2017:12:02:00  10NOV2017:13:57:12    7  
2    b1  10NOV2017:12:55:24  10NOV2017:13:01:42   14  
2    b1  10NOV2017:12:55:24  10NOV2017:14:42:25    2 
2    b1  10NOV2017:12:55:24  10NOV2017:16:55:32    5 
;
run;
proc sort data=have;
 by id  opt;
run;
data want;
 if _n_=1 then do;
  if 0 then set have;
  declare hash h();
  h.definekey('id','opt', 't_spent');
  h.definedone();

  declare hash h_hour();
  h_hour.definekey('hour');
  h_hour.definedone();
 end;

do until(last.opt);
 set have;
 by id opt;
 h.ref();
end;

do until(last.opt);
 set have;
 by id opt;
 do i=t_spent-3*3600 to t_spent ;
   if h.check(key:id,key:opt,key:i)=0 then do;hour=hour(i);h_hour.ref();end;
 end;
 count_hours=h_hour.num_items;
 output;
end;
h.clear();
h_hour.clear();
drop hour i;
run;
Super User
Posts: 2,512

Re: Count distinct hours in past 3hr

@Ksharp This is still several orders of magnitude slower than SQL.

Super User
Posts: 10,850

Re: Count distinct hours in past 3hr

@ChrisNZ

Yeah. You are right . It depends on what OP want.

Contributor
Posts: 23

Re: Count distinct hours in past 3hr

[ Edited ]

@ChrisNZ @Ksharp 

I know this was active days ago, but I just found a little problem. 

Both versions give me (mostly) +1 hour that I needed. I know it's a very tricky to solve this because I need to count 3*60 minutes in past from the record and count how many records that combination has in first 60 min, in 120 min, and in 180 min in the past. 

For example, If some combination has the record in 13:40 and 14: 05, for the record from 14:05 count need to be 1, not 2. Record 13:40 was in the past hour (13) but is just 25 minutes before, not the 60+. 

 

Thank you both for your help! 

Super User
Posts: 2,512

Re: Count distinct hours in past 3hr

The count method as given counts 14:05 as the first match and 13:40 as the second match for the record at 14:05, hence you have 2 macthes.

If you don't want to count 14:05, just subtract 1.

Contributor
Posts: 23

Re: Count distinct hours in past 3hr

[ Edited ]

I'm not sure if you have understood the question, I'm sorry for not being so clear.  

In how many hours (in past 3 hours, 180min/3) that combination has the record. Not how many matches(record) has in each hour. 

Maybe someone has just the record in 14:03, 14:05, 14:19, 14:55, 15:02, and that need to be count=1. Because the range is <60min and it does not matter how many records have in that range. If the range is between 0min-60min and 60min-120min it is 2 (example 14:01, 14:03, 14:05, 14:19, 14:55, 15:02) , if it's between 0min-60min, 60min-120min and 120min-180min, that's count=3  (example 14:01, 14:03, 14:05, 14:19, 14:55, 15:02, 16:04), because have records in all the 3 past hours.

Range >180min I don't look at all, need to take observation for each record just the past 3 hours. 

Solution
‎12-14-2017 07:30 AM
Super User
Posts: 2,512

Re: Count distinct hours in past 3hr

Something like this (can't test atm) should be close, please run and alter as needed


proc sql;                
  create table WANT as
  select unique det.*
  , sum(sum.T_SPENT*(3600*0<DIFF<=3600*1))>0 as INTERVAL1H
  , sum(sum.T_SPENT*(3600*1<DIFF<=3600*2))>0 as INTERVAL2H
  , sum(sum.T_SPENT*(3600*2<DIFF<=3600*3))>0 as INTERVAL3H
  , sum(calculated INTERVAL1H,calculated INTERVAL2H,calculated INTERVAL3H) as COUNT_THAT_YOU_WANT
   from HAVE det
       ,DIFF dif
       ,HAVE sum
   where det.ID          =sum.ID 
     and det.OPT         =sum.OPT
     and det.T_SPENT-DIFF=sum.T_SPENT
  group by det.ID , det.OPT, det.T_SPENT  
  order by det.ID , det.OPT, det.T_SPENT ; 
quit;

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 327 views
  • 6 likes
  • 4 in conversation