DATA Step, Macro, Functions and more

Extract Data Having # or Sum more than certain threshold, within certain Time Range.

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Extract Data Having # or Sum more than certain threshold, within certain Time Range.

Hi Guys,

I would need your help on the problem that I'm currently into...

Basically I have the data below.

I would like to extract the data having criteria:

  • (# of data occurences under same Group >= 2 OR total dollar under same Group >= 25) , AND
  • time range is less than 3 hours

Please note:

  • field "delta time" in "data wants" is row N - row N-1 on field "time" under same Group. For example: "delta time" = 5:51 in row 2 is coming from "time" in row 2 substracted by time in row 1.
  • sample remarks:
    • row 4 & 5 got remarks Yes because within 3 hrs (delta time for both rows is 1:43) and total "dollar" is 32 (16+18) or total count is 2 (count for row 4 & 5).
    • row 9,10,11 got remarks Yes because within 3 hrs (sum delta time from 9-10 & 10-11 is 2 minutes, coming from 0:01+0:01) and total count is 3 (count for row 9,10,11).


Thank you in advance

Data Have

:

rowGroupdollartime
1a102:43 AM
2a108:34 AM
3a1111:52 AM
4b165:32 AM
5b187:16 AM
6b122:08 PM
7b118:34 PM
8b139:31 PM
9c112:18 AM
10c52:19 AM
11c52:20 AM
12c128:34 AM
13d182:12 PM
14e192:43 AM
15e175:30 AM
16e1112:07 PM
17e148:40 PM
18e168:46 PM
19f118:24 AM
20f1211:55 AM

Data Wants:

rowGroupdollartimedelta timeRemarks
1a102:43 AM
2a108:34 AM5:51
3a1111:52 AM3:18
4b165:32 AM Yes
5b187:16 AM1:43Yes
6b122:08 PM6:51
7b118:34 PM6:26
8b139:31 PM0:57
9c112:18 AM Yes
10c52:19 AM0:01Yes
11c52:20 AM0:01Yes
12c128:34 AM6:14
13d182:12 PM5:38
14e192:43 AM Yes
15e175:30 AM2:47Yes
16e1112:07 PM6:37
17e148:40 PM8:32Yes
18e168:46 PM0:05Yes
19f118:24 AM
20f1211:55 AM3:31

Message was edited by: feira h --> to provide better explanation.


Accepted Solutions
Solution
‎03-16-2015 01:35 AM
Super User
Posts: 10,041

Re: Extract Data Having # or Sum more than certain threshold, within certain Time Range.

Posted in reply to feira_feira

So you want all of them ?

time      dif

...............

08:00    04:20  YES

09:00    01:00  YES

11:30    02:30  YES

data have;
input Group     $ dollar     time24hrs : time.;
format      time24hrs time.;
cards;
a     10     2:43
a     10     8:34
a     11     11:52
b     16     5:32
b     18     7:16
b     12     14:08
b     11     20:34
b     13     21:31
c     11     2:18
c     12     8:34
d     18     14:12
e     19     2:43
e     17     5:30
e     11     12:07
e     14     20:40
e     16     20:46
f     11     8:24
f     12     11:55
;
run;
proc sql;
 create table want as
  select *,case when (select count(*) from have where Group=a.Group and time24hrs between a.time24hrs-3*60*60 and a.time24hrs+3*60*60) ge 2 or
                     (select sum(dollar) from have where Group=a.Group and time24hrs between a.time24hrs-3*60*60 and a.time24hrs+3*60*60) ge 25 
                          then 'YES'
                    else 'NO ' end as falg
   from have as a;
quit;

Xia Keshan

View solution in original post


All Replies
Super User
Posts: 10,041

Re: Extract Data Having # or Sum more than certain threshold, within certain Time Range.

Posted in reply to feira_feira

You didn't post the output you want yet .

data have;
input Group     $ dollar     time24hrs : time.;
format      time24hrs time.;
cards;
a     10     2:43
a     10     8:34
a     11     11:52
b     16     5:32
b     18     7:16
b     12     14:08
b     11     20:34
b     13     21:31
c     11     2:18
c     12     8:34
d     18     14:12
e     19     2:43
e     17     5:30
e     11     12:07
e     14     20:40
e     16     20:46
f     11     8:24
f     12     11:55
;
run;
proc sql;
 create table want as
  select *
   from have
    group by group
      having range(time24hrs) le 30*60 and (count(*) ge 3 or sum(dollar) ge 25);
quit;

Xia Keshan

Super User
Posts: 10,041

Re: Extract Data Having # or Sum more than certain threshold, within certain Time Range.

Posted in reply to feira_feira

You didn't post the output you want yet .

data have;
input Group     $ dollar     time24hrs : time.;
format      time24hrs time.;
cards;
a     10     2:43
a     10     8:34
a     11     11:52
b     16     5:32
b     18     7:16
b     12     14:08
b     11     20:34
b     13     21:31
c     11     2:18
c     12     8:34
d     18     14:12
e     19     2:43
e     17     5:30
e     11     12:07
e     14     20:40
e     16     20:46
f     11     8:24
f     12     11:55
;
run;
proc sql;
 create table want as
  select *
   from have
    group by group
      having range(time24hrs) le 30*60 and (count(*) ge 3 or sum(dollar) ge 25);
quit;

Xia Keshan

Super User
Posts: 10,041

Re: Extract Data Having # or Sum more than certain threshold, within certain Time Range.

Posted in reply to feira_feira

It is more complicated than I imaged . What you gonna do if you have data :

time      dif

...............

08:00    04:20

09:00    01:00

11:30    02:30

You want

08:00    04:20   YES

09:00    01:00   YES

or want ?

09:00    01:00  YES

11:30    02:30  YES

New Contributor
Posts: 3

Re: Extract Data Having # or Sum more than certain threshold, within certain Time Range.

Hi Xia,

Assuming the criteria of total dollar >=25 are met, then the output should be all of them, because:

  • no 1 and 2 data are < 3 hrs. & count >= 2
  • no 2 and 3 data are also < 3 hrs & count >= 2

Thanks for your assistance...

Solution
‎03-16-2015 01:35 AM
Super User
Posts: 10,041

Re: Extract Data Having # or Sum more than certain threshold, within certain Time Range.

Posted in reply to feira_feira

So you want all of them ?

time      dif

...............

08:00    04:20  YES

09:00    01:00  YES

11:30    02:30  YES

data have;
input Group     $ dollar     time24hrs : time.;
format      time24hrs time.;
cards;
a     10     2:43
a     10     8:34
a     11     11:52
b     16     5:32
b     18     7:16
b     12     14:08
b     11     20:34
b     13     21:31
c     11     2:18
c     12     8:34
d     18     14:12
e     19     2:43
e     17     5:30
e     11     12:07
e     14     20:40
e     16     20:46
f     11     8:24
f     12     11:55
;
run;
proc sql;
 create table want as
  select *,case when (select count(*) from have where Group=a.Group and time24hrs between a.time24hrs-3*60*60 and a.time24hrs+3*60*60) ge 2 or
                     (select sum(dollar) from have where Group=a.Group and time24hrs between a.time24hrs-3*60*60 and a.time24hrs+3*60*60) ge 25 
                          then 'YES'
                    else 'NO ' end as falg
   from have as a;
quit;

Xia Keshan

New Contributor
Posts: 3

Re: Extract Data Having # or Sum more than certain threshold, within certain Time Range.

Thanks a lot Xia... it's very helpful... !!!

Frequent Contributor
Posts: 115

Re: Extract Data Having # or Sum more than certain threshold, within certain Time Range.

Posted in reply to feira_feira

Please mark the question as answered and award the correct answers to extraordinary master  Xia Keshan.

I tried to play around with your requirement, but unfortunately couldn;t meet it properly. But for learning purpose, you can play and have fun

data have;
input Group     $ dollar     time24hrs : time.;
*format      time24hrs time.;
cards;
a     10     2:43
a     10     8:34
a     11     11:52
b     16     5:32
b     18     7:16
b     12     14:08
b     11     20:34
b     13     21:31
c     11     2:18
c     12     8:34
d     18     14:12
e     19     2:43
e     17     5:30
e     11     12:07
e     14     20:40
e     16     20:46
f     11     8:24
f     12     11:55
;
run;

data want new(keep=group remarks);
set have;
by group;
k=lag(time24hrs);

if first.group then
  do;
   totaldollar=0;
   grpcount=0;
   sumtimediff=0;
  end;

grpcount+1;
totaldollar+dollar;

if not first.group then
  do;
   deltatime=intck('hour',k,time24hrs);
   sumtimediff+deltatime;
  end;

if grpcount>=2 and totaldollar>=25 and sumtimediff le 3 then
  do;
   remarks='yes';
   output new;
  end;

output want;
run;

data final;
merge want new;
by group;

keep group dollar time24hrs remarks;
run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 352 views
  • 0 likes
  • 3 in conversation