Hi all
I would need some help in writing a program to automate these steps:
1) Group client according to start & end time (if they overlapped, they will be group together) and the room number
2) Flag out the clients if their total sum of 'Est amount' is +/-$5000 - these could be of any combination within the group
Below is a set of sample data and the outcome i would like to achieve. Thanks.
Original Data
| client_id | Est amount | Max Play | Avg. Play | in_date | out_date | Start Time | End Time | timeplayed | Room Number |
|---|---|---|---|---|---|---|---|---|---|
| P1 | 33000 | 20000 | 5000 | 3/7/2012 | 3/7/2012 | 4:17:00 PM | 6:34:00 PM | 137 | 4444 |
| P2 | -30000 | 7000 | 2000 | 3/7/2012 | 3/7/2012 | 4:31:00 PM | 6:16:00 PM | 105 | 4444 |
| P1 | -233000 | 30000 | 15000 | 3/7/2012 | 3/7/2012 | 6:34:00 PM | 7:55:00 PM | 81 | 4444 |
| P3 | -50000 | 50000 | 50000 | 3/7/2012 | 3/7/2012 | 6:59:00 PM | 7:00:00 PM | 1 | 4444 |
| P1 | -47000 | 20000 | 10000 | 3/7/2012 | 3/7/2012 | 9:05:00 PM | 9:48:00 PM | 43 | 6789 |
| P2 | 42000 | 25000 | 10000 | 3/7/2012 | 3/7/2012 | 9:24:00 PM | 10:49:00 PM | 85 | 6789 |
| P4 | 8000 | 21000 | 8000 | 3/7/2012 | 3/7/2012 | 9:38:00 PM | 10:55:00 PM | 77 | 6789 |
Step 1
| client_id | Est amount | Max Play | Avg. Play | in_date | out_date | Start Time | End Time | timeplayed | Room Number |
|---|---|---|---|---|---|---|---|---|---|
| Group 1 | |||||||||
| P1 | 33000 | 20000 | 5000 | 3/7/2012 | 3/7/2012 | 4:17:00 PM | 6:34:00 PM | 137 | 4444 |
| P2 | -30000 | 7000 | 2000 | 3/7/2012 | 3/7/2012 | 4:31:00 PM | 6:16:00 PM | 105 | 4444 |
| Group 2 | |||||||||
| P1 | -233000 | 30000 | 15000 | 3/7/2012 | 3/7/2012 | 6:34:00 PM | 7:55:00 PM | 81 | 4444 |
| P3 | -50000 | 50000 | 50000 | 3/7/2012 | 3/7/2012 | 6:59:00 PM | 7:00:00 PM | 1 | 4444 |
Group 3 | |||||||||
| P1 | -47000 | 20000 | 10000 | 3/7/2012 | 3/7/2012 | 9:05:00 PM | 9:48:00 PM | 43 | 6789 |
| P2 | 42000 | 25000 | 10000 | 3/7/2012 | 3/7/2012 | 9:24:00 PM | 10:49:00 PM | 85 | 6789 |
| P4 | 8000 | 21000 | 8000 | 3/7/2012 | 3/7/2012 | 9:38:00 PM | 10:55:00 PM | 77 | 6789 |
Step 2
| Combinations | Flagged |
|---|---|
| Group 1 | |
| P1 & P2 | P1, P2 |
| Group 2 | |
| P1 & P3 | |
Group 3 | |
| P1 & P2 | P1 & P2 |
| P1 & P4 | |
| P1 & P2 & P4 | P1 &P2 & P4 |
"Group client according to start & end time (if they overlapped, they will be group together) and the room number"
Your post is obscure . where is start & end time and room number .
What is the meaning of "+/-$5000" ?
Hi Ksharp
Thank you for the response. The Start time and end time is listed on the original data (column 7 and 8 respectively). Room number is on the last column. You might have to scroll to the right to see the figures.
I would like to flag out the combination where the sum of their "Est amount" (column 2) is within the range of + or - $5000.
Step 1 and 2 show the output that i would like to achieve.
data x;
input client_id $ Est_Amount Max_Play Avg_Play in_date anydtdte. out_date anydtdte.
Start_Time time11. End_Time time12. timeplayed Room_Number ;
IN_Datetime = in_date * 86400 + Start_Time;
Out_Datetime = out_date * 86400 + End_Time;
format IN_Datetime Out_Datetime datetime20. In_Date Out_Date Date9. Start_Time End_Time time8.;
datalines;
P1 33000 20000 5000 3/7/2012 3/7/2012 04:17:00 PM 06:34:00 PM 137 4444
P2 -30000 7000 2000 3/7/2012 3/7/2012 04:31:00 PM 06:16:00 PM 105 4444
P1 -233000 30000 15000 3/7/2012 3/7/2012 06:34:00 PM 07:55:00 PM 81 4444
P3 -50000 50000 50000 3/7/2012 3/7/2012 06:59:00 PM 07:00:00 PM 1 4444
P1 -47000 20000 10000 3/7/2012 3/7/2012 09:05:00 PM 09:48:00 PM 43 6789
P2 42000 25000 10000 3/7/2012 3/7/2012 09:24:00 PM 10:49:00 PM 85 6789
P4 8000 21000 8000 3/7/2012 3/7/2012 09:38:00 PM 10:55:00 PM 77 6789
;
run;
data want1;
if _n_ = 1 then GroupId = 0;
retain GroupId;
set x;
by Room_Number notsorted;
Lag_Out_Datetime = LAG(Out_Datetime);
if first.Room_Number or Lag_Out_Datetime < In_Datetime then do;
GroupId + 1;
end;
drop Out_Datetime In_Datetime Lag:;
run;
proc sql noprint;
select max(Freq) into :maxfreq
from (
select GroupId, Count(1) as Freq from want1 group by GroupId);
quit;
data want2;
array p[1:&maxfreq] $3;
array total[1:&maxfreq];
do k=1 to &maxfreq;
p
total
end;
n = 0;
if _n_ = 1 then do;
declare hash h(ordered:'a');
h.defineKey('client_id');
h.defineData('client_id');
h.defineDone();
declare hiter hi('h');
end;
do until(last.GroupId);
set want1;
by GroupId;
n + 1;
p
total
end;
do k=2 to n;
ncomb = comb(n,k);
do i=1 to ncomb;
rc=allcomb(i, k, of p
rc=allcomb(i, k, of total
tot = 0;
length Combinations Flagged $200;
h.clear();
Combinations = '';
do j=1 to k;
tot = SUM(tot,total
client_id = p
h.add();
end;
rc = hi.first();
do while(rc = 0);
Combinations = CATX(',',Combinations,client_id);
rc = hi.next();
end;
tot_flag = (ABS(tot) <= 5000);
if tot_flag ne 1 then Flagged = '';
else Flagged = Combinations;
output;
end;
end;
keep GroupId Combinations Flagged;
run;
Do you consider to use In_date and Out_date as grouping variable too ?
If you only consider " start & end time and the room number" ,that might be meet a error .
Anyway , Anyhow , that is really not easy for all the combination.
data x;
input client_id $ Est_Amount Max_Play Avg_Play in_date : mmddyy10. out_date : mmddyy10.
Start_Time & time11. End_Time & time12. timeplayed Room_Number ;
format In_Date Out_Date Date9. Start_Time End_Time time8.;
datalines;
P1 33000 20000 5000 3/7/2012 3/7/2012 04:17:00 PM 06:34:00 PM 137 4444
P2 -30000 7000 2000 3/7/2012 3/7/2012 04:31:00 PM 06:16:00 PM 105 4444
P1 -233000 30000 15000 3/7/2012 3/7/2012 06:34:00 PM 07:55:00 PM 81 4444
P3 -50000 50000 50000 3/7/2012 3/7/2012 06:59:00 PM 07:00:00 PM 1 4444
P1 -47000 20000 10000 3/7/2012 3/7/2012 09:05:00 PM 09:48:00 PM 43 6789
P2 42000 25000 10000 3/7/2012 3/7/2012 09:24:00 PM 10:49:00 PM 85 6789
P4 8000 21000 8000 3/7/2012 3/7/2012 09:38:00 PM 10:55:00 PM 77 6789
;
run;
data one;
set x ;
if (Room_Number=lag(Room_Number) and lag(End_Time) lt Start_Time) or (Room_Number ne lag(Room_Number)) then group+1;
run;
proc transpose data=one out=temp(drop=_name_) ;
by group;
id client_id;
var Est_Amount;
run;
data two(keep=group var);
set temp;
length var $ 200;
array x{*} p: ;
n=dim(x);
do k=2 to n;
ncomb=comb(n,k);
do j=1 to ncomb;
call allcomb(j, k, of x{*});
flag=0;sum=0;var=' ';
do i=1 to k;
if missing(x{i}) then flag=1;
end;
if not flag then do;
do i=1 to k;
sum+x{i};
end;
if abs(sum) le 5000 then do;
do i=1 to k;
var=catx(',',var,x{i});
end;
output;
end;
end;
end;
end;
run;
data two(keep=group var _var);
merge two(in=a) temp;
by group;
length _var $ 200;
array x{*} p: ;
if a then do;
do j=1 to countw(var,',');
temp=input(scan(var,j,','),best32.);
do i=1 to dim(x);
if temp eq x{i} and not findw(_var,strip(vname(x{i}))) then _var=catx(',',_var,vname(x{i}));
end;
end;
output;
end;
run;
Ksharp
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.