Hi all
I would need some help in writing a program to automate these steps:
1) Group client according to start & end time (if they overlapped, they will be group together) and the room number
2) Flag out the clients if their total sum of 'Est amount' is +/-$5000 - these could be of any combination within the group
Below is a set of sample data and the outcome i would like to achieve. Thanks.
Original Data
client_id | Est amount | Max Play | Avg. Play | in_date | out_date | Start Time | End Time | timeplayed | Room Number |
---|---|---|---|---|---|---|---|---|---|
P1 | 33000 | 20000 | 5000 | 3/7/2012 | 3/7/2012 | 4:17:00 PM | 6:34:00 PM | 137 | 4444 |
P2 | -30000 | 7000 | 2000 | 3/7/2012 | 3/7/2012 | 4:31:00 PM | 6:16:00 PM | 105 | 4444 |
P1 | -233000 | 30000 | 15000 | 3/7/2012 | 3/7/2012 | 6:34:00 PM | 7:55:00 PM | 81 | 4444 |
P3 | -50000 | 50000 | 50000 | 3/7/2012 | 3/7/2012 | 6:59:00 PM | 7:00:00 PM | 1 | 4444 |
P1 | -47000 | 20000 | 10000 | 3/7/2012 | 3/7/2012 | 9:05:00 PM | 9:48:00 PM | 43 | 6789 |
P2 | 42000 | 25000 | 10000 | 3/7/2012 | 3/7/2012 | 9:24:00 PM | 10:49:00 PM | 85 | 6789 |
P4 | 8000 | 21000 | 8000 | 3/7/2012 | 3/7/2012 | 9:38:00 PM | 10:55:00 PM | 77 | 6789 |
Step 1
client_id | Est amount | Max Play | Avg. Play | in_date | out_date | Start Time | End Time | timeplayed | Room Number |
---|---|---|---|---|---|---|---|---|---|
Group 1 | |||||||||
P1 | 33000 | 20000 | 5000 | 3/7/2012 | 3/7/2012 | 4:17:00 PM | 6:34:00 PM | 137 | 4444 |
P2 | -30000 | 7000 | 2000 | 3/7/2012 | 3/7/2012 | 4:31:00 PM | 6:16:00 PM | 105 | 4444 |
Group 2 | |||||||||
P1 | -233000 | 30000 | 15000 | 3/7/2012 | 3/7/2012 | 6:34:00 PM | 7:55:00 PM | 81 | 4444 |
P3 | -50000 | 50000 | 50000 | 3/7/2012 | 3/7/2012 | 6:59:00 PM | 7:00:00 PM | 1 | 4444 |
Group 3 | |||||||||
P1 | -47000 | 20000 | 10000 | 3/7/2012 | 3/7/2012 | 9:05:00 PM | 9:48:00 PM | 43 | 6789 |
P2 | 42000 | 25000 | 10000 | 3/7/2012 | 3/7/2012 | 9:24:00 PM | 10:49:00 PM | 85 | 6789 |
P4 | 8000 | 21000 | 8000 | 3/7/2012 | 3/7/2012 | 9:38:00 PM | 10:55:00 PM | 77 | 6789 |
Step 2
Combinations | Flagged |
---|---|
Group 1 | |
P1 & P2 | P1, P2 |
Group 2 | |
P1 & P3 | |
Group 3 | |
P1 & P2 | P1 & P2 |
P1 & P4 | |
P1 & P2 & P4 | P1 &P2 & P4 |
"Group client according to start & end time (if they overlapped, they will be group together) and the room number"
Your post is obscure . where is start & end time and room number .
What is the meaning of "+/-$5000" ?
Hi Ksharp
Thank you for the response. The Start time and end time is listed on the original data (column 7 and 8 respectively). Room number is on the last column. You might have to scroll to the right to see the figures.
I would like to flag out the combination where the sum of their "Est amount" (column 2) is within the range of + or - $5000.
Step 1 and 2 show the output that i would like to achieve.
data x;
input client_id $ Est_Amount Max_Play Avg_Play in_date anydtdte. out_date anydtdte.
Start_Time time11. End_Time time12. timeplayed Room_Number ;
IN_Datetime = in_date * 86400 + Start_Time;
Out_Datetime = out_date * 86400 + End_Time;
format IN_Datetime Out_Datetime datetime20. In_Date Out_Date Date9. Start_Time End_Time time8.;
datalines;
P1 33000 20000 5000 3/7/2012 3/7/2012 04:17:00 PM 06:34:00 PM 137 4444
P2 -30000 7000 2000 3/7/2012 3/7/2012 04:31:00 PM 06:16:00 PM 105 4444
P1 -233000 30000 15000 3/7/2012 3/7/2012 06:34:00 PM 07:55:00 PM 81 4444
P3 -50000 50000 50000 3/7/2012 3/7/2012 06:59:00 PM 07:00:00 PM 1 4444
P1 -47000 20000 10000 3/7/2012 3/7/2012 09:05:00 PM 09:48:00 PM 43 6789
P2 42000 25000 10000 3/7/2012 3/7/2012 09:24:00 PM 10:49:00 PM 85 6789
P4 8000 21000 8000 3/7/2012 3/7/2012 09:38:00 PM 10:55:00 PM 77 6789
;
run;
data want1;
if _n_ = 1 then GroupId = 0;
retain GroupId;
set x;
by Room_Number notsorted;
Lag_Out_Datetime = LAG(Out_Datetime);
if first.Room_Number or Lag_Out_Datetime < In_Datetime then do;
GroupId + 1;
end;
drop Out_Datetime In_Datetime Lag:;
run;
proc sql noprint;
select max(Freq) into :maxfreq
from (
select GroupId, Count(1) as Freq from want1 group by GroupId);
quit;
data want2;
array p[1:&maxfreq] $3;
array total[1:&maxfreq];
do k=1 to &maxfreq;
p
total
end;
n = 0;
if _n_ = 1 then do;
declare hash h(ordered:'a');
h.defineKey('client_id');
h.defineData('client_id');
h.defineDone();
declare hiter hi('h');
end;
do until(last.GroupId);
set want1;
by GroupId;
n + 1;
p
total
end;
do k=2 to n;
ncomb = comb(n,k);
do i=1 to ncomb;
rc=allcomb(i, k, of p
rc=allcomb(i, k, of total
tot = 0;
length Combinations Flagged $200;
h.clear();
Combinations = '';
do j=1 to k;
tot = SUM(tot,total
client_id = p
h.add();
end;
rc = hi.first();
do while(rc = 0);
Combinations = CATX(',',Combinations,client_id);
rc = hi.next();
end;
tot_flag = (ABS(tot) <= 5000);
if tot_flag ne 1 then Flagged = '';
else Flagged = Combinations;
output;
end;
end;
keep GroupId Combinations Flagged;
run;
Do you consider to use In_date and Out_date as grouping variable too ?
If you only consider " start & end time and the room number" ,that might be meet a error .
Anyway , Anyhow , that is really not easy for all the combination.
data x; input client_id $ Est_Amount Max_Play Avg_Play in_date : mmddyy10. out_date : mmddyy10. Start_Time & time11. End_Time & time12. timeplayed Room_Number ; format In_Date Out_Date Date9. Start_Time End_Time time8.; datalines; P1 33000 20000 5000 3/7/2012 3/7/2012 04:17:00 PM 06:34:00 PM 137 4444 P2 -30000 7000 2000 3/7/2012 3/7/2012 04:31:00 PM 06:16:00 PM 105 4444 P1 -233000 30000 15000 3/7/2012 3/7/2012 06:34:00 PM 07:55:00 PM 81 4444 P3 -50000 50000 50000 3/7/2012 3/7/2012 06:59:00 PM 07:00:00 PM 1 4444 P1 -47000 20000 10000 3/7/2012 3/7/2012 09:05:00 PM 09:48:00 PM 43 6789 P2 42000 25000 10000 3/7/2012 3/7/2012 09:24:00 PM 10:49:00 PM 85 6789 P4 8000 21000 8000 3/7/2012 3/7/2012 09:38:00 PM 10:55:00 PM 77 6789 ; run; data one; set x ; if (Room_Number=lag(Room_Number) and lag(End_Time) lt Start_Time) or (Room_Number ne lag(Room_Number)) then group+1; run; proc transpose data=one out=temp(drop=_name_) ; by group; id client_id; var Est_Amount; run; data two(keep=group var); set temp; length var $ 200; array x{*} p: ; n=dim(x); do k=2 to n; ncomb=comb(n,k); do j=1 to ncomb; call allcomb(j, k, of x{*}); flag=0;sum=0;var=' '; do i=1 to k; if missing(x{i}) then flag=1; end; if not flag then do; do i=1 to k; sum+x{i}; end; if abs(sum) le 5000 then do; do i=1 to k; var=catx(',',var,x{i}); end; output; end; end; end; end; run; data two(keep=group var _var); merge two(in=a) temp; by group; length _var $ 200; array x{*} p: ; if a then do; do j=1 to countw(var,','); temp=input(scan(var,j,','),best32.); do i=1 to dim(x); if temp eq x{i} and not findw(_var,strip(vname(x{i}))) then _var=catx(',',_var,vname(x{i})); end; end; output; end; run;
Ksharp
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.