Hi all
I would need some help in writing a program to automate these steps:
1) Group client according to start & end time (if they overlapped, they will be group together) and the room number
2) Flag out the clients if their total sum of 'Est amount' is +/-$5000 - these could be of any combination within the group
Below is a set of sample data and the outcome i would like to achieve. Thanks.
Original Data
client_id | Est amount | Max Play | Avg. Play | in_date | out_date | Start Time | End Time | timeplayed | Room Number |
---|---|---|---|---|---|---|---|---|---|
P1 | 33000 | 20000 | 5000 | 3/7/2012 | 3/7/2012 | 4:17:00 PM | 6:34:00 PM | 137 | 4444 |
P2 | -30000 | 7000 | 2000 | 3/7/2012 | 3/7/2012 | 4:31:00 PM | 6:16:00 PM | 105 | 4444 |
P1 | -233000 | 30000 | 15000 | 3/7/2012 | 3/7/2012 | 6:34:00 PM | 7:55:00 PM | 81 | 4444 |
P3 | -50000 | 50000 | 50000 | 3/7/2012 | 3/7/2012 | 6:59:00 PM | 7:00:00 PM | 1 | 4444 |
P1 | -47000 | 20000 | 10000 | 3/7/2012 | 3/7/2012 | 9:05:00 PM | 9:48:00 PM | 43 | 6789 |
P2 | 42000 | 25000 | 10000 | 3/7/2012 | 3/7/2012 | 9:24:00 PM | 10:49:00 PM | 85 | 6789 |
P4 | 8000 | 21000 | 8000 | 3/7/2012 | 3/7/2012 | 9:38:00 PM | 10:55:00 PM | 77 | 6789 |
Step 1
client_id | Est amount | Max Play | Avg. Play | in_date | out_date | Start Time | End Time | timeplayed | Room Number |
---|---|---|---|---|---|---|---|---|---|
Group 1 | |||||||||
P1 | 33000 | 20000 | 5000 | 3/7/2012 | 3/7/2012 | 4:17:00 PM | 6:34:00 PM | 137 | 4444 |
P2 | -30000 | 7000 | 2000 | 3/7/2012 | 3/7/2012 | 4:31:00 PM | 6:16:00 PM | 105 | 4444 |
Group 2 | |||||||||
P1 | -233000 | 30000 | 15000 | 3/7/2012 | 3/7/2012 | 6:34:00 PM | 7:55:00 PM | 81 | 4444 |
P3 | -50000 | 50000 | 50000 | 3/7/2012 | 3/7/2012 | 6:59:00 PM | 7:00:00 PM | 1 | 4444 |
Group 3 | |||||||||
P1 | -47000 | 20000 | 10000 | 3/7/2012 | 3/7/2012 | 9:05:00 PM | 9:48:00 PM | 43 | 6789 |
P2 | 42000 | 25000 | 10000 | 3/7/2012 | 3/7/2012 | 9:24:00 PM | 10:49:00 PM | 85 | 6789 |
P4 | 8000 | 21000 | 8000 | 3/7/2012 | 3/7/2012 | 9:38:00 PM | 10:55:00 PM | 77 | 6789 |
Step 2
Combinations | Flagged |
---|---|
Group 1 | |
P1 & P2 | P1, P2 |
Group 2 | |
P1 & P3 | |
Group 3 | |
P1 & P2 | P1 & P2 |
P1 & P4 | |
P1 & P2 & P4 | P1 &P2 & P4 |
"Group client according to start & end time (if they overlapped, they will be group together) and the room number"
Your post is obscure . where is start & end time and room number .
What is the meaning of "+/-$5000" ?
Hi Ksharp
Thank you for the response. The Start time and end time is listed on the original data (column 7 and 8 respectively). Room number is on the last column. You might have to scroll to the right to see the figures.
I would like to flag out the combination where the sum of their "Est amount" (column 2) is within the range of + or - $5000.
Step 1 and 2 show the output that i would like to achieve.
data x;
input client_id $ Est_Amount Max_Play Avg_Play in_date anydtdte. out_date anydtdte.
Start_Time time11. End_Time time12. timeplayed Room_Number ;
IN_Datetime = in_date * 86400 + Start_Time;
Out_Datetime = out_date * 86400 + End_Time;
format IN_Datetime Out_Datetime datetime20. In_Date Out_Date Date9. Start_Time End_Time time8.;
datalines;
P1 33000 20000 5000 3/7/2012 3/7/2012 04:17:00 PM 06:34:00 PM 137 4444
P2 -30000 7000 2000 3/7/2012 3/7/2012 04:31:00 PM 06:16:00 PM 105 4444
P1 -233000 30000 15000 3/7/2012 3/7/2012 06:34:00 PM 07:55:00 PM 81 4444
P3 -50000 50000 50000 3/7/2012 3/7/2012 06:59:00 PM 07:00:00 PM 1 4444
P1 -47000 20000 10000 3/7/2012 3/7/2012 09:05:00 PM 09:48:00 PM 43 6789
P2 42000 25000 10000 3/7/2012 3/7/2012 09:24:00 PM 10:49:00 PM 85 6789
P4 8000 21000 8000 3/7/2012 3/7/2012 09:38:00 PM 10:55:00 PM 77 6789
;
run;
data want1;
if _n_ = 1 then GroupId = 0;
retain GroupId;
set x;
by Room_Number notsorted;
Lag_Out_Datetime = LAG(Out_Datetime);
if first.Room_Number or Lag_Out_Datetime < In_Datetime then do;
GroupId + 1;
end;
drop Out_Datetime In_Datetime Lag:;
run;
proc sql noprint;
select max(Freq) into :maxfreq
from (
select GroupId, Count(1) as Freq from want1 group by GroupId);
quit;
data want2;
array p[1:&maxfreq] $3;
array total[1:&maxfreq];
do k=1 to &maxfreq;
p
total
end;
n = 0;
if _n_ = 1 then do;
declare hash h(ordered:'a');
h.defineKey('client_id');
h.defineData('client_id');
h.defineDone();
declare hiter hi('h');
end;
do until(last.GroupId);
set want1;
by GroupId;
n + 1;
p
total
end;
do k=2 to n;
ncomb = comb(n,k);
do i=1 to ncomb;
rc=allcomb(i, k, of p
rc=allcomb(i, k, of total
tot = 0;
length Combinations Flagged $200;
h.clear();
Combinations = '';
do j=1 to k;
tot = SUM(tot,total
client_id = p
h.add();
end;
rc = hi.first();
do while(rc = 0);
Combinations = CATX(',',Combinations,client_id);
rc = hi.next();
end;
tot_flag = (ABS(tot) <= 5000);
if tot_flag ne 1 then Flagged = '';
else Flagged = Combinations;
output;
end;
end;
keep GroupId Combinations Flagged;
run;
Do you consider to use In_date and Out_date as grouping variable too ?
If you only consider " start & end time and the room number" ,that might be meet a error .
Anyway , Anyhow , that is really not easy for all the combination.
data x; input client_id $ Est_Amount Max_Play Avg_Play in_date : mmddyy10. out_date : mmddyy10. Start_Time & time11. End_Time & time12. timeplayed Room_Number ; format In_Date Out_Date Date9. Start_Time End_Time time8.; datalines; P1 33000 20000 5000 3/7/2012 3/7/2012 04:17:00 PM 06:34:00 PM 137 4444 P2 -30000 7000 2000 3/7/2012 3/7/2012 04:31:00 PM 06:16:00 PM 105 4444 P1 -233000 30000 15000 3/7/2012 3/7/2012 06:34:00 PM 07:55:00 PM 81 4444 P3 -50000 50000 50000 3/7/2012 3/7/2012 06:59:00 PM 07:00:00 PM 1 4444 P1 -47000 20000 10000 3/7/2012 3/7/2012 09:05:00 PM 09:48:00 PM 43 6789 P2 42000 25000 10000 3/7/2012 3/7/2012 09:24:00 PM 10:49:00 PM 85 6789 P4 8000 21000 8000 3/7/2012 3/7/2012 09:38:00 PM 10:55:00 PM 77 6789 ; run; data one; set x ; if (Room_Number=lag(Room_Number) and lag(End_Time) lt Start_Time) or (Room_Number ne lag(Room_Number)) then group+1; run; proc transpose data=one out=temp(drop=_name_) ; by group; id client_id; var Est_Amount; run; data two(keep=group var); set temp; length var $ 200; array x{*} p: ; n=dim(x); do k=2 to n; ncomb=comb(n,k); do j=1 to ncomb; call allcomb(j, k, of x{*}); flag=0;sum=0;var=' '; do i=1 to k; if missing(x{i}) then flag=1; end; if not flag then do; do i=1 to k; sum+x{i}; end; if abs(sum) le 5000 then do; do i=1 to k; var=catx(',',var,x{i}); end; output; end; end; end; end; run; data two(keep=group var _var); merge two(in=a) temp; by group; length _var $ 200; array x{*} p: ; if a then do; do j=1 to countw(var,','); temp=input(scan(var,j,','),best32.); do i=1 to dim(x); if temp eq x{i} and not findw(_var,strip(vname(x{i}))) then _var=catx(',',_var,vname(x{i})); end; end; output; end; run;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.