Solved: Re: Flag random data based on criteria

brulard · Posted 03-20-2017 01:34 PM

hello,

I need to transform data for the purpose of evaluating the actions that employees have taken on customers' account.

It would be greatly appreciated if someone could point me in the right direction.

I'm looking at previous month's data, which captures on a given day, any time an agent "actions" / modifies a customer's account. The action taken on customer's account are categorized by risk level, from 1 (most critical) to 3 less, and 0, no risk.

What I need to do:

Flag 5 observations per agent of which 3 have a risk_level=1, and 2 have a risk_level=2. If cannot find total of 5 obs, then select from risk_level=3, and if still can't find, select risk_level=0. (The goal is to flag 5 where 3 have a risk level of 1, and two a lower risk level)

Ultimately, it would be great if I could export the data to excel, and have a pivot table for agent_name, and variable Subject_to_review, where each agent name would have a count 5.

Example data has been added below.

Thanks

LaurieF · Posted 03-21-2017 07:06 PM

How does this work for you? It is extensible: throw as much data at it as you like, and it should still work.

data have;
infile cards dsd dlm=',' firstobs=2 missover;
attrib agent_name length=$ 20;
attrib action_type length=$ 4;
attrib risk_level subject_to_review length=$ 1;
input agent_name
      action_type
      risk_level;
select(risk_level);
   when('1', '2', '3') recode_risk_level = risk_level;
   when('0') recode_risk_level = '4';
   otherwise recode_risk_level = '5';
   end;
random = ranuni(225465114);  /* My 30-year-old UK phone number as seed */
id = monotonic();    /* Create a unique incremented ID per observation */
cards;
Agent_name,action_type,risk_level,subject_to_review
Albert,1000,1,Y
Albert,1000,1,Y
Albert,1000,1,
Albert,1000,1,
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,
Albert,3230,0,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,5200,,
Barb,1000,1,Y
Barb,1000,1,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,3230,0,
Cathy,1000,1,Y
Cathy,2100,2,Y
Cathy,2100,2,Y
Cathy,3230,0,Y
Cathy,5200,,Y
;
run;

proc sort data=have noequals;
by agent_name recode_risk_level random;
run;

data subject_to_review;
set have;
by agent_name recode_risk_level;
retain count output_count;
if first.agent_name then
   output_count = 1;
if first.recode_risk_level
   then count = 1;
   else count + 1;
if (   output_count > 5
    or recode_risk_level = '1' and count > 3
    or recode_risk_level = '2' and count > 2) then
    delete;
output_count + 1;
keep id;
run;

proc sql undo_policy=none magic=102;
create table want as 
   select have.agent_name,
          have.action_type,
          have.risk_level,
          ifc(str.id is not null, 'Y', ' ') as subject_to_review length=1
     from have
     left join subject_to_review as str
       on have.id = str.id
    order by have.id;
quit;

View solution in original post

LaurieF · Posted 03-20-2017 04:17 PM

It's possible, but very difficult, to provide a suggestion without having some test data to work on. Can you give us some?

brulard · Posted 03-21-2017 07:33 AM

Thanks LaurieF for ur reply. At it's simplest, I need to randomly select 5 observations per agent_name, that should follow this pattern:

- if can find 3 observations of risk_level=1, flag subject_to_review=Y

-if can 2 observations of risk_level=2, flag subject_to_review=Y

This would arrive at 5 observations per agent_name. it becomes more complicated if cannot find 3 observations, as then need to pick up a lower risk level to arrive at your five per agent (no hard rule, but should first fetch higher risk level.

From higher to lower is : 1,2,3,0,'')

See below example, but without it being randomly assigned Y. If no one has a lead to suggest, I may go with attempting to achieve this with excel vba. Thank you

DATA_WANT
Agent_name	action_type	risk_level	subject_to_review
Albert	1000	1	Y
Albert	1000	1	Y
Albert	1000	1
Albert	1000	1
Albert	2100	2	Y
Albert	2100	2	Y
Albert	2100	2	Y
Albert	2100	2
Albert	3230	0
Albert	2500	3
Albert	2500	3
Albert	2500	3
Albert	2500	3
Albert	2500	3
Albert	5200
Barb	1000	1	Y
Barb	1000	1	Y
Barb	2100	2	Y
Barb	2100	2	Y
Barb	2100	2	Y
Barb	3230	0
Cathy	1000	1	Y
Cathy	2100	2	Y
Cathy	2100	2	Y
Cathy	3230	0	Y
Cathy	5200		Y

LaurieF · Posted 03-21-2017 07:06 PM

How does this work for you? It is extensible: throw as much data at it as you like, and it should still work.

data have;
infile cards dsd dlm=',' firstobs=2 missover;
attrib agent_name length=$ 20;
attrib action_type length=$ 4;
attrib risk_level subject_to_review length=$ 1;
input agent_name
      action_type
      risk_level;
select(risk_level);
   when('1', '2', '3') recode_risk_level = risk_level;
   when('0') recode_risk_level = '4';
   otherwise recode_risk_level = '5';
   end;
random = ranuni(225465114);  /* My 30-year-old UK phone number as seed */
id = monotonic();    /* Create a unique incremented ID per observation */
cards;
Agent_name,action_type,risk_level,subject_to_review
Albert,1000,1,Y
Albert,1000,1,Y
Albert,1000,1,
Albert,1000,1,
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,
Albert,3230,0,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,5200,,
Barb,1000,1,Y
Barb,1000,1,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,3230,0,
Cathy,1000,1,Y
Cathy,2100,2,Y
Cathy,2100,2,Y
Cathy,3230,0,Y
Cathy,5200,,Y
;
run;

proc sort data=have noequals;
by agent_name recode_risk_level random;
run;

data subject_to_review;
set have;
by agent_name recode_risk_level;
retain count output_count;
if first.agent_name then
   output_count = 1;
if first.recode_risk_level
   then count = 1;
   else count + 1;
if (   output_count > 5
    or recode_risk_level = '1' and count > 3
    or recode_risk_level = '2' and count > 2) then
    delete;
output_count + 1;
keep id;
run;

proc sql undo_policy=none magic=102;
create table want as 
   select have.agent_name,
          have.action_type,
          have.risk_level,
          ifc(str.id is not null, 'Y', ' ') as subject_to_review length=1
     from have
     left join subject_to_review as str
       on have.id = str.id
    order by have.id;
quit;

brulard · Posted 03-22-2017 02:36 PM

thank you very much Laurie for your suggestion and for some of the explanations provided.

LaurieF · Posted 03-22-2017 10:10 PM

If there's anything else you need to know (there's some interesting stuff in the sql join for example), sing out - I'm happy to explain.

brulard · Posted 03-23-2017 02:05 PM

hi Laurie,

Your suggestion is very useful, thanks again. My data will have numerous instances where a given agent will not have 3 observations with risk level 1 or even 2 obs with risk level 2. The challenge i believe is arriving at a program/code to assess that if higher risk (R) does not sum to 3 for a given agent, will need higher observations from lower risk level.

In the want table, row 19, under agent Barb, would be looking for it to populate to 'Y', as R has only 2 obs, so need 1 more, and prefer higher risk level.

agent_name	action_type	risk_level	subject_to_review
Albert	1000	1
Albert	1000	1	Y
Albert	1000	1	Y
Albert	1000	1	Y
Albert	2100	2
Albert	2100	2	Y
Albert	2100	2	Y
Albert	2100	2
Albert	3230	0
Albert	2500	3
Albert	2500	3
Albert	2500	3
Albert	2500	3
Albert	2500	3
Albert	5200
Barb	1000	1	Y
Barb	1000	1	Y
Barb	2100	2
Barb	2100	2	Y
Barb	2100	2	Y
Barb	3230	0	Y

Is there a way for the program you offered, to say, based on risk (R) level, for it to consider, for each given agent:

if R1*3>=3, do nothing; if R1*3=2 then fetch an additional lower level R2 ... Ultimately to arrive at the following 5 obs, per risk level:

R1+R1+R1+R2+R2 (Ideal scenario for a given agent)
R1+R1+R2+R2+R2
R1+R2+R2+R2+R2

If full shortage of R1, compensate by getting from lower level, etc.

R2+R2+R2+R2+R2
R2+R2+R2+R2+R3
R2+R2+R2+R3+R3
...
R5+R5+R5+R5+R5

Please, let me know your thoughts,

Thank you very much

Registration is open

SAS Training: Just a Click Away