DATA Step, Macro, Functions and more

Flag random data based on criteria

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 84
Accepted Solution

Flag random data based on criteria

[ Edited ]

hello,

I need to transform data for the purpose of evaluating the actions that employees have taken on customers' account.

It would be greatly appreciated if someone could point me in the right direction.

 

I'm looking at previous month's data, which captures on a given day, any time an agent "actions" / modifies a customer's account. The action taken on customer's account are categorized by risk level, from 1 (most critical) to 3 less, and 0, no risk.

 

What I need to do:

 

Flag 5 observations per agent of which 3 have a risk_level=1, and 2 have a risk_level=2. If cannot find total of 5 obs, then select from risk_level=3, and if still can't find, select risk_level=0. (The goal is to flag 5 where 3 have a risk level of 1, and two  a lower risk level)

Ultimately, it would be great if I could export the data to excel, and have a pivot table for agent_name, and variable Subject_to_review, where each agent name would have a count 5.

Example data has been added below.

 

Thanks


Accepted Solutions
Solution
‎03-22-2017 02:33 PM
Super Contributor
Posts: 251

Re: Flag random data based on criteria

[ Edited ]

How does this work for you? It is extensible: throw as much data at it as you like, and it should still work.

 

data have;
infile cards dsd dlm=',' firstobs=2 missover;
attrib agent_name length=$ 20;
attrib action_type length=$ 4;
attrib risk_level subject_to_review length=$ 1;
input agent_name
      action_type
      risk_level;
select(risk_level);
   when('1', '2', '3') recode_risk_level = risk_level;
   when('0') recode_risk_level = '4';
   otherwise recode_risk_level = '5';
   end;
random = ranuni(225465114);  /* My 30-year-old UK phone number as seed */
id = monotonic();    /* Create a unique incremented ID per observation */
cards;
Agent_name,action_type,risk_level,subject_to_review
Albert,1000,1,Y
Albert,1000,1,Y
Albert,1000,1,
Albert,1000,1,
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,
Albert,3230,0,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,5200,,
Barb,1000,1,Y
Barb,1000,1,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,3230,0,
Cathy,1000,1,Y
Cathy,2100,2,Y
Cathy,2100,2,Y
Cathy,3230,0,Y
Cathy,5200,,Y
;
run;

proc sort data=have noequals;
by agent_name recode_risk_level random;
run;

data subject_to_review;
set have;
by agent_name recode_risk_level;
retain count output_count;
if first.agent_name then
   output_count = 1;
if first.recode_risk_level
   then count = 1;
   else count + 1;
if (   output_count > 5
    or recode_risk_level = '1' and count > 3
    or recode_risk_level = '2' and count > 2) then
    delete;
output_count + 1;
keep id;
run;

proc sql undo_policy=none magic=102;
create table want as 
   select have.agent_name,
          have.action_type,
          have.risk_level,
          ifc(str.id is not null, 'Y', ' ') as subject_to_review length=1
     from have
     left join subject_to_review as str
       on have.id = str.id
    order by have.id;
quit;

 

View solution in original post


All Replies
Super Contributor
Posts: 251

Re: Flag random data based on criteria

It's possible, but very difficult, to provide a suggestion without having some test data to work on. Can you give us some?

Frequent Contributor
Posts: 84

Re: Flag random data based on criteria

[ Edited ]

Thanks LaurieF for ur reply.  At it's simplest, I need to randomly select 5 observations per agent_name, that should follow this pattern:

 

- if can find 3 observations of risk_level=1, flag subject_to_review=Y

-if can 2 observations of risk_level=2, flag subject_to_review=Y

 

This would arrive at 5 observations per agent_name. it becomes more complicated if cannot find 3 observations, as then need to pick up a lower risk level to arrive at your five per agent (no hard rule, but should first fetch higher risk level.

From higher to lower is : 1,2,3,0,'')

 

See below example, but without it being randomly assigned Y.  If no one has a lead to suggest, I may go with attempting to achieve this with excel vba. Thank you

 

DATA_WANT
Agent_name action_type risk_level subject_to_review
Albert10001Y
Albert10001Y
Albert10001 
Albert10001 
Albert21002Y
Albert21002Y
Albert21002Y
Albert21002 
Albert32300 
Albert25003 
Albert25003 
Albert25003 
Albert25003 
Albert25003 
Albert5200  
Barb10001Y
Barb10001Y
Barb21002Y
Barb21002Y
Barb21002Y
Barb32300 
Cathy10001Y
Cathy21002Y
Cathy21002Y
Cathy32300Y
Cathy5200 Y
Solution
‎03-22-2017 02:33 PM
Super Contributor
Posts: 251

Re: Flag random data based on criteria

[ Edited ]

How does this work for you? It is extensible: throw as much data at it as you like, and it should still work.

 

data have;
infile cards dsd dlm=',' firstobs=2 missover;
attrib agent_name length=$ 20;
attrib action_type length=$ 4;
attrib risk_level subject_to_review length=$ 1;
input agent_name
      action_type
      risk_level;
select(risk_level);
   when('1', '2', '3') recode_risk_level = risk_level;
   when('0') recode_risk_level = '4';
   otherwise recode_risk_level = '5';
   end;
random = ranuni(225465114);  /* My 30-year-old UK phone number as seed */
id = monotonic();    /* Create a unique incremented ID per observation */
cards;
Agent_name,action_type,risk_level,subject_to_review
Albert,1000,1,Y
Albert,1000,1,Y
Albert,1000,1,
Albert,1000,1,
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,
Albert,3230,0,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,5200,,
Barb,1000,1,Y
Barb,1000,1,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,3230,0,
Cathy,1000,1,Y
Cathy,2100,2,Y
Cathy,2100,2,Y
Cathy,3230,0,Y
Cathy,5200,,Y
;
run;

proc sort data=have noequals;
by agent_name recode_risk_level random;
run;

data subject_to_review;
set have;
by agent_name recode_risk_level;
retain count output_count;
if first.agent_name then
   output_count = 1;
if first.recode_risk_level
   then count = 1;
   else count + 1;
if (   output_count > 5
    or recode_risk_level = '1' and count > 3
    or recode_risk_level = '2' and count > 2) then
    delete;
output_count + 1;
keep id;
run;

proc sql undo_policy=none magic=102;
create table want as 
   select have.agent_name,
          have.action_type,
          have.risk_level,
          ifc(str.id is not null, 'Y', ' ') as subject_to_review length=1
     from have
     left join subject_to_review as str
       on have.id = str.id
    order by have.id;
quit;

 

Frequent Contributor
Posts: 84

Re: Flag random data based on criteria

thank you very much Laurie for your suggestion and for some of the explanations provided.
Super Contributor
Posts: 251

Re: Flag random data based on criteria

If there's anything else you need to know (there's some interesting stuff in the sql join for example), sing out - I'm happy to explain.

Frequent Contributor
Posts: 84

Re: Flag random data based on criteria

hi Laurie,

 

Your suggestion is very useful, thanks again. My data will have numerous instances where a given agent will not have 3 observations with risk level 1 or even 2 obs with risk level 2. The challenge i believe is arriving at a program/code to assess that if higher risk (R) does not sum to 3 for a given agent, will need higher observations from lower risk level.

 

In the want table, row 19, under agent Barb, would be looking for it to populate to 'Y', as R has only 2 obs, so need 1 more, and prefer higher risk level.

 

agent_nameaction_typerisk_levelsubject_to_review
Albert10001 
Albert10001Y
Albert10001Y
Albert10001Y
Albert21002 
Albert21002Y
Albert21002Y
Albert21002 
Albert32300 
Albert25003 
Albert25003 
Albert25003 
Albert25003 
Albert25003 
Albert5200  
Barb10001Y
Barb10001Y
Barb21002 
Barb21002Y
Barb21002Y
Barb32300Y

 

Is there a way for the program you offered, to say, based on risk (R) level, for it to consider, for each given agent:

 

if R1*3>=3, do nothing; if R1*3=2 then fetch an additional lower level R2 ... Ultimately to arrive at the following 5 obs, per risk level:

 

R1+R1+R1+R2+R2 (Ideal scenario for a given agent)
R1+R1+R2+R2+R2
R1+R2+R2+R2+R2

If full shortage of R1, compensate by getting from lower level, etc.

R2+R2+R2+R2+R2
R2+R2+R2+R2+R3
R2+R2+R2+R3+R3
...
R5+R5+R5+R5+R5

 

Please, let me know your thoughts, 

Thank you very much

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 240 views
  • 3 likes
  • 2 in conversation