hello,
I need to transform data for the purpose of evaluating the actions that employees have taken on customers' account.
It would be greatly appreciated if someone could point me in the right direction.
I'm looking at previous month's data, which captures on a given day, any time an agent "actions" / modifies a customer's account. The action taken on customer's account are categorized by risk level, from 1 (most critical) to 3 less, and 0, no risk.
What I need to do:
Flag 5 observations per agent of which 3 have a risk_level=1, and 2 have a risk_level=2. If cannot find total of 5 obs, then select from risk_level=3, and if still can't find, select risk_level=0. (The goal is to flag 5 where 3 have a risk level of 1, and two a lower risk level)
Ultimately, it would be great if I could export the data to excel, and have a pivot table for agent_name, and variable Subject_to_review, where each agent name would have a count 5.
Example data has been added below.
Thanks
How does this work for you? It is extensible: throw as much data at it as you like, and it should still work.
data have;
infile cards dsd dlm=',' firstobs=2 missover;
attrib agent_name length=$ 20;
attrib action_type length=$ 4;
attrib risk_level subject_to_review length=$ 1;
input agent_name
action_type
risk_level;
select(risk_level);
when('1', '2', '3') recode_risk_level = risk_level;
when('0') recode_risk_level = '4';
otherwise recode_risk_level = '5';
end;
random = ranuni(225465114); /* My 30-year-old UK phone number as seed */
id = monotonic(); /* Create a unique incremented ID per observation */
cards;
Agent_name,action_type,risk_level,subject_to_review
Albert,1000,1,Y
Albert,1000,1,Y
Albert,1000,1,
Albert,1000,1,
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,
Albert,3230,0,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,5200,,
Barb,1000,1,Y
Barb,1000,1,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,3230,0,
Cathy,1000,1,Y
Cathy,2100,2,Y
Cathy,2100,2,Y
Cathy,3230,0,Y
Cathy,5200,,Y
;
run;
proc sort data=have noequals;
by agent_name recode_risk_level random;
run;
data subject_to_review;
set have;
by agent_name recode_risk_level;
retain count output_count;
if first.agent_name then
output_count = 1;
if first.recode_risk_level
then count = 1;
else count + 1;
if ( output_count > 5
or recode_risk_level = '1' and count > 3
or recode_risk_level = '2' and count > 2) then
delete;
output_count + 1;
keep id;
run;
proc sql undo_policy=none magic=102;
create table want as
select have.agent_name,
have.action_type,
have.risk_level,
ifc(str.id is not null, 'Y', ' ') as subject_to_review length=1
from have
left join subject_to_review as str
on have.id = str.id
order by have.id;
quit;
It's possible, but very difficult, to provide a suggestion without having some test data to work on. Can you give us some?
Thanks LaurieF for ur reply. At it's simplest, I need to randomly select 5 observations per agent_name, that should follow this pattern:
- if can find 3 observations of risk_level=1, flag subject_to_review=Y
-if can 2 observations of risk_level=2, flag subject_to_review=Y
This would arrive at 5 observations per agent_name. it becomes more complicated if cannot find 3 observations, as then need to pick up a lower risk level to arrive at your five per agent (no hard rule, but should first fetch higher risk level.
From higher to lower is : 1,2,3,0,'')
See below example, but without it being randomly assigned Y. If no one has a lead to suggest, I may go with attempting to achieve this with excel vba. Thank you
DATA_WANT | |||
Agent_name | action_type | risk_level | subject_to_review |
Albert | 1000 | 1 | Y |
Albert | 1000 | 1 | Y |
Albert | 1000 | 1 | |
Albert | 1000 | 1 | |
Albert | 2100 | 2 | Y |
Albert | 2100 | 2 | Y |
Albert | 2100 | 2 | Y |
Albert | 2100 | 2 | |
Albert | 3230 | 0 | |
Albert | 2500 | 3 | |
Albert | 2500 | 3 | |
Albert | 2500 | 3 | |
Albert | 2500 | 3 | |
Albert | 2500 | 3 | |
Albert | 5200 | ||
Barb | 1000 | 1 | Y |
Barb | 1000 | 1 | Y |
Barb | 2100 | 2 | Y |
Barb | 2100 | 2 | Y |
Barb | 2100 | 2 | Y |
Barb | 3230 | 0 | |
Cathy | 1000 | 1 | Y |
Cathy | 2100 | 2 | Y |
Cathy | 2100 | 2 | Y |
Cathy | 3230 | 0 | Y |
Cathy | 5200 | Y |
How does this work for you? It is extensible: throw as much data at it as you like, and it should still work.
data have;
infile cards dsd dlm=',' firstobs=2 missover;
attrib agent_name length=$ 20;
attrib action_type length=$ 4;
attrib risk_level subject_to_review length=$ 1;
input agent_name
action_type
risk_level;
select(risk_level);
when('1', '2', '3') recode_risk_level = risk_level;
when('0') recode_risk_level = '4';
otherwise recode_risk_level = '5';
end;
random = ranuni(225465114); /* My 30-year-old UK phone number as seed */
id = monotonic(); /* Create a unique incremented ID per observation */
cards;
Agent_name,action_type,risk_level,subject_to_review
Albert,1000,1,Y
Albert,1000,1,Y
Albert,1000,1,
Albert,1000,1,
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,Y
Albert,2100,2,
Albert,3230,0,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,2500,3,
Albert,5200,,
Barb,1000,1,Y
Barb,1000,1,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,2100,2,Y
Barb,3230,0,
Cathy,1000,1,Y
Cathy,2100,2,Y
Cathy,2100,2,Y
Cathy,3230,0,Y
Cathy,5200,,Y
;
run;
proc sort data=have noequals;
by agent_name recode_risk_level random;
run;
data subject_to_review;
set have;
by agent_name recode_risk_level;
retain count output_count;
if first.agent_name then
output_count = 1;
if first.recode_risk_level
then count = 1;
else count + 1;
if ( output_count > 5
or recode_risk_level = '1' and count > 3
or recode_risk_level = '2' and count > 2) then
delete;
output_count + 1;
keep id;
run;
proc sql undo_policy=none magic=102;
create table want as
select have.agent_name,
have.action_type,
have.risk_level,
ifc(str.id is not null, 'Y', ' ') as subject_to_review length=1
from have
left join subject_to_review as str
on have.id = str.id
order by have.id;
quit;
If there's anything else you need to know (there's some interesting stuff in the sql join for example), sing out - I'm happy to explain.
hi Laurie,
Your suggestion is very useful, thanks again. My data will have numerous instances where a given agent will not have 3 observations with risk level 1 or even 2 obs with risk level 2. The challenge i believe is arriving at a program/code to assess that if higher risk (R) does not sum to 3 for a given agent, will need higher observations from lower risk level.
In the want table, row 19, under agent Barb, would be looking for it to populate to 'Y', as R has only 2 obs, so need 1 more, and prefer higher risk level.
agent_name | action_type | risk_level | subject_to_review |
Albert | 1000 | 1 | |
Albert | 1000 | 1 | Y |
Albert | 1000 | 1 | Y |
Albert | 1000 | 1 | Y |
Albert | 2100 | 2 | |
Albert | 2100 | 2 | Y |
Albert | 2100 | 2 | Y |
Albert | 2100 | 2 | |
Albert | 3230 | 0 | |
Albert | 2500 | 3 | |
Albert | 2500 | 3 | |
Albert | 2500 | 3 | |
Albert | 2500 | 3 | |
Albert | 2500 | 3 | |
Albert | 5200 | ||
Barb | 1000 | 1 | Y |
Barb | 1000 | 1 | Y |
Barb | 2100 | 2 | |
Barb | 2100 | 2 | Y |
Barb | 2100 | 2 | Y |
Barb | 3230 | 0 | Y |
Is there a way for the program you offered, to say, based on risk (R) level, for it to consider, for each given agent:
if R1*3>=3, do nothing; if R1*3=2 then fetch an additional lower level R2 ... Ultimately to arrive at the following 5 obs, per risk level:
R1+R1+R1+R2+R2 (Ideal scenario for a given agent)
R1+R1+R2+R2+R2
R1+R2+R2+R2+R2
If full shortage of R1, compensate by getting from lower level, etc.
R2+R2+R2+R2+R2
R2+R2+R2+R2+R3
R2+R2+R2+R3+R3
...
R5+R5+R5+R5+R5
Please, let me know your thoughts,
Thank you very much
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.