Hi,
I have a table like the following :
Item | Name1 | Risk1 | Name2 | Risk2 | Name3 | Risk3 |
---|---|---|---|---|---|---|
item1 | bob | high | bob | high | margaret | low |
item2 | bob | high | john | high | lewis | high |
item3 | bill | low | nick | very high | susan | sanction level 1 |
I'd like to add a column named "Risk Count".
The logical test to increment this count is the following :
First you have to check is the risk is within the following list ("high","very high","sanction level 1")
Second, if you find the risk, you have to check whether the name was already counted or not. Each name assigned to a risk level in the list above must only be counted once per item.
What I mean by that, is that in item1 bob is repeated in name1 and name2, but he should just be counted once
So the result for each item would be :
Item | Risk Count |
---|---|
item1 | 1 |
item2 | 3 |
item3 | 2 |
Hope you can help me achieve this. Thank you for your help and time.
If you re-shape your data, this becomes a relatively easy problem. Start with this:
data reshaped;
set have;
name=name1;
risk=risk1;
output;
name=name2;
risk=risk2;
output;
name=name3;
risk=risk3;
output;
keep item name risk;
run;
The finishing steps:
proc sort data=reshaped;
by item name;
run;
data want;
set reshaped;
by item name;
if first.item then risk_count=0;
if first.name then add_to_count=0;
retain add_to_count;
if risk in ('high', 'very high' , 'sanction level 1') then add_to_count=1;
if last.name then risk_count + add_to_count;
if last.item;
keep item risk_count;
run;
You did not state your solution style preference. Here's my untested quick and simple data style method.
data want;
data have;
format Risk_Count 8; /* put the new numeric column at the end of the new dataset */
Risk_Count = 0; /* set the default value at the beginning of each record. assume a missing value is not wanted */
if risk1 not in ('low') then risk_count = risk_count + 1;
if risk2 not in ('low') then risk_count = risk_count + 1;
if risk3 not in ('low') then risk_count = risk_count + 1;
if risk_count > 1 then do; /* if there is zero or one risk count, there is no need to check for duplicate names */
if name3 in (name2, name1) and risk3 not in ('low') then risk_count -1; /* names can be the same but risk can be low */
if risk_count > 1 and name2 in (name1) and risk2 not in ('low') then risk_count -1; /* name3 could change risk_count */
end;
/* assuming all names and risks are spelled the same and cased the same */
run;
Thank you very much for your reply.
For some reason I get an error on that line : if name3 in (name2, name1)
It is like if I couldn't use string variables in an in list...
Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,
a missing value, iterator, (.
I was stretching myself. The 'in' statement is supposed to be the same as an if then statement.
Change the 'in' code to
If ((name3 = name2) or (name3 = name1)) then.....
and you should get what you need. Should.
This worked.
data start;
length Item Name1 Risk1 Name2 Risk2 Name3 Risk3 $16 Risk_Count 8;
format Risk_Count 8.;
Risk_Count = 0;
input (Item Name1 Risk1 Name2 Risk2 Name3 Risk3)($);
if risk1 not in ('low') then risk_count = risk_count + 1;
if risk2 not in ('low') then risk_count = risk_count + 1;
if risk3 not in ('low') then risk_count = risk_count + 1;
if risk_count > 1 then do;
if (name3 = name2) or (name3 = name1) and risk3 not in ('low') then risk_count = risk_count -1;
if risk_count > 1 and (name2 = name1) and risk2 not in ('low') then risk_count = risk_count -1;
end;
cards;
item1 bob high bob high margaret low
item2 bob high john high lewis high
item3 bill low nick very_high susan sanction_level_1
;
run;
proc print data=start;
run;
You could possibly use SQL. In a case statement, using a nested SQL statement, count the distinct names that are not low, and place the count at the end of a select statement where unique row identifier in A is equal to the unique row identifier in the nested sql B. If A does not have a row in B, then A.row_count = 0.
If you re-shape your data, this becomes a relatively easy problem. Start with this:
data reshaped;
set have;
name=name1;
risk=risk1;
output;
name=name2;
risk=risk2;
output;
name=name3;
risk=risk3;
output;
keep item name risk;
run;
The finishing steps:
proc sort data=reshaped;
by item name;
run;
data want;
set reshaped;
by item name;
if first.item then risk_count=0;
if first.name then add_to_count=0;
retain add_to_count;
if risk in ('high', 'very high' , 'sanction level 1') then add_to_count=1;
if last.name then risk_count + add_to_count;
if last.item;
keep item risk_count;
run;
Thank you all for your help.
Both solutions worked fine.
I'm not sure they would always give you the same results. Try this line of data:
item1 bob low margaret high bob high
Thank you for pointing this out.
In any case, in my real data set a name can only be assigned to one risk.
Best regards.
That's contradictory to your original question....
Ahh, in that case I can simplify my finishing steps:
proc sort data=reshaped nodupkey;
by item name;
where risk in ('high', 'very high', 'sanction level 1');
run;
data want;
set reshaped;
by item;
if first.item then risk_count=1;
else risk_count + 1;
if last.item;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.