Solved: Re: Conditional Count

nicnad · Posted 04-01-2014 02:16 PM

Hi,

I have a table like the following :

Item	Name1	Risk1	Name2	Risk2	Name3	Risk3
item1	bob	high	bob	high	margaret	low
item2	bob	high	john	high	lewis	high
item3	bill	low	nick	very high	susan	sanction level 1

I'd like to add a column named "Risk Count".

The logical test to increment this count is the following :

First you have to check is the risk is within the following list ("high","very high","sanction level 1")

Second, if you find the risk, you have to check whether the name was already counted or not. Each name assigned to a risk level in the list above must only be counted once per item.

What I mean by that, is that in item1 bob is repeated in name1 and name2, but he should just be counted once

So the result for each item would be :

Item	Risk Count
item1	1
item2	3
item3	2

Hope you can help me achieve this. Thank you for your help and time.

Astounding · Posted 04-01-2014 03:57 PM

If you re-shape your data, this becomes a relatively easy problem. Start with this:

data reshaped;

set have;

name=name1;

risk=risk1;

output;

name=name2;

risk=risk2;

output;

name=name3;

risk=risk3;

output;

keep item name risk;

run;

The finishing steps:

proc sort data=reshaped;

by item name;

run;

data want;

set reshaped;

by item name;

if first.item then risk_count=0;

if first.name then add_to_count=0;

retain add_to_count;

if risk in ('high', 'very high' , 'sanction level 1') then add_to_count=1;

if last.name then risk_count + add_to_count;

if last.item;

keep item risk_count;

run;

View solution in original post

jwillis · Posted 04-01-2014 02:53 PM

You did not state your solution style preference. Here's my untested quick and simple data style method.

data want;

data have;

format Risk_Count 8; /* put the new numeric column at the end of the new dataset */

Risk_Count = 0; /* set the default value at the beginning of each record. assume a missing value is not wanted */

if risk1 not in ('low') then risk_count = risk_count + 1;

if risk2 not in ('low') then risk_count = risk_count + 1;

if risk3 not in ('low') then risk_count = risk_count + 1;

if risk_count > 1 then do; /* if there is zero or one risk count, there is no need to check for duplicate names */

if name3 in (name2, name1) and risk3 not in ('low') then risk_count -1; /* names can be the same but risk can be low */

if risk_count > 1 and name2 in (name1) and risk2 not in ('low') then risk_count -1; /* name3 could change risk_count */

end;

/* assuming all names and risks are spelled the same and cased the same */

run;

nicnad · Posted 04-01-2014 03:21 PM

Thank you very much for your reply.

For some reason I get an error on that line : if name3 in (name2, name1)

It is like if I couldn't use string variables in an in list...

Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,

a missing value, iterator, (.

jwillis · Posted 04-01-2014 03:34 PM

I was stretching myself. The 'in' statement is supposed to be the same as an if then statement.

Change the 'in' code to

If ((name3 = name2) or (name3 = name1)) then.....

and you should get what you need. Should.

jwillis · Posted 04-01-2014 04:02 PM

This worked.

data start;
length Item Name1 Risk1 Name2 Risk2 Name3 Risk3 $16 Risk_Count 8;
format Risk_Count 8.;
Risk_Count = 0;
input (Item Name1 Risk1 Name2 Risk2 Name3 Risk3)($);
     if risk1 not in ('low') then risk_count = risk_count + 1;
     if risk2 not in ('low') then risk_count = risk_count + 1;
     if risk3 not in ('low') then risk_count = risk_count + 1;
     if risk_count > 1 then do;
          if (name3 = name2) or (name3 = name1) and risk3 not in ('low') then risk_count = risk_count -1;
          if risk_count > 1 and (name2 = name1) and risk2 not in ('low') then risk_count = risk_count -1;
      end;
cards;
item1 bob high bob high margaret low
item2 bob high john high lewis high
item3 bill low nick very_high susan sanction_level_1
;
run;

proc print data=start;
run;

jwillis · Posted 04-01-2014 03:03 PM

You could possibly use SQL. In a case statement, using a nested SQL statement, count the distinct names that are not low, and place the count at the end of a select statement where unique row identifier in A is equal to the unique row identifier in the nested sql B. If A does not have a row in B, then A.row_count = 0.

Astounding · Posted 04-01-2014 03:57 PM

If you re-shape your data, this becomes a relatively easy problem. Start with this:

data reshaped;

set have;

name=name1;

risk=risk1;

output;

name=name2;

risk=risk2;

output;

name=name3;

risk=risk3;

output;

keep item name risk;

run;

The finishing steps:

proc sort data=reshaped;

by item name;

run;

data want;

set reshaped;

by item name;

if first.item then risk_count=0;

if first.name then add_to_count=0;

retain add_to_count;

if risk in ('high', 'very high' , 'sanction level 1') then add_to_count=1;

if last.name then risk_count + add_to_count;

if last.item;

keep item risk_count;

run;

nicnad · Posted 04-01-2014 04:22 PM

Thank you all for your help.

Both solutions worked fine.

Astounding · Posted 04-01-2014 04:27 PM

I'm not sure they would always give you the same results. Try this line of data:

item1 bob low margaret high bob high

nicnad · Posted 04-01-2014 04:30 PM

Thank you for pointing this out.

In any case, in my real data set a name can only be assigned to one risk.

Best regards.

Reeza · Posted 04-01-2014 04:39 PM

That's contradictory to your original question....

Astounding · Posted 04-01-2014 04:56 PM

Ahh, in that case I can simplify my finishing steps:

proc sort data=reshaped nodupkey;

by item name;

where risk in ('high', 'very high', 'sanction level 1');

run;

data want;

set reshaped;

by item;

if first.item then risk_count=1;

else risk_count + 1;

if last.item;

run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!