Help using Base SAS procedures

Conditional Count

Accepted Solution Solved
Reply
Regular Contributor
Posts: 186
Accepted Solution

Conditional Count

Hi,

I have a table like the following :

ItemName1Risk1Name2Risk2Name3Risk3
item1bobhighbobhighmargaretlow
item2bobhighjohnhighlewishigh
item3billlownickvery highsusansanction level 1

I'd like to add a column named "Risk Count".

The logical test to increment this count is the following :

First you have to check is the risk is within the following list ("high","very high","sanction level 1")

Second, if you find the risk, you have to check whether the name was already counted or not. Each name assigned to a risk level in the list above must only be counted once per item.

What I mean by that, is that in item1 bob is repeated in name1 and name2, but he should just be counted once

So the result for  each item would be :

ItemRisk Count
item11
item23
item32

Hope you can help me achieve this. Thank you for your help and time.


Accepted Solutions
Solution
‎04-01-2014 03:57 PM
Super User
Posts: 5,099

Re: Conditional Count

If you re-shape your data, this becomes a relatively easy problem.  Start with this:

data reshaped;

   set have;

   name=name1;

   risk=risk1;

   output;

   name=name2;

   risk=risk2;

   output;

   name=name3;

   risk=risk3;

   output;

   keep item name risk;

run;

The finishing steps:

proc sort data=reshaped;

   by item name;

run;

data want;

   set reshaped;

   by item name;

   if first.item then risk_count=0;

   if first.name then add_to_count=0;

   retain add_to_count;

   if risk in ('high', 'very high' , 'sanction level 1') then add_to_count=1;

   if last.name then risk_count + add_to_count;

   if last.item;

   keep item risk_count;

run;

View solution in original post


All Replies
Regular Contributor
Posts: 217

Re: Conditional Count

You did not state your solution style preference.  Here's my untested quick and simple data style method.

data want;

      data have;

    format Risk_Count 8;       /* put the new numeric column at the end of the new dataset */

    Risk_Count = 0;              /*  set the default value at the beginning of each record. assume a missing value is not wanted */

     if risk1 not in ('low') then risk_count = risk_count + 1;

     if risk2 not in ('low') then risk_count = risk_count + 1;

     if risk3 not in ('low') then risk_count = risk_count + 1;

if risk_count > 1 then do;             /* if there is zero or one risk count, there is no need to check for duplicate names */

     if name3 in (name2, name1) and risk3 not in ('low') then risk_count -1;   /* names can be the same but risk can be low */

     if risk_count > 1 and name2 in (name1) and risk2 not in ('low') then risk_count -1;  /* name3 could change risk_count */

end;   

        /* assuming all names and risks are spelled the same and cased the same */

run;

Regular Contributor
Posts: 186

Re: Conditional Count

Thank you very much for your reply.

For some reason I get an error on that line : if name3 in (name2, name1)

It is like if I couldn't use string variables in an  in list...

Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,

              a missing value, iterator, (.

Regular Contributor
Posts: 217

Re: Conditional Count

I was stretching myself.  The 'in' statement is supposed to be the same as an if then statement.

Change the 'in' code to

If ((name3 = name2) or (name3 = name1)) then.....

and you should get what you need.  Should. Smiley Happy

Regular Contributor
Posts: 217

Re: Conditional Count

This worked.


data start;
  length Item Name1 Risk1 Name2 Risk2 Name3 Risk3 $16 Risk_Count 8;
  format Risk_Count 8.;
  Risk_Count = 0;
  input (Item Name1 Risk1 Name2 Risk2 Name3 Risk3)($);
     if risk1 not in ('low') then risk_count = risk_count + 1;
     if risk2 not in ('low') then risk_count = risk_count + 1;
     if risk3 not in ('low') then risk_count = risk_count + 1;
     if risk_count > 1 then do;            
          if (name3 = name2) or (name3 = name1) and risk3 not in ('low') then risk_count = risk_count -1;  
          if risk_count > 1 and (name2 = name1) and risk2 not in ('low') then risk_count = risk_count -1; 
      end;   
  cards;
item1 bob high bob high margaret low
item2 bob high john high lewis high
item3 bill low nick very_high susan sanction_level_1
;
run;

proc print data=start;
run;

Regular Contributor
Posts: 217

Re: Conditional Count

You could possibly use SQL. In a case statement, using a nested SQL statement, count the distinct names that are not low, and place the count at the end of a select statement where unique row identifier in A is equal to the unique row identifier in the nested sql B. If A does not have a row in B, then A.row_count = 0.

Solution
‎04-01-2014 03:57 PM
Super User
Posts: 5,099

Re: Conditional Count

If you re-shape your data, this becomes a relatively easy problem.  Start with this:

data reshaped;

   set have;

   name=name1;

   risk=risk1;

   output;

   name=name2;

   risk=risk2;

   output;

   name=name3;

   risk=risk3;

   output;

   keep item name risk;

run;

The finishing steps:

proc sort data=reshaped;

   by item name;

run;

data want;

   set reshaped;

   by item name;

   if first.item then risk_count=0;

   if first.name then add_to_count=0;

   retain add_to_count;

   if risk in ('high', 'very high' , 'sanction level 1') then add_to_count=1;

   if last.name then risk_count + add_to_count;

   if last.item;

   keep item risk_count;

run;

Regular Contributor
Posts: 186

Re: Conditional Count

Thank you all for your help.

Both solutions worked fine.

Super User
Posts: 5,099

Re: Conditional Count

I'm not sure they would always give you the same results.  Try this line of data:

item1 bob low margaret high bob high


Regular Contributor
Posts: 186

Re: Conditional Count

Thank you for pointing this out.

In any case, in my real data set a name can only be assigned to one risk.

Best regards.

Super User
Posts: 17,962

Re: Conditional Count

That's contradictory to your original question....

Super User
Posts: 5,099

Re: Conditional Count

Ahh, in that case I can simplify my finishing steps:

proc sort data=reshaped nodupkey;

   by item name;

   where risk in ('high', 'very high', 'sanction level 1');

run;

data want;

   set reshaped;

   by item;

   if first.item then risk_count=1;

   else risk_count + 1;

   if last.item;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 410 views
  • 6 likes
  • 4 in conversation