BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nicnad
Fluorite | Level 6

Hi,

I have a table like the following :

ItemName1Risk1Name2Risk2Name3Risk3
item1bobhighbobhighmargaretlow
item2bobhighjohnhighlewishigh
item3billlownickvery highsusansanction level 1

I'd like to add a column named "Risk Count".

The logical test to increment this count is the following :

First you have to check is the risk is within the following list ("high","very high","sanction level 1")

Second, if you find the risk, you have to check whether the name was already counted or not. Each name assigned to a risk level in the list above must only be counted once per item.

What I mean by that, is that in item1 bob is repeated in name1 and name2, but he should just be counted once

So the result for  each item would be :

ItemRisk Count
item11
item23
item32

Hope you can help me achieve this. Thank you for your help and time.

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

If you re-shape your data, this becomes a relatively easy problem.  Start with this:

data reshaped;

   set have;

   name=name1;

   risk=risk1;

   output;

   name=name2;

   risk=risk2;

   output;

   name=name3;

   risk=risk3;

   output;

   keep item name risk;

run;

The finishing steps:

proc sort data=reshaped;

   by item name;

run;

data want;

   set reshaped;

   by item name;

   if first.item then risk_count=0;

   if first.name then add_to_count=0;

   retain add_to_count;

   if risk in ('high', 'very high' , 'sanction level 1') then add_to_count=1;

   if last.name then risk_count + add_to_count;

   if last.item;

   keep item risk_count;

run;

View solution in original post

11 REPLIES 11
jwillis
Quartz | Level 8

You did not state your solution style preference.  Here's my untested quick and simple data style method.

data want;

      data have;

    format Risk_Count 8;       /* put the new numeric column at the end of the new dataset */

    Risk_Count = 0;              /*  set the default value at the beginning of each record. assume a missing value is not wanted */

     if risk1 not in ('low') then risk_count = risk_count + 1;

     if risk2 not in ('low') then risk_count = risk_count + 1;

     if risk3 not in ('low') then risk_count = risk_count + 1;

if risk_count > 1 then do;             /* if there is zero or one risk count, there is no need to check for duplicate names */

     if name3 in (name2, name1) and risk3 not in ('low') then risk_count -1;   /* names can be the same but risk can be low */

     if risk_count > 1 and name2 in (name1) and risk2 not in ('low') then risk_count -1;  /* name3 could change risk_count */

end;   

        /* assuming all names and risks are spelled the same and cased the same */

run;

nicnad
Fluorite | Level 6

Thank you very much for your reply.

For some reason I get an error on that line : if name3 in (name2, name1)

It is like if I couldn't use string variables in an  in list...

Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,

              a missing value, iterator, (.

jwillis
Quartz | Level 8

I was stretching myself.  The 'in' statement is supposed to be the same as an if then statement.

Change the 'in' code to

If ((name3 = name2) or (name3 = name1)) then.....

and you should get what you need.  Should. Smiley Happy

jwillis
Quartz | Level 8

This worked.


data start;
  length Item Name1 Risk1 Name2 Risk2 Name3 Risk3 $16 Risk_Count 8;
  format Risk_Count 8.;
  Risk_Count = 0;
  input (Item Name1 Risk1 Name2 Risk2 Name3 Risk3)($);
     if risk1 not in ('low') then risk_count = risk_count + 1;
     if risk2 not in ('low') then risk_count = risk_count + 1;
     if risk3 not in ('low') then risk_count = risk_count + 1;
     if risk_count > 1 then do;            
          if (name3 = name2) or (name3 = name1) and risk3 not in ('low') then risk_count = risk_count -1;  
          if risk_count > 1 and (name2 = name1) and risk2 not in ('low') then risk_count = risk_count -1; 
      end;   
  cards;
item1 bob high bob high margaret low
item2 bob high john high lewis high
item3 bill low nick very_high susan sanction_level_1
;
run;

proc print data=start;
run;

jwillis
Quartz | Level 8

You could possibly use SQL. In a case statement, using a nested SQL statement, count the distinct names that are not low, and place the count at the end of a select statement where unique row identifier in A is equal to the unique row identifier in the nested sql B. If A does not have a row in B, then A.row_count = 0.

Astounding
PROC Star

If you re-shape your data, this becomes a relatively easy problem.  Start with this:

data reshaped;

   set have;

   name=name1;

   risk=risk1;

   output;

   name=name2;

   risk=risk2;

   output;

   name=name3;

   risk=risk3;

   output;

   keep item name risk;

run;

The finishing steps:

proc sort data=reshaped;

   by item name;

run;

data want;

   set reshaped;

   by item name;

   if first.item then risk_count=0;

   if first.name then add_to_count=0;

   retain add_to_count;

   if risk in ('high', 'very high' , 'sanction level 1') then add_to_count=1;

   if last.name then risk_count + add_to_count;

   if last.item;

   keep item risk_count;

run;

nicnad
Fluorite | Level 6

Thank you all for your help.

Both solutions worked fine.

Astounding
PROC Star

I'm not sure they would always give you the same results.  Try this line of data:

item1 bob low margaret high bob high


nicnad
Fluorite | Level 6

Thank you for pointing this out.

In any case, in my real data set a name can only be assigned to one risk.

Best regards.

Reeza
Super User

That's contradictory to your original question....

Astounding
PROC Star

Ahh, in that case I can simplify my finishing steps:

proc sort data=reshaped nodupkey;

   by item name;

   where risk in ('high', 'very high', 'sanction level 1');

run;

data want;

   set reshaped;

   by item;

   if first.item then risk_count=1;

   else risk_count + 1;

   if last.item;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1757 views
  • 6 likes
  • 4 in conversation