Help using Base SAS procedures

Looking for exceptions

Reply
Occasional Contributor
Posts: 12

Looking for exceptions

I am struggling with what I believe is a fairly simple logic string. 

My data looks like this:

Incident IDYearDiscscoreEmp No.
120052123
220065123
320070.5123
420081123
520091.5123
620012.5456
720023.5456
820035456
920094789
1020105789
11200651011


I am looking for employees (EMP NO.)  where at some point they had a DISCSCORE of 5 but then in subsequent years, they received any other score.  So, drawing the example from my dataset above, the syntax would return Emp No. 123 because they had a discscore of 5 in 2006 and then received a 0.5 in 2007.  The syntax would NOT return Emp No.'s 456, 789 or 1011 because the last Discscore they received were 5s.  

Thanks in advance.     

Respected Advisor
Posts: 3,156

Looking for exceptions

Posted in reply to Noelle125

What do you plan to do the cases with 5 then 0.5 then get back to 5 again?

Occasional Contributor
Posts: 12

Looking for exceptions

Good question.  That situation does exist in the dataset.  Ideally, those would be flagged as well but I believe those are truly rare and so if they, too, were returned, I would deal with them manually.  Obviously, it is easy to find all the 5s.  I'm just trying to not have to manually look through thousands of records to find those cases where an employee received a 5 and then received some other score later on.  

PROC Star
Posts: 7,492

Looking for exceptions

Posted in reply to Noelle125

Here is a way to do it with a datastep:

proc sort data=have;

  by Emp_No Year;

run;

data want (keep=Emp_No);

  set have;

  by Emp_No;

  retain have5 want;

  if first.Emp_No then do;

    have5=0;

    want=0;

  end;

  if have5 and Discscore ne 5 then want=1;

  if Discscore eq 5 then have5=1;

  if last.Emp_no and have5 and want

    then output;

run;

Occasional Contributor
Posts: 12

Looking for exceptions

Thank you!  You have given me another thing to try.  I really appreciate this.  For whatever reason, I was really stumped by this seemingly simple issue.

Respected Advisor
Posts: 3,156

Looking for exceptions

Posted in reply to Noelle125

Well, this will only take care of 1. there were incidence of '5'. 2.The last of group is not '5'.

data have;

infile cards;

input IncidentID    Year    Discscore    Empno;

cards;

1    2005    2    123

2    2006    5    123

3    2007    0.5    123

4    2008    1    123

5    2009    1.5    123

6    2001    2.5    456

7    2002    3.5    456

8    2003    5    456

9    2009    4    789

10    2010    5    789

11    2006    5    1011

;

data want (keep=empno);

do until (last.empno);

set have;

by empno year;

if first.empno then _f=0;

if discscore=5 then _f=1;

if last.empno then _f1=(discscore<5);

end;

if _f*_f1=1 then output;

run;

proc print;run;

Will look into more general approach. I hope this will get you started.

Regards,

Haikuo

Occasional Contributor
Posts: 12

Looking for exceptions

Thank you!  This will definitely get me started.  Again, I truly appreciate all of the help. 

Respected Advisor
Posts: 3,156

Looking for exceptions

Posted in reply to Noelle125

This will take care of those fluctuated cases as well, it will first check if there is '5', and if yes, then check if there is '<5' after within the same group.

data want (keep=empno);

do until (last.empno);

set have;

by empno year;

if discscore=5 then _f=1;

if _f=1 and discscore < 5 then _f1=1;

end;

if _f*_f1 then output;

run;

HTH,

Haikuo

Occasional Contributor
Posts: 12

Looking for exceptions

Wow!  Thank you!  

Respected Advisor
Posts: 4,937

Looking for exceptions

Posted in reply to Noelle125

A self join should do:

data have;

input incidentId year discScore EmpNo;

datalines;

1 2005 2 123

2 2006 5 123

3 2007 0.5 123

4 2008 1 123

5 2009 1.5 123

6 2001 2.5 456

7 2002 3.5 456

8 2003 5 456

9 2009 4 789

10 2010 5 789

11 2006 5 1011

;

proc sql;

create table want as

select unique a.empNo

from have as a inner join have as b

on a.empNo=b.empNo and a.year>b.year

where b.discScore>=5;

PG

PG
Occasional Contributor
Posts: 12

Looking for exceptions

Thank you.  This is very helpful as I know next to nothing about SQL.  I will try this right now.  

Super Contributor
Posts: 1,636

Looking for exceptions

Posted in reply to Noelle125

data have;

input incidentId year discScore EmpNo;

datalines;

1 2005 2 123

2 2006 5 123

3 2007 0.5 123

4 2008 1 123

5 2009 1.5 123

6 2001 2.5 456

7 2002 3.5 456

8 2003 5 456

9 2009 4 789

10 2010 5 789

11 2006 5 1011

;

proc sql;

create table want as

  select distinct a.empno

     from have  a, have  b

         where (a.empNo=b.empNo) and a.discscore>=5 and (a.year<b.year);

quit;

proc print;run;

Respected Advisor
Posts: 3,156

Looking for exceptions

PGstats and Linlin,

SQL works for OP's as-is data, however, multiple adjacent '5's within one group will also be pulled. Check this out:

data have;

input incidentId year discScore EmpNo;

datalines;

1 2005 2 123

2 2006 5 123

3 2007 0.5 123

4 2008 1 123

5 2009 1.5 123

6 2001 2.5 456

7 2002 3.5 456

8 2003 5 456

8.1 2004 5 456

9 2009 4 789

10 2010 5 789

11 2006 5 1011

;

in this case, '456' will also be pulled, even there is no '<5' existing after '5'.

Kindly Regards,

Haikuo

Respected Advisor
Posts: 4,937

Looking for exceptions

You are perfectly right Hai.kuo, there was a clause missing in my SQL query and the equivalent one from Linlin. As we fix this, we can also fix the problem of a sequence hitting discScore=5 many times, as illustrated below. I think the query would be most useful if it read :

data have;
input incidentId year discScore EmpNo;
datalines;
1 2005 2 123
2 2006 5 123
3 2007 0.5 123
4 2008 1 123
5 2009 1.5 123
6 2001 2.5 456
7 2002 3.5 456
8 2003 5 456
8.1 2004 5 456
9 2009 4 789
10 2010 5 789
10.1 2011 3 789
10.2 2012 5 789
10.3 2013 0.5 789
11 2006 5 1011
;

proc sql;
create table want as
select unique b.incidentId, b.year, b.empNo
from have as a inner join have as b
on a.empNo = b.empNo and
   a.year > b.year and
   a.discScore ne b.discScore
where b.discScore >= 5;

proc print; run;

Then every case where a series of scores hits 5 followed later by some other score would be pulled out. Given the problem statement, it is not clear if the extra condition should read a.discScore ne b.discScore (other score, as stated) or a.discScore < b.discScore (lower score, as was possibly meant). Of course, if the maximum score is 5, then both conditions are equivalent.

PG

PG
Super User
Posts: 10,048

Looking for exceptions

Posted in reply to Noelle125

If I understood what you mean.

data have;
input incidentId year discScore EmpNo;
datalines;
1 2005 2 123
2 2006 5 123
3 2007 0.5 123
4 2008 1 123
5 2009 1.5 123
6 2001 2.5 456
7 2002 3.5 456
8 2003 5 456
9 2004 5 456
10 2009 4 789
11 2010 5 789
12 2006 5 1011
;
run;
proc sql;
 create table want as
  select distinct empno
   from have as a
    where discscore=5
     having (select count(*) from have where empno=a.empno and year gt a.year) gt 0;
quit;


Ksharp

Ask a Question
Discussion stats
  • 14 replies
  • 255 views
  • 0 likes
  • 6 in conversation