BookmarkSubscribeRSS Feed
Noelle125
Obsidian | Level 7

I am struggling with what I believe is a fairly simple logic string. 

My data looks like this:

Incident IDYearDiscscoreEmp No.
120052123
220065123
320070.5123
420081123
520091.5123
620012.5456
720023.5456
820035456
920094789
1020105789
11200651011


I am looking for employees (EMP NO.)  where at some point they had a DISCSCORE of 5 but then in subsequent years, they received any other score.  So, drawing the example from my dataset above, the syntax would return Emp No. 123 because they had a discscore of 5 in 2006 and then received a 0.5 in 2007.  The syntax would NOT return Emp No.'s 456, 789 or 1011 because the last Discscore they received were 5s.  

Thanks in advance.     

14 REPLIES 14
Haikuo
Onyx | Level 15

What do you plan to do the cases with 5 then 0.5 then get back to 5 again?

Noelle125
Obsidian | Level 7

Good question.  That situation does exist in the dataset.  Ideally, those would be flagged as well but I believe those are truly rare and so if they, too, were returned, I would deal with them manually.  Obviously, it is easy to find all the 5s.  I'm just trying to not have to manually look through thousands of records to find those cases where an employee received a 5 and then received some other score later on.  

art297
Opal | Level 21

Here is a way to do it with a datastep:

proc sort data=have;

  by Emp_No Year;

run;

data want (keep=Emp_No);

  set have;

  by Emp_No;

  retain have5 want;

  if first.Emp_No then do;

    have5=0;

    want=0;

  end;

  if have5 and Discscore ne 5 then want=1;

  if Discscore eq 5 then have5=1;

  if last.Emp_no and have5 and want

    then output;

run;

Noelle125
Obsidian | Level 7

Thank you!  You have given me another thing to try.  I really appreciate this.  For whatever reason, I was really stumped by this seemingly simple issue.

Haikuo
Onyx | Level 15

Well, this will only take care of 1. there were incidence of '5'. 2.The last of group is not '5'.

data have;

infile cards;

input IncidentID    Year    Discscore    Empno;

cards;

1    2005    2    123

2    2006    5    123

3    2007    0.5    123

4    2008    1    123

5    2009    1.5    123

6    2001    2.5    456

7    2002    3.5    456

8    2003    5    456

9    2009    4    789

10    2010    5    789

11    2006    5    1011

;

data want (keep=empno);

do until (last.empno);

set have;

by empno year;

if first.empno then _f=0;

if discscore=5 then _f=1;

if last.empno then _f1=(discscore<5);

end;

if _f*_f1=1 then output;

run;

proc print;run;

Will look into more general approach. I hope this will get you started.

Regards,

Haikuo

Noelle125
Obsidian | Level 7

Thank you!  This will definitely get me started.  Again, I truly appreciate all of the help. 

Haikuo
Onyx | Level 15

This will take care of those fluctuated cases as well, it will first check if there is '5', and if yes, then check if there is '<5' after within the same group.

data want (keep=empno);

do until (last.empno);

set have;

by empno year;

if discscore=5 then _f=1;

if _f=1 and discscore < 5 then _f1=1;

end;

if _f*_f1 then output;

run;

HTH,

Haikuo

Noelle125
Obsidian | Level 7

Wow!  Thank you!  

PGStats
Opal | Level 21

A self join should do:

data have;

input incidentId year discScore EmpNo;

datalines;

1 2005 2 123

2 2006 5 123

3 2007 0.5 123

4 2008 1 123

5 2009 1.5 123

6 2001 2.5 456

7 2002 3.5 456

8 2003 5 456

9 2009 4 789

10 2010 5 789

11 2006 5 1011

;

proc sql;

create table want as

select unique a.empNo

from have as a inner join have as b

on a.empNo=b.empNo and a.year>b.year

where b.discScore>=5;

PG

PG
Noelle125
Obsidian | Level 7

Thank you.  This is very helpful as I know next to nothing about SQL.  I will try this right now.  

Linlin
Lapis Lazuli | Level 10

data have;

input incidentId year discScore EmpNo;

datalines;

1 2005 2 123

2 2006 5 123

3 2007 0.5 123

4 2008 1 123

5 2009 1.5 123

6 2001 2.5 456

7 2002 3.5 456

8 2003 5 456

9 2009 4 789

10 2010 5 789

11 2006 5 1011

;

proc sql;

create table want as

  select distinct a.empno

     from have  a, have  b

         where (a.empNo=b.empNo) and a.discscore>=5 and (a.year<b.year);

quit;

proc print;run;

Haikuo
Onyx | Level 15

PGstats and Linlin,

SQL works for OP's as-is data, however, multiple adjacent '5's within one group will also be pulled. Check this out:

data have;

input incidentId year discScore EmpNo;

datalines;

1 2005 2 123

2 2006 5 123

3 2007 0.5 123

4 2008 1 123

5 2009 1.5 123

6 2001 2.5 456

7 2002 3.5 456

8 2003 5 456

8.1 2004 5 456

9 2009 4 789

10 2010 5 789

11 2006 5 1011

;

in this case, '456' will also be pulled, even there is no '<5' existing after '5'.

Kindly Regards,

Haikuo

PGStats
Opal | Level 21

You are perfectly right Hai.kuo, there was a clause missing in my SQL query and the equivalent one from Linlin. As we fix this, we can also fix the problem of a sequence hitting discScore=5 many times, as illustrated below. I think the query would be most useful if it read :

data have;
input incidentId year discScore EmpNo;
datalines;
1 2005 2 123
2 2006 5 123
3 2007 0.5 123
4 2008 1 123
5 2009 1.5 123
6 2001 2.5 456
7 2002 3.5 456
8 2003 5 456
8.1 2004 5 456
9 2009 4 789
10 2010 5 789
10.1 2011 3 789
10.2 2012 5 789
10.3 2013 0.5 789
11 2006 5 1011
;

proc sql;
create table want as
select unique b.incidentId, b.year, b.empNo
from have as a inner join have as b
on a.empNo = b.empNo and
   a.year > b.year and
   a.discScore ne b.discScore
where b.discScore >= 5;

proc print; run;

Then every case where a series of scores hits 5 followed later by some other score would be pulled out. Given the problem statement, it is not clear if the extra condition should read a.discScore ne b.discScore (other score, as stated) or a.discScore < b.discScore (lower score, as was possibly meant). Of course, if the maximum score is 5, then both conditions are equivalent.

PG

PG
Ksharp
Super User

If I understood what you mean.

data have;
input incidentId year discScore EmpNo;
datalines;
1 2005 2 123
2 2006 5 123
3 2007 0.5 123
4 2008 1 123
5 2009 1.5 123
6 2001 2.5 456
7 2002 3.5 456
8 2003 5 456
9 2004 5 456
10 2009 4 789
11 2010 5 789
12 2006 5 1011
;
run;
proc sql;
 create table want as
  select distinct empno
   from have as a
    where discscore=5
     having (select count(*) from have where empno=a.empno and year gt a.year) gt 0;
quit;


Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1439 views
  • 0 likes
  • 6 in conversation