Comparing repeated ID variables within a dataset

Reply
Occasional Contributor
Posts: 19

Comparing repeated ID variables within a dataset

I have a dataset with the following columns:
FiscalYear, EmployeeID, OverallRating

 

I need to find out what employees rated as Ineffective in one FiscalYear are rated in future or previous years.  For example, figuing out how to get the count of employees rated Ineffective in 2013 who were then rated anything else in 2014.

 

I've tried splitting up the data set by year and rating, like this:

PROC SQL;
SELECT COUNT (DISTINCT a.EmployeeID), a.FiscalYear, b.EmployeeID, b.FiscalYear, c.EmployeeID, c.FiscalYear
FROM SY1314_I AS a, SY1415_I AS b, SY1516_I AS c
WHERE c.EmployeeID NOT IN (SELECT EmployeeID FROM SY1314_I) AND NOT IN (SELECT EmployeeID FROM SY1415_I);

but end up with some error messages and no result.

 

I've tried breaking up the data by year and rating and comparing it with IF/THEN statements, like this: 

DATA NotI1415_1516 trash;
	SET SY1314_I SY1415_I SY1516_I;
	IF SY1314_I.EmployeeID = 'Ineffective' AND SY1314_I.EmployeeID NOT IN SY1415_I.EmployeeID AND SY1314.EmployeeID NOT IN SY1516_I.EmployeeID THEN OUTPUT NOTI1415_1516;
ELSE OUTPUT trash;

and also gotten nowhere.

 

Really not sure what to do, any help is appreciated.

Trusted Advisor
Posts: 1,459

Re: Comparing repeated ID variables within a dataset

You can't use in a data step a code like:

IF SY1314_I.EmployeeID = 'Ineffective' AND 
SY1314_I.EmployeeID NOT IN SY1415_I.EmployeeID AND ...

You can refer to a variable of current dataset only.

 

I tested your sql code and changed it a little. Try it now:

PROC SQL;
SELECT COUNT (DISTINCT a.EmployeeID), 
             a.FiscalYear, 
             b.EmployeeID, 
             b.FiscalYear, 
             c.EmployeeID, 
             c.FiscalYear
FROM SY1314_I AS a, 
     SY1415_I AS b, 
     SY1516_I AS c
WHERE c.EmployeeID 
  NOT IN (SELECT distinct EmployeeID FROM SY1314_I, SY1415_I);
quit;

 

 

Occasional Contributor
Posts: 19

Re: Comparing repeated ID variables within a dataset

Thanks for responding!  I ran your SQLcode and got "ERROR: Ambiguous reference, column EmployeeID is in more than one table."

 

I tried changing it so that it expicitly referenced a.EmployeeID and b.EmployeeID but then got the error that I can't reference 2 columns in a subquery.

Trusted Advisor
Posts: 1,459

Re: Comparing repeated ID variables within a dataset

I haven't noticed but there is no logic to select several employeeIDs especially not 

b.employeeID and c.employeeID as you are looking for those that are not there:

PROC SQL;
SELECT COUNT (DISTINCT a.EmployeeID), 
             a.FiscalYear, 
     /*        b.EmployeeID,  */
             b.FiscalYear, 
     /*        c.EmployeeID,  */
             c.FiscalYear
FROM SY1314_I AS a, 
     SY1415_I AS b, 
     SY1516_I AS c
WHERE c.EmployeeID 
  NOT IN (SELECT distinct EmployeeID FROM SY1314_I, SY1415_I);
quit;

Beyond, I think you should relate to @art297 remarks - and check your logic.

PROC Star
Posts: 7,431

Re: Comparing repeated ID variables within a dataset

I'm confused! You said you have 3 variables, but then you never reference the 3rd one (employee rating) but make one statement that leads one to think that employeeID is a rating.

 

Some example data and what you really want to achieve would definitely help.

 

Art, CEO, AnalystFinder.com

 

Super Contributor
Posts: 474

Re: Comparing repeated ID variables within a dataset

[ Edited ]

Hi.

 

Show us an example of the input data.

 

Could be as simple as selecting what you want for each year and intersect everything:

 

proc sql noprint;
create table WANT as

select EmployeeID from EMPLOYEES
where RATED eq 'Ineffective' and FiscalYear eq 2013

intersect

select EmployeeID from EMPLOYEES
where FiscalYear ne 2013

quit;

Daniel Santos @ www.cgd.pt

Ask a Question
Discussion stats
  • 5 replies
  • 155 views
  • 0 likes
  • 4 in conversation