I have a dataset with the following columns:
FiscalYear, EmployeeID, OverallRating
I need to find out what employees rated as Ineffective in one FiscalYear are rated in future or previous years. For example, figuing out how to get the count of employees rated Ineffective in 2013 who were then rated anything else in 2014.
I've tried splitting up the data set by year and rating, like this:
PROC SQL;
SELECT COUNT (DISTINCT a.EmployeeID), a.FiscalYear, b.EmployeeID, b.FiscalYear, c.EmployeeID, c.FiscalYear
FROM SY1314_I AS a, SY1415_I AS b, SY1516_I AS c
WHERE c.EmployeeID NOT IN (SELECT EmployeeID FROM SY1314_I) AND NOT IN (SELECT EmployeeID FROM SY1415_I);
but end up with some error messages and no result.
I've tried breaking up the data by year and rating and comparing it with IF/THEN statements, like this:
DATA NotI1415_1516 trash;
SET SY1314_I SY1415_I SY1516_I;
IF SY1314_I.EmployeeID = 'Ineffective' AND SY1314_I.EmployeeID NOT IN SY1415_I.EmployeeID AND SY1314.EmployeeID NOT IN SY1516_I.EmployeeID THEN OUTPUT NOTI1415_1516;
ELSE OUTPUT trash;
and also gotten nowhere.
Really not sure what to do, any help is appreciated.
You can't use in a data step a code like:
IF SY1314_I.EmployeeID = 'Ineffective' AND
SY1314_I.EmployeeID NOT IN SY1415_I.EmployeeID AND ...
You can refer to a variable of current dataset only.
I tested your sql code and changed it a little. Try it now:
PROC SQL;
SELECT COUNT (DISTINCT a.EmployeeID),
a.FiscalYear,
b.EmployeeID,
b.FiscalYear,
c.EmployeeID,
c.FiscalYear
FROM SY1314_I AS a,
SY1415_I AS b,
SY1516_I AS c
WHERE c.EmployeeID
NOT IN (SELECT distinct EmployeeID FROM SY1314_I, SY1415_I);
quit;
Thanks for responding! I ran your SQLcode and got "ERROR: Ambiguous reference, column EmployeeID is in more than one table."
I tried changing it so that it expicitly referenced a.EmployeeID and b.EmployeeID but then got the error that I can't reference 2 columns in a subquery.
I haven't noticed but there is no logic to select several employeeIDs especially not
b.employeeID and c.employeeID as you are looking for those that are not there:
PROC SQL;
SELECT COUNT (DISTINCT a.EmployeeID),
a.FiscalYear,
/* b.EmployeeID, */
b.FiscalYear,
/* c.EmployeeID, */
c.FiscalYear
FROM SY1314_I AS a,
SY1415_I AS b,
SY1516_I AS c
WHERE c.EmployeeID
NOT IN (SELECT distinct EmployeeID FROM SY1314_I, SY1415_I);
quit;
Beyond, I think you should relate to @art297 remarks - and check your logic.
I'm confused! You said you have 3 variables, but then you never reference the 3rd one (employee rating) but make one statement that leads one to think that employeeID is a rating.
Some example data and what you really want to achieve would definitely help.
Art, CEO, AnalystFinder.com
Hi.
Show us an example of the input data.
Could be as simple as selecting what you want for each year and intersect everything:
proc sql noprint;
create table WANT as
select EmployeeID from EMPLOYEES
where RATED eq 'Ineffective' and FiscalYear eq 2013
intersect
select EmployeeID from EMPLOYEES
where FiscalYear ne 2013
quit;
Daniel Santos @ www.cgd.pt
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.