I am new to SAS and have had no training, so please be specific in your response.
I have a table of 46,610 Terminated employees, but some employee’s are listed multiple times with different Termination Dates due to rehire situations.
I only want the record with the most recent Termination Date.
I've tried for hours to write a procedure, but just haven't been able to figure out how. Any help will be appreciated.
I've created the SAS table from a tab-limited file. Here's my code to create the table.
I've sorted the table by Person_Number and Termination_Date (in decending order)
Now I need the code to select only the unique Person_Number records with the most recent Termination_Date.
All fields are required in the output.
Thanks in advance for your help.
proc sql;
create table ORtmpTerm as
select distinct
Person_Number,
Full_Name,
Preferred_Name,
Standard_ID,
Oracle_ID,
CAI,
Global_Type,
Worker_Type,
Person_Type,
Worker_Type_Detail,
Workforce_Category,
AssignStatus,
Assignment_EffStartDt,
Leave_of_Absence_Start_Date,
Full_Time_Part_Time,
Regular_Temporary,
Corp_Title,
Corp_Title_Grp,
Corp_Title_Grp_EffDate,
Band,
Band_Effective_Date,
Job_Code,
Job_Code_Effective_Date,
Job_Name,
Job_Family_Name,
Job_Sub_Family_Name,
FLSA,
OTL_Timekeeping_Status,
Advisor_Category,
Job_Registration,
Job_Revenue_Category,
Commission_Flag,
Benefit_Population_Indicator,
Work_Style,
Work_at_Home_Flag,
Flexible_Work_Options,
Scheduled_Hours,
Scheduled_Hours_Effective_Date,
Shift_Description,
Shift_Effective_Date,
Work_Location_Geo_Tier,
Currency,
Currency_Code,
RMS_Fx_Rate,
Number_of_Pay_Months,
Payroll_Name,
Recent_What_Rating_Review_Type,
Recent_What_Rating_Review_Year,
Recent_How_Rating_Review_Type,
Recent_How_Rating_Review_Year,
Contract_Start_Date,
Contract_End_Date,
Functional_Job_Title,
Data_Source,
Contractor_Supplier_Company,
Special_Category,
Expat_Indicator,
Expat_Type_Code,
Expat_Type_Description,
Territorial_Type,
Manager_with_Direct_Reports_Flag,
Original_Hire_Date,
Recent_Hire_Date,
Rehire_Date,
Continuous_Service_Date,
Tenure_Continuous_Service,
Tenure_Since_Recent_Hire,
Tenure_Range,
Headcount,
Scheduled_FTE,
Manager_Name,
Manager_Person_Number,
Manager_eMail_Address,
Manager_Line_of_Business_Name,
Manager_10Dot_Hierarchy_Code,
Work_Phone,
Work_Phone_Secondary,
Work_eMail_Address,
Work_Mail_Code,
Work_Location_Code,
Work_Location_Building,
Work_Location_Address_Line_1,
Work_Location_Address_Line_2,
Work_Location_Address_Line_3,
Primary_Work_City,
Primary_Work_County,
Primary_Work_State,
Primary_Work_Postal_Code,
Primary_Work_Country,
Work_Location_Geographic_Region,
Primary_Work_Market_Name,
Line_of_Business_Name,
Sub_Line_of_Business_Name,
Division_Name,
Sub_Division_Name,
Group_Name,
Sub_Group_Name,
Summary_Name,
Legal_Entity_Name,
Legal_Employer,
Company_Number,
Co_Cntr_Code,
Co_Cntr_Name,
Co_Cntr_Code_EffDate,
ML_Cost_Center_Code,
ML_Cost_Center_Long_Name,
HYcode_01Dot,
HYname_01Dot,
HYcode_02Dot,
HYname_02Dot,
HYcode_03Dot,
HYname_03Dot,
HYcode_04Dot,
HYname_04Dot,
HYcode_05Dot,
HYname_05Dot,
HYcode_06Dot,
HYname_06Dot,
HYcode_07Dot,
HYname_07Dot,
HYcode_08Dot,
HYname_08Dot,
HYcode_10Dot,
HYname_10Dot,
Management_Level,
Manager_Level_1,
Manager_Level_2,
Manager_Level_3,
Manager_Level_4,
Manager_Level_5,
Manager_Level_6,
Manager_Level_7,
Manager_Level_8,
Manager_Level_9,
Manager_Level_10,
Manager_Level_11,
Manager_Level_12,
Manager_Level_13,
Manager_Level_14,
Manager_Level_15,
Co_Cntr_Finance_Manager_Name,
Co_Cntr_HR_Manager_Name,
Co_Cntr_Org_Manager_Name,
BA_Continuum_Flag,
Termination_Date
from ORwfdTerm
quit;
Sounds simple :
proc sql;
create table ORtmpTerm as
select *
from ORwfdTerm
group by Person_Number
having Termination_Date = max(Termination_Date);
quit;
PG
Sounds simple :
proc sql;
create table ORtmpTerm as
select *
from ORwfdTerm
group by Person_Number
having Termination_Date = max(Termination_Date);
quit;
PG
It looks like it worked, but do I need to be concerned with the message "NOTE: The query requires remerging summary statistics back with the original data"?
2136 Proc Sort Data = ORwfdTerm;
2137 By Person_Number decending Termination_Date;
2138 Run;
NOTE: There were 46610 observations read from the data set WORK.ORWFDTERM.
NOTE: The data set WORK.ORWFDTERM has 46610 observations and 173 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 2.04 seconds
cpu time 1.34 seconds
2139 /*
2140 proc sql;
2141 select *
2142 from ORwfdTerm
2143 group by Person_Number
2144 having max(Termination_Date);
2145 quit;
2146 */
2147 /* Select Max Date */
2148 proc sql;
2149 create table ORtmpTerm2 as
2150 select *
2151 from ORwfdTerm
2152 group by Person_Number
2153 having Termination_Date = max(Termination_Date);
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.ORTMPTERM2 created, with 45681 rows and 173 columns.
No, in this case, remerging is not a problem, it is expected. - PG
Without sample data hard to suggest. Try this
proc sql;
create table want as
select * from ORwfdTerm
group by Person_Number
having Termination_Date=max(Termination_Date);
quit;
Hi , you can also use the first. last. approach as below .
proc sort data = ORwfdTerm ;
By Person_Number decending Termination_Date;
run;
data want ;
set have ;
By Person_Number decending Termination_Date;;
if first.Person_Number ;
run;
thanks
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.