I am new to SAS and have had no training, so please be specific in your response.
I have a table of 46,610 Terminated employees, but some employee’s are listed multiple times with different Termination Dates due to rehire situations.
I only want the record with the most recent Termination Date.
I've tried for hours to write a procedure, but just haven't been able to figure out how. Any help will be appreciated.
I've created the SAS table from a tab-limited file. Here's my code to create the table.
I've sorted the table by Person_Number and Termination_Date (in decending order)
Now I need the code to select only the unique Person_Number records with the most recent Termination_Date.
All fields are required in the output.
Thanks in advance for your help.
proc sql;
create table ORtmpTerm as
select distinct
Person_Number,
Full_Name,
Preferred_Name,
Standard_ID,
Oracle_ID,
CAI,
Global_Type,
Worker_Type,
Person_Type,
Worker_Type_Detail,
Workforce_Category,
AssignStatus,
Assignment_EffStartDt,
Leave_of_Absence_Start_Date,
Full_Time_Part_Time,
Regular_Temporary,
Corp_Title,
Corp_Title_Grp,
Corp_Title_Grp_EffDate,
Band,
Band_Effective_Date,
Job_Code,
Job_Code_Effective_Date,
Job_Name,
Job_Family_Name,
Job_Sub_Family_Name,
FLSA,
OTL_Timekeeping_Status,
Advisor_Category,
Job_Registration,
Job_Revenue_Category,
Commission_Flag,
Benefit_Population_Indicator,
Work_Style,
Work_at_Home_Flag,
Flexible_Work_Options,
Scheduled_Hours,
Scheduled_Hours_Effective_Date,
Shift_Description,
Shift_Effective_Date,
Work_Location_Geo_Tier,
Currency,
Currency_Code,
RMS_Fx_Rate,
Number_of_Pay_Months,
Payroll_Name,
Recent_What_Rating_Review_Type,
Recent_What_Rating_Review_Year,
Recent_How_Rating_Review_Type,
Recent_How_Rating_Review_Year,
Contract_Start_Date,
Contract_End_Date,
Functional_Job_Title,
Data_Source,
Contractor_Supplier_Company,
Special_Category,
Expat_Indicator,
Expat_Type_Code,
Expat_Type_Description,
Territorial_Type,
Manager_with_Direct_Reports_Flag,
Original_Hire_Date,
Recent_Hire_Date,
Rehire_Date,
Continuous_Service_Date,
Tenure_Continuous_Service,
Tenure_Since_Recent_Hire,
Tenure_Range,
Headcount,
Scheduled_FTE,
Manager_Name,
Manager_Person_Number,
Manager_eMail_Address,
Manager_Line_of_Business_Name,
Manager_10Dot_Hierarchy_Code,
Work_Phone,
Work_Phone_Secondary,
Work_eMail_Address,
Work_Mail_Code,
Work_Location_Code,
Work_Location_Building,
Work_Location_Address_Line_1,
Work_Location_Address_Line_2,
Work_Location_Address_Line_3,
Primary_Work_City,
Primary_Work_County,
Primary_Work_State,
Primary_Work_Postal_Code,
Primary_Work_Country,
Work_Location_Geographic_Region,
Primary_Work_Market_Name,
Line_of_Business_Name,
Sub_Line_of_Business_Name,
Division_Name,
Sub_Division_Name,
Group_Name,
Sub_Group_Name,
Summary_Name,
Legal_Entity_Name,
Legal_Employer,
Company_Number,
Co_Cntr_Code,
Co_Cntr_Name,
Co_Cntr_Code_EffDate,
ML_Cost_Center_Code,
ML_Cost_Center_Long_Name,
HYcode_01Dot,
HYname_01Dot,
HYcode_02Dot,
HYname_02Dot,
HYcode_03Dot,
HYname_03Dot,
HYcode_04Dot,
HYname_04Dot,
HYcode_05Dot,
HYname_05Dot,
HYcode_06Dot,
HYname_06Dot,
HYcode_07Dot,
HYname_07Dot,
HYcode_08Dot,
HYname_08Dot,
HYcode_10Dot,
HYname_10Dot,
Management_Level,
Manager_Level_1,
Manager_Level_2,
Manager_Level_3,
Manager_Level_4,
Manager_Level_5,
Manager_Level_6,
Manager_Level_7,
Manager_Level_8,
Manager_Level_9,
Manager_Level_10,
Manager_Level_11,
Manager_Level_12,
Manager_Level_13,
Manager_Level_14,
Manager_Level_15,
Co_Cntr_Finance_Manager_Name,
Co_Cntr_HR_Manager_Name,
Co_Cntr_Org_Manager_Name,
BA_Continuum_Flag,
Termination_Date
from ORwfdTerm
quit;
Sounds simple :
proc sql;
create table ORtmpTerm as
select *
from ORwfdTerm
group by Person_Number
having Termination_Date = max(Termination_Date);
quit;
PG
Sounds simple :
proc sql;
create table ORtmpTerm as
select *
from ORwfdTerm
group by Person_Number
having Termination_Date = max(Termination_Date);
quit;
PG
It looks like it worked, but do I need to be concerned with the message "NOTE: The query requires remerging summary statistics back with the original data"?
2136 Proc Sort Data = ORwfdTerm;
2137 By Person_Number decending Termination_Date;
2138 Run;
NOTE: There were 46610 observations read from the data set WORK.ORWFDTERM.
NOTE: The data set WORK.ORWFDTERM has 46610 observations and 173 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 2.04 seconds
cpu time 1.34 seconds
2139 /*
2140 proc sql;
2141 select *
2142 from ORwfdTerm
2143 group by Person_Number
2144 having max(Termination_Date);
2145 quit;
2146 */
2147 /* Select Max Date */
2148 proc sql;
2149 create table ORtmpTerm2 as
2150 select *
2151 from ORwfdTerm
2152 group by Person_Number
2153 having Termination_Date = max(Termination_Date);
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.ORTMPTERM2 created, with 45681 rows and 173 columns.
No, in this case, remerging is not a problem, it is expected. - PG
Without sample data hard to suggest. Try this
proc sql;
create table want as
select * from ORwfdTerm
group by Person_Number
having Termination_Date=max(Termination_Date);
quit;
Hi , you can also use the first. last. approach as below .
proc sort data = ORwfdTerm ;
By Person_Number decending Termination_Date;
run;
data want ;
set have ;
By Person_Number decending Termination_Date;;
if first.Person_Number ;
run;
thanks
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.