Hi,
I'm trying to find the Function in SAS that will perform a sort on my Dataset of Job Roles & Start Dates.
I'm using Enterprise Guide 4.3 & I want to do this sorting in an Advanced Expression, that is via The Query Builder >> 'Computed Columns' >> 'New' >> 'Advanced Expression'
The Dataset has various Job Roles staff have had & the various Start Dates, there are multiple Job Roles & Start Dates for each Staff member.
I want to find the latest Start Date & Job Role for each Staff member.
In Teradata i would use the below;
------------------------------------------------------
Select
Job_Role,
Start_Date,
ROW_NUMBER OVER (PARTITION BY 'Job_Role' ORDER BY Start_Date DESC) AS Rank_No
From Data_Table
------------------------------------------------------
I could then filter the records for where the 'Rank_No' = 1 which would give the most current Job Role for that Staff member.
What can I use in SAS to do a similar task?
Thanks
SAS Enterprise Guide also provides a task to do ranking under Tasks -> Data -> Rank
You will use start_date as "column to rank" and job_role as "rank by"
Unter options you check "Reverse Ranking from largest to smallest" so the youngest date gets the ranking 1.
You can then use a query to select only those with ranking value 1
Use a datastep and first/last processing is the easiest way via program.
First sort by staff member, job date descending and job role
data want;
set have;
by staff desc job_date job_role;
if first.staff then output;
run;
There are multiple ways to do this though, another is a sql join with a subselect of the maximum job_date/job_role per staff member. Another is to rank the dates using SAS Rank functions and then take the top 1. Both of the above can be done using EG point and click.
Thanks Reeza,
So I would be entering that code into a new Program?
Where is the reference to the source table?
Thanks
Actually the EG solution is easy, I'm just used to thinking in Programming Steps...
Use a Data>Sort Data step to sort your data first
By Member, Date, Job Role
Make sure to set date descending.
Then Sort the Output dataset again (Data>Sort Data)
But only by Member.
And only Options>Duplicate - select the option to keep only the first record.
Hi Reeza,
I'm using EG 4.3 version, it doesn't have the option of 'Options>Duplicate' - select the option to keep only the first record.
I'm using 4.2 and it does.
Tasks>Data>Sort Data
Left hand side > Options
Duplicate is in the second set of options. It could have been removed in 4.3 I suppose...
I see the same window as REEza in EG 4.3.0
You're right Larry, I was in the wrong task icon, Rank instead of Sort... DUH !!:smileysilly:
Sorry Reeza, you're right, I was in a 'Rank' Icon instead of a Sort Icon.
My screen looks exactly like yours & does have the Duplicates option area.
Under 'Data', I've added the Start_Date filed in the 'Sort By' area descending & under 'Options' I have selected the 'Keep only the first record for each 'Sort By' group' option.
My results are showing more than one date for the Job_Role though?
Is it correct to just add one field in the Task Roles - Sort By area?
Also, what is 'Columns to be dropped for?
Thanks
I don't quite understand your specific requirements so my suggestion would be to try a bunch of different things until you get what you want....scientific, I know. But you probably have a better idea.
The logis is to sort first on your key fields and then to sort only on what you want to be the UNIQUE key and everything else that is duplicate will be winnowed out.
Columns to be dropped probably allows you to not include all variables in your final dataset, for example if you only wanted unique members.
OK, thanks Reeza,
I'm doing that now to see how I can just get the row that has the latest Start_Date.
Thanks for your help.
Reeza is correct in pointing your towards datastep, since SAS SQL does not support windowing sql functions.
I believe there is an alternative sql formulation which will generate the same result as the partition by. You could try this:
Select
Staff
Job_Role,
max(Start_Date) as start_date
From Data_Table
group by staff
having start_date = max(start_date)
;
Or you could use select distinct and omit the having clause.
You might be able to implement that in the query builder.
SAS Enterprise Guide also provides a task to do ranking under Tasks -> Data -> Rank
You will use start_date as "column to rank" and job_role as "rank by"
Unter options you check "Reverse Ranking from largest to smallest" so the youngest date gets the ranking 1.
You can then use a query to select only those with ranking value 1
Hi Bruno,
I wasn't aware of this before, thank you.
I'm getting some strange numbers in the Rank result.
I'm not getting a 1, 2 3 in the ranking which is what I was expecting.
I noticed I have the below options when I add the Rank node to the graph as below;
What shoud I choose for these to get a 1,2,3 Ranking?
Thanks
pcfriendly
use the defaults for the data and assign the roles as stated earlier
the Ranking method should be "smallest to largest"
check the box "Reverse Ranking from smallest to largest"
leave the default for "if values tie"
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.