Desktop productivity for business analysts and programmers

Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 80
Accepted Solution

Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS


Hi,

I'm trying to find the Function in SAS that will perform a sort on my Dataset of Job Roles & Start Dates.

I'm using Enterprise Guide 4.3 & I want to do this sorting in an Advanced Expression, that is via The Query Builder >> 'Computed Columns' >> 'New' >> 'Advanced Expression'

The Dataset has various Job Roles staff have had & the various Start Dates, there are multiple Job Roles & Start Dates for each Staff member.

I want to find the latest Start Date & Job Role for each Staff member.

In Teradata i would use the below;

------------------------------------------------------

Select

Job_Role,

Start_Date,

ROW_NUMBER OVER (PARTITION BY 'Job_Role' ORDER BY Start_Date DESC) AS Rank_No

From Data_Table

------------------------------------------------------

I could then filter the records for where the 'Rank_No' = 1 which would give the most current Job Role for that Staff member.

What can I use in SAS to do a similar task?

Thanks


Accepted Solutions
Solution
‎07-25-2013 02:53 AM
SAS Super FREQ
Posts: 676

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

SAS Enterprise Guide also provides a task to do ranking under Tasks -> Data -> Rank

You will use start_date as "column to rank" and job_role as "rank by"

Unter options you check "Reverse Ranking from largest to smallest" so the youngest date gets the ranking 1.

You can then use a query to select only those with ranking value 1

View solution in original post


All Replies
Grand Advisor
Posts: 17,394

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Use a datastep and first/last processing is the easiest way via program.

First sort by staff member, job date descending and job role

data want;

set have;

by staff desc job_date job_role;

if first.staff then output;

run;

There are multiple ways to do this though, another is a sql join with a subselect of the maximum job_date/job_role per staff member. Another is to rank the dates using SAS Rank functions and then take the top 1. Both of the above can be done using EG point and click.

Frequent Contributor
Posts: 80

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Thanks Reeza,

So I would be entering that code into a new Program?

Where is the reference to the source table?

Thanks

Grand Advisor
Posts: 17,394

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Actually the EG solution is easy, I'm just used to thinking in Programming Steps...

Use a Data>Sort Data step to sort your data first

By Member, Date, Job Role

Make sure to set date descending.

Then Sort the Output dataset again (Data>Sort Data)

But only by Member.

And only Options>Duplicate - select the option to keep only the first record.

Frequent Contributor
Posts: 80

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Hi Reeza,

I'm using EG 4.3 version, it doesn't have the option of 'Options>Duplicate' - select the option to keep only the first record.


Grand Advisor
Posts: 17,394

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

I'm using 4.2 and it does.

Tasks>Data>Sort Data

Left hand side > Options

Duplicate is in the second set of options. It could have been removed in 4.3 I suppose...

SAS EG Options.jpg

Frequent Contributor
Posts: 129

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

I see the same window as REEza in EG 4.3.0

Frequent Contributor
Posts: 80

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

You're right Larry, I was in the wrong task icon, Rank instead of Sort... DUH !!:smileysilly: 

Frequent Contributor
Posts: 80

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Sorry Reeza, you're right, I was in a 'Rank' Icon instead of a Sort Icon.

My screen looks exactly like yours & does have the Duplicates option area.

Under 'Data', I've added the Start_Date filed in the 'Sort By' area descending & under 'Options' I have selected the 'Keep only the first record for each 'Sort By' group' option.

My results are showing more than one date for the Job_Role though?

Is it correct to just add one field in the Task Roles - Sort By area?

Also, what is 'Columns to be dropped for?

Thanks    

Grand Advisor
Posts: 17,394

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

I don't quite understand your specific requirements so my suggestion would be to try a bunch of different things until you get what you want....scientific, I know. But you probably have a better idea.

The logis is to sort first on your key fields and then to sort only on what you want to be the UNIQUE key and everything else that is duplicate will be winnowed out. 

Columns to be dropped probably allows you to not include all variables in your final dataset, for example if you only wanted unique members.

Frequent Contributor
Posts: 80

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

OK, thanks Reeza,

I'm doing that now to see how I can just get the row that has the latest Start_Date.

Thanks for your help.

Frequent Contributor
Posts: 129

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Reeza is correct in pointing your towards datastep, since SAS SQL does not support windowing sql functions.

I believe there is an alternative sql formulation which will generate the same result as the partition by.  You could try this:

Select

Staff

Job_Role,

max(Start_Date) as start_date

From Data_Table

group by staff

having start_date = max(start_date)

;

Or you could use select distinct and omit the having clause.

You might be able to implement that in the query builder.

Solution
‎07-25-2013 02:53 AM
SAS Super FREQ
Posts: 676

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

SAS Enterprise Guide also provides a task to do ranking under Tasks -> Data -> Rank

You will use start_date as "column to rank" and job_role as "rank by"

Unter options you check "Reverse Ranking from largest to smallest" so the youngest date gets the ranking 1.

You can then use a query to select only those with ranking value 1

Frequent Contributor
Posts: 80

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Hi Bruno,

I wasn't aware of this before, thank you.

I'm getting some strange numbers in the Rank result.

I'm not getting a 1, 2 3 in the ranking which is what I was expecting.

I noticed I have the below options when I add the Rank node to the graph as below;

  • Sort By Variables under the Data area
  • Ranking Method under the Options area
  • If values tie, use;

What shoud I choose for these to get a 1,2,3 Ranking?

Thanks

SAS Super FREQ
Posts: 676

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

pcfriendly

use the defaults for the data and assign the roles as stated earlier

the Ranking method should be "smallest to largest"

check the box "Reverse Ranking from smallest to largest"

leave the default for "if values tie"

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 18 replies
  • 18438 views
  • 3 likes
  • 6 in conversation