SAS Enterprise Guide

pcfriendly · Posted 07-24-2013 05:38 PM

Hi,

I'm trying to find the Function in SAS that will perform a sort on my Dataset of Job Roles & Start Dates.

I'm using Enterprise Guide 4.3 & I want to do this sorting in an Advanced Expression, that is via The Query Builder >> 'Computed Columns' >> 'New' >> 'Advanced Expression'

The Dataset has various Job Roles staff have had & the various Start Dates, there are multiple Job Roles & Start Dates for each Staff member.

I want to find the latest Start Date & Job Role for each Staff member.

In Teradata i would use the below;

------------------------------------------------------

Select

Job_Role,

Start_Date,

ROW_NUMBER OVER (PARTITION BY 'Job_Role' ORDER BY Start_Date DESC) AS Rank_No

From Data_Table

------------------------------------------------------

I could then filter the records for where the 'Rank_No' = 1 which would give the most current Job Role for that Staff member.

What can I use in SAS to do a similar task?

Thanks

BrunoMueller · Posted 07-25-2013 02:53 AM

SAS Enterprise Guide also provides a task to do ranking under Tasks -> Data -> Rank

You will use start_date as "column to rank" and job_role as "rank by"

Unter options you check "Reverse Ranking from largest to smallest" so the youngest date gets the ranking 1.

You can then use a query to select only those with ranking value 1

View solution in original post

Reeza · Posted 07-24-2013 06:08 PM

Use a datastep and first/last processing is the easiest way via program.

First sort by staff member, job date descending and job role

data want;

set have;

by staff desc job_date job_role;

if first.staff then output;

run;

There are multiple ways to do this though, another is a sql join with a subselect of the maximum job_date/job_role per staff member. Another is to rank the dates using SAS Rank functions and then take the top 1. Both of the above can be done using EG point and click.

pcfriendly · Posted 07-24-2013 06:16 PM

Thanks Reeza,

So I would be entering that code into a new Program?

Where is the reference to the source table?

Thanks

Reeza · Posted 07-24-2013 06:33 PM

Actually the EG solution is easy, I'm just used to thinking in Programming Steps...

Use a Data>Sort Data step to sort your data first

By Member, Date, Job Role

Make sure to set date descending.

Then Sort the Output dataset again (Data>Sort Data)

But only by Member.

And only Options>Duplicate - select the option to keep only the first record.

pcfriendly · Posted 07-25-2013 05:49 PM

Hi Reeza,

I'm using EG 4.3 version, it doesn't have the option of 'Options>Duplicate' - select the option to keep only the first record.

Reeza · Posted 07-25-2013 05:56 PM

I'm using 4.2 and it does.

Tasks>Data>Sort Data

Left hand side > Options

Duplicate is in the second set of options. It could have been removed in 4.3 I suppose...

LarryWorley · Posted 07-25-2013 06:07 PM

I see the same window as REEza in EG 4.3.0

pcfriendly · Posted 07-25-2013 06:14 PM

You're right Larry, I was in the wrong task icon, Rank instead of Sort... DUH !!:smileysilly:

pcfriendly · Posted 07-25-2013 06:13 PM

Sorry Reeza, you're right, I was in a 'Rank' Icon instead of a Sort Icon.

My screen looks exactly like yours & does have the Duplicates option area.

Under 'Data', I've added the Start_Date filed in the 'Sort By' area descending & under 'Options' I have selected the 'Keep only the first record for each 'Sort By' group' option.

My results are showing more than one date for the Job_Role though?

Is it correct to just add one field in the Task Roles - Sort By area?

Also, what is 'Columns to be dropped for?

Thanks

Reeza · Posted 07-25-2013 06:18 PM

I don't quite understand your specific requirements so my suggestion would be to try a bunch of different things until you get what you want....scientific, I know. But you probably have a better idea.

The logis is to sort first on your key fields and then to sort only on what you want to be the UNIQUE key and everything else that is duplicate will be winnowed out.

Columns to be dropped probably allows you to not include all variables in your final dataset, for example if you only wanted unique members.

pcfriendly · Posted 07-25-2013 06:21 PM

OK, thanks Reeza,

I'm doing that now to see how I can just get the row that has the latest Start_Date.

Thanks for your help.

LarryWorley · Posted 07-24-2013 06:38 PM

Reeza is correct in pointing your towards datastep, since SAS SQL does not support windowing sql functions.

I believe there is an alternative sql formulation which will generate the same result as the partition by. You could try this:

Select

Staff

Job_Role,

max(Start_Date) as start_date

From Data_Table

group by staff

having start_date = max(start_date)

;

Or you could use select distinct and omit the having clause.

You might be able to implement that in the query builder.

BrunoMueller · Posted 07-25-2013 02:53 AM

SAS Enterprise Guide also provides a task to do ranking under Tasks -> Data -> Rank

You will use start_date as "column to rank" and job_role as "rank by"

Unter options you check "Reverse Ranking from largest to smallest" so the youngest date gets the ranking 1.

You can then use a query to select only those with ranking value 1

pcfriendly · Posted 07-25-2013 05:48 AM

Hi Bruno,

I wasn't aware of this before, thank you.

I'm getting some strange numbers in the Rank result.

I'm not getting a 1, 2 3 in the ranking which is what I was expecting.

I noticed I have the below options when I add the Rank node to the graph as below;

Sort By Variables under the Data area
Ranking Method under the Options area
If values tie, use;

What shoud I choose for these to get a 1,2,3 Ranking?

Thanks

BrunoMueller · Posted 07-25-2013 07:19 AM

pcfriendly

use the defaults for the data and assign the roles as stated earlier

the Ranking method should be "smallest to largest"

check the box "Reverse Ranking from smallest to largest"

leave the default for "if values tie"

SAS Enterprise Guide

Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Re: Find Latest value using ROW_NUMBER OVER (PARTITION BY 'X' ORDER BY DATE) AS Rank in SAS

Follow Us

What is...

SAS Enterprise Guide

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...