BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I've got a stored process that has a Proc SQL step that uses several tables. The data from this Proc Sql creates an output that contains several data records per person (not duplicates). These several records per person give data about the persons assigned jobs, including when their assigned jobs end. These people can have multiple assigned jobs (i.e., Jan 1, 2005 to Mar 1, 2005: Teacher, Mar 2, 2005 - Sep. 1 2006, Professor, Sep2, 2006 - sep 3, 2007 - Dean).

In my case, I only want to retrieve a person's last position information. Would I therefore use a Proc Sort after the Proc SQL, or in the Proc SQL itself, group on the person's assigned job's end date, and then use a data step to get the person's assigned job's end date?

Would something like the following pseudocode work?

Proc SQL;
create table step01
....
....
group by assigned job's end date;
Quit;

Proc sort data = step01;
by assigned job's end date;

Data finaloutput;
set step01;
by assigned job's end date;
if first assigned job's end date = 1;
run;

It just seems on the sort and data steps that there is repetition of BY information.... is that the way it has to be, or is there a more efficient way? I got my information from "The Little SAS Book."

And....is there anywhere else I can get information about using FIRST. and LAST.?

Thanks so much in advance.
9 REPLIES 9
Cynthia_sas
Diamond | Level 26
Hi...PROC SORT is a separate and unique procedure. PROC SQL, when you use ORDER BY and/or GROUP BY -- may need to do ordering of the data and will invoke whatever sorting process behind the scenes that it needs to run to create the file or the output in the desired order.

However, let's say that PROC SQL was not in the picture at all. You have some data, you want to work with that data and you know that the data is currently ordered by REGION, but you need the data ordered by YEAR. COULD you use PROC SQL to reorder the data? Certainly...but you could also just use PROC SORT to reorder the data:
[pre]
proc sort data=perm.salesdata;
by year region;
run;
[/pre]
would sort the data "in place" -- that means that by the time the sort was done, perm.salesdata would have been reorganized and would now be in order by year and then region within year.

But, if you had this:
[pre]
proc sort data=perm.salesdata out=work.sales_year;
by year region;
run;
[/pre]
then...the "original" file perm.salesdata would still be in the "original" order but the new (and temporary) file work.sales_year would be the same data, only ordered by year and then region within year.

In the above 2 instances, the BY statement is required by PROC SORT because you want to do ordering or reordering of the file.

You could write a DATA step program without using a BY statement, at all. In that case, SAS would not set any flags or automatic variables other than _N_, _ERROR_, etc. However, if you WANT to use FIRST.byvar and LAST.byvar processing then you have to "turn them on" with a BY statement inside your DATA step program. So the 2 BY statements in your code are really independent of each other. In your pseudocode, this may be easier to visualize if you pur "step boundary" statements in your code:
[pre]
Proc SQL;
create table step01
....
....
group by assigned job's end date;
Quit; <--- step boundary

Proc sort data = step01;
by assigned job's end date;
RUN; <---step boundary means that the ENTIRE step will execute BEFORE
the DATA step starts -- the BY statement here BELONGS to
PROC SORT.

Data finaloutput;
set step01;
by assigned job's end date;
if first assigned job's end date = 1;
run; <--- step boundary so the BY statement here BELONGS to the DATA step;

[/pre]

These topics in the SAS documentation (from the on-line help) will be of the most use to you:

Introduction to DATA Step Processing Topics:
How the DATA Step Works: A Basic Introduction
Supplying Information to Create a SAS Data Set

BY-Group Processing in the DATA Step Topics:
Understanding BY Groups
How the DATA Step Identifies BY Groups
Processing BY-Groups in the DATA Step


cynthia
deleted_user
Not applicable
Thank you Cynthia, for the information. Truly appreciated!
LawrenceHW
Quartz | Level 8
There are a few little ways to do this.
First you don't need the PROC SORT. The GROUP BY should have sorted the STEP01 dataset already for you. If it hasn't just bung an ORDER BY statement after the GROUP BY. Although I'm pretty sure the GROUP BY should do this for you (in my defence it's been a long day:( ).

To get the FIRST. & LAST. working inside a DATA step you need the BY statement. That's just the way it is. FIRST. & LAST. are explicitly defined temporary variables.

I assume your BY statement would look something like:
BY person end_date;

In the DATA step you would pick off:
IF FIRST.end_date;

You would then get the last job details for each person.

You could also do some slightly fancy SQL code using MAX functions and sub-selections and joins ... but to be honest the SQL followed by the DATA step is pretty much the best and most efficient (certainly from a code maintenance POV) way to achieve this.

Hope this makes sense.
Lawrence
deleted_user
Not applicable
Thanks for replying, Lawrence.... I understand completely. I just wanted some verification, and this helps a lot.
advoss
Quartz | Level 8
Lawrence's example will not get you the last record for an individual.

Since the data is sorted (using proc sort or an order by in proc sql) by individual and enddate (let's assume that it is ascending order for both columns), you only have to do:

data abc;
set xyz;
by individual ; /* you don't need to include enddate */
if last.individual; /* this will give you the individual's record with the most recent enddate */
run;
LawrenceHW
Quartz | Level 8
True. I meant to type LAST. and I typed FIRST.

What can I say? It was the end of pretty long day.

From a code maintenance and documentation pov I would always have the full by statement that I use. Saves inexperienced people coming up to you and asking why something works. Also means that when you go back to the code in a year or so, you don't have to work out what you did.
advoss
Quartz | Level 8
I understand your point about documentation and maintenance. My personal opinion is that a simple comment (which, of course, I would neglect to put in) would be as useful as the full by statement. But, as I said, that is just my opinion and certainly not worth spending useful time debating. From experience, I'm pretty sure that your viewpoint would triumph.
1162
Calcite | Level 5
How about using the HAVING statement in PROC SQL?

You could do this all in one step with something like

proc sql;
create table person, job
from step01
group by person
having enddate = max(enddate);
quit;
LawrenceHW
Quartz | Level 8
The HAVING clause is all very well. But how would you deal with character variables? I don't think the MAX (or MIN) functions work on character values.

I also think that FIRST. and LAST. are far more self-documenting.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3101 views
  • 0 likes
  • 5 in conversation