My data set has 4 variables: Identity, Year, Gender and Amount. Each person (Identity) can have more than one record.
First I sort them:
BY YEAR IDENTITY GENDER AMOUNT;
There are 25 different years in the data set. In the next data step I divide the data into 25 separate sets, YEAR_1 - YEAR_25, with each year in its own set. I do nothing that affects the ordering whatsoever.
Then I want to count the number of individuals in each year by using FIRST.IDENTITY.
So I put BY IDENTITY in the code like this:
/* Counting */
and so on, for each year.
I get the error message that the sets are not ordered by Identity.
Is it really necessary to use PROC SORT on data sets YEAR_1 - YEAR_25 before the counting?
I haven't done anything that should have affected the RELATIVE ordering of persons (variable Identity).
Your SAS log should contain enough diagnostic information to tell you which observation caused the problem. That information typically helps me identify my programming oversight, working back from the DATA step that generated the error and looking at my data. No, you should not need to sort your file by YEAR, presuming that there is only one YEAR value in each data set. I suggest running a PROC FREQ just prior to the failing DATA step to be sure. Good desk-checking and self-initiated diagnostics nearly always save the day and improve efficiency.
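A minimal sketch of that check, assuming the first subset is named YEAR_1 and the variable is named YEAR as in the original post. If the split worked as intended, the table should show exactly one YEAR value.

```sas
/* Confirm that the subset really holds a single YEAR value.         */
/* YEAR_1 and YEAR are the names used in the original question.      */
proc freq data=year_1;
  tables year / missing;   /* MISSING also lists missing YEAR values */
run;
```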
In addition to Scott's comments, I'll add a couple.
You can do a PROC CONTENTS on YEAR_1 to see how SAS thinks it is sorted.
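For example, something along these lines (the data set name YEAR_1 is taken from the question):

```sas
/* Print the descriptor information for the subset.                  */
/* In the output, check the "Sorted" attribute and, if present, the  */
/* "Sort Information" section, which lists the Sortedby variables.   */
proc contents data=year_1;
run;
```

If "Sorted" shows NO, SAS has no record of a sort order for that data set, whatever order the rows happen to be in.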
If you broke the dataset up into pieces using PROC SQL, it could have re-arranged the data without your knowledge as part of its internal optimization. The only way to guarantee that SQL maintains an order is to use the ORDER BY clause in the SELECT statement. (This "feature" is why you can't implement the LAG functionality within SQL.)
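If the split was done in SQL, a sketch with an explicit ORDER BY might look like the following. The source table name ALLDATA and the value YEAR = 1 are placeholders of mine, not taken from the original post.

```sas
/* Hypothetical names: ALLDATA is the full sorted data set, and      */
/* YEAR = 1 stands in for whatever the first year value really is.   */
proc sql;
  create table year_1 as
  select *
  from alldata
  where year = 1
  order by identity;   /* request the row order explicitly */
quit;
```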
OK, my mistake.
If you split the dataset by YEAR, the IDENTITY variable will stay in sorted order if and only if the dataset is split sequentially, for example in a DATA step (which processes each observation in sequence). As Doc@Duke said, you cannot assume any ordering with PROC SQL unless you request it explicitly. That is because SQL uses an optimizer that chooses the best strategy to perform the task, and this may not involve sequential processing of the dataset.
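Since we have not seen your code yet, here is only a rough sketch of what a sequential split followed by the count could look like. The names ALLDATA, COUNT_1 and N_PERSONS and the year values 1 and 2 are my assumptions, and only two of the 25 output sets are shown.

```sas
/* Sequential split: ALLDATA (hypothetical name) is assumed to be    */
/* sorted BY YEAR IDENTITY GENDER AMOUNT already. Rows are written   */
/* in the order they are read, so IDENTITY stays ascending within    */
/* each output set.                                                  */
data year_1 year_2;
  set alldata;
  if year = 1 then output year_1;        /* year values 1 and 2 are  */
  else if year = 2 then output year_2;   /* placeholders             */
run;

/* Count distinct individuals in one year using FIRST.IDENTITY.      */
data count_1;
  set year_1 end=last;
  by identity;
  if first.identity then n_persons + 1;  /* sum statement: retained  */
  if last then output;                   /* write one summary row    */
  keep n_persons;
run;
```

If the second step still complains about the order, the log will show the observation where IDENTITY goes backwards, which is the place to start looking.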
I agree with Scott, before making any further assumption about your task and the way you are doing it, you should share with us a bit of your code.
Every SAS data set has details held in a descriptor portion. This is a little header with details about the dataset. One of these details is the list of variables it is sorted by. When you run a PROC SORT, SAS updates the sortedby value to reflect the variables the data has been sorted by.
When you run your SQL to break the data up, the resulting data sets do not have the sortedby value set in the descriptor.
If you know for sure that the data is sorted in a particular way then you can set the sortedby value yourself as Geniz has pointed out, to save you running another proc sort.
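A sketch of two ways to do that with the SORTEDBY= data set option. The names ALLDATA and YEAR = 1 are placeholders of mine, and this assumes the rows really are in IDENTITY order.

```sas
/* Option 1: assert the sort order when the subset is created.       */
/* ALLDATA and YEAR = 1 are hypothetical placeholders.               */
data year_1 (sortedby=identity);
  set alldata;
  where year = 1;
run;

/* Option 2: set the flag on an existing data set without            */
/* rewriting its rows.                                               */
proc datasets library=work nolist;
  modify year_1 (sortedby=identity);
quit;
```

Neither option reorders any rows. Both only record your claim about the order in the descriptor.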
If, when SAS comes to process the dataset, it finds the data is not actually sorted the way you say it is, it will throw an error.