BookmarkSubscribeRSS Feed
deleted_user
Not applicable
My data set have 4 variables: Identity, Year, Gender and Amount. Each person(Identity) can have more than one record.

First I sort them

BY YEAR IDENTITY GENDER AMOUNT;

There are 25 different years in the data set. In the next data step I divide the data into 25 separate sets, YEAR_1 - YEAR25, with each year in its own set. I do nothing that affects the ordering whatsoever.

Then I want to count the number of individuals in each year by using FIRST.IDENTITY.

So I put BY IDENTITY in the code like this:

DATA Counting1;
SET YEAR_1;
BY Identity
/* Counting */
RUN;

and so on, for each year.

I get the error message that the sets are not ordered by Identity.

Is it it really necessary to use PROC SORT on data sets YEAR_1 - YEAR_25 before the counting?

I haven't done anything that should have affected the RELATIVE individual ordering of persons(variable Identity).

Susan
10 REPLIES 10
DanielSantos
Barite | Level 11
Hello Susan.

Data sorted by YEAR IDENTITY GENDER AMOUNT is in different thant by IDENTITY.

For example

YEAR IDENTITY GENDER AMOUNT
2008 1 M 10
2008 2 F 11
2009 1 F 12
2009 2 F 13

is sorted by YEAR IDENTITY GENDER AMOUNT not sorted by IDENTITY.

Try to look the IDENTITY column alone, you can easily see that it is not sorted.
YEAR IDENTITY GENDER AMOUNT
. 1 . .
. 2 . .
. 1 . .
. 2 . .

At row 3 the sequence is broke, as 1 (row 3) is lower than 2 (row 2).

Greetings from Portugal.

Daniel Santos at www.cgd.pt
deleted_user
Not applicable
First I sorted the one and only original file BY YEAR IDENTITY GENDER AMOUNT;

If the years are 1976 - 2000 then the file should look like this

Year Id ....
1976 2
1976 4
1976 4
.
.
.

1977 1
1977 4
1977 6
.
.
.

1978 2
1978 3
1978 3
1978 4
1978 5
1978 6
.
.
.

Then I make 25 separate data sets for each year. The first three should start the same way as above and the other 22 sets in the same manner. I haven't changed the order.

I claim that the Identity column is ordered in each year set, and the counting is done separately in each year set.

I don't understand what you mean. Read my first message once more.

Susan
deleted_user
Not applicable
My 1:58 PM message was directed to Daniel Santos.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Suggest you share your SAS log output with source code revealed for further feedback/comment from *ANY* forum subscriber.

Scott Barry
SBBWorks, Inc.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Your SAS log output should have sufficient diagnostic information to tell you what observation was the problem -- that information typically helps me identify my programming oversight, working back from the DATA step that generated the error showing my data. No, you should not need to sort your file by YEAR, presuming that there is only one YEAR value -- suggest doing a PROC FREQ just prior to the failing DATA step to be sure. Good desk-checking and self-initiated diagnostics nearly always save the day and improve efficiency.

Scott Barry
SBBWorks, Inc.
Doc_Duke
Rhodochrosite | Level 12
Susan,

In addition to Scott's comments, I'll add a couple.

You can do a PROC CONTENTS on YEAR_1 to see how SAS thinks it is sorted.

If you broke the dataset up into pieces using PROC SQL, it could have re-arranged the data without your knowledge as part of its internal optimization. The only way to guarantee that SQL maintains an order is to use the ORDER BY clause in the SELECT statement. (This "feature" is why you can't implement the LAG functionality within SQL.)

Doc Muhlbaier
Duke
DanielSantos
Barite | Level 11
OK, my mistake.
If you break the dataset by YEAR, the IDENTITY variable will maintain the ordering sequence (ordered), IF and ONLY IF the dataset is splitted sequentially, say processed through datastep (which processes each OBS sequentially). As Doc@Duke, said, you cannot assume the ordering sequence with PROC SQL, unless you explicitly say so. That is because SQL uses an optimizer that will choose the best strategy/technique to perform the task, and this may not involve a sequential processing of the dataset.

I agree with Scott, before making any further assumption about your task and the way you are doing it, you should share with us a bit of your code.

Greetings from Portugal.

Daniel Santos at www.cgd.pt
GertNissen
Barite | Level 11
try

DATA Counting1;
SET YEAR_1(sortedby=Identity);
BY Identity
/* Counting */
RUN;

http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000131184.htm

Why do you want to make 25 datasets, if you just want to count the number of individuals in each year ? You can do that with just your initial dataset.
deleted_user
Not applicable
Spot on Geniz.

Every SAS data set has details held in a descriptor portion. This is a little header with details about the dataset. One of these details is the variable it is sorted by. When you run a proc sort SAS updates the sortedby value to reflect the variables the data has been sorted by.

When you run your SQL to break the data up the resulting data sets do not have the sortedby value set in the descriptor.

If you know for sure that the data is sorted in a particular way then you can set the sortedby value yourself as Geniz has pointed out, to save you running another proc sort.

If, when SAS comes to process the dataset, it finds the data is not actually sorted the way you say it is, it will throw an error.
LawrenceHW
Quartz | Level 8
Susan,

If you just want to count the number of unique ID per YEAR why not just use something like:

PROC SQL;
SELECT DISTINCT year
, COUNT(DISTINCT id) AS n_identifier
FROM alldata
GROUP BY year
;
QUIT;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1000 views
  • 0 likes
  • 6 in conversation