turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Necessary to repeat sorting?

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-15-2009 12:05 PM

My data set have 4 variables: Identity, Year, Gender and Amount. Each person(Identity) can have more than one record.

First I sort them

BY YEAR IDENTITY GENDER AMOUNT;

There are 25 different years in the data set. In the next data step I divide the data into 25 separate sets, YEAR_1 - YEAR25, with each year in its own set. I do nothing that affects the ordering whatsoever.

Then I want to count the number of individuals in each year by using FIRST.IDENTITY.

So I put BY IDENTITY in the code like this:

DATA Counting1;

SET YEAR_1;

BY Identity

/* Counting */

RUN;

and so on, for each year.

I get the error message that the sets are not ordered by Identity.

Is it it really necessary to use PROC SORT on data sets YEAR_1 - YEAR_25 before the counting?

I haven't done anything that should have affected the RELATIVE individual ordering of persons(variable Identity).

Susan

First I sort them

BY YEAR IDENTITY GENDER AMOUNT;

There are 25 different years in the data set. In the next data step I divide the data into 25 separate sets, YEAR_1 - YEAR25, with each year in its own set. I do nothing that affects the ordering whatsoever.

Then I want to count the number of individuals in each year by using FIRST.IDENTITY.

So I put BY IDENTITY in the code like this:

DATA Counting1;

SET YEAR_1;

BY Identity

/* Counting */

RUN;

and so on, for each year.

I get the error message that the sets are not ordered by Identity.

Is it it really necessary to use PROC SORT on data sets YEAR_1 - YEAR_25 before the counting?

I haven't done anything that should have affected the RELATIVE individual ordering of persons(variable Identity).

Susan

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

04-15-2009 12:16 PM

Hello Susan.

Data sorted by YEAR IDENTITY GENDER AMOUNT is in different thant by IDENTITY.

For example

YEAR IDENTITY GENDER AMOUNT

2008 1 M 10

2008 2 F 11

2009 1 F 12

2009 2 F 13

is sorted by YEAR IDENTITY GENDER AMOUNT not sorted by IDENTITY.

Try to look the IDENTITY column alone, you can easily see that it is not sorted.

YEAR IDENTITY GENDER AMOUNT

. 1 . .

. 2 . .

. 1 . .

. 2 . .

At row 3 the sequence is broke, as 1 (row 3) is lower than 2 (row 2).

Greetings from Portugal.

Daniel Santos at www.cgd.pt

Data sorted by YEAR IDENTITY GENDER AMOUNT is in different thant by IDENTITY.

For example

YEAR IDENTITY GENDER AMOUNT

2008 1 M 10

2008 2 F 11

2009 1 F 12

2009 2 F 13

is sorted by YEAR IDENTITY GENDER AMOUNT not sorted by IDENTITY.

Try to look the IDENTITY column alone, you can easily see that it is not sorted.

YEAR IDENTITY GENDER AMOUNT

. 1 . .

. 2 . .

. 1 . .

. 2 . .

At row 3 the sequence is broke, as 1 (row 3) is lower than 2 (row 2).

Greetings from Portugal.

Daniel Santos at www.cgd.pt

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to DanielSantos

04-15-2009 01:58 PM

First I sorted the one and only original file **BY YEAR IDENTITY GENDER AMOUNT;**

If the years are 1976 - 2000 then the file should look like this

Year Id ....

1976 2

1976 4

1976 4

.

.

.

1977 1

1977 4

1977 6

.

.

.

1978 2

1978 3

1978 3

1978 4

1978 5

1978 6

.

.

.

Then I make 25 separate data sets for each year. The first three should start the same way as above and the other 22 sets in the same manner. I haven't changed the order.

I claim that the Identity column is ordered in each year set, and**the counting is done separately in each year set.**

I don't understand what you mean. Read my first message once more.

Susan

If the years are 1976 - 2000 then the file should look like this

Year Id ....

1976 2

1976 4

1976 4

.

.

.

1977 1

1977 4

1977 6

.

.

.

1978 2

1978 3

1978 3

1978 4

1978 5

1978 6

.

.

.

Then I make 25 separate data sets for each year. The first three should start the same way as above and the other 22 sets in the same manner. I haven't changed the order.

I claim that the Identity column is ordered in each year set, and

I don't understand what you mean. Read my first message once more.

Susan

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

04-15-2009 02:03 PM

My 1:58 PM message was directed to Daniel Santos.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

04-15-2009 06:16 PM

Suggest you share your SAS log output with source code revealed for further feedback/comment from *ANY* forum subscriber.

Scott Barry

SBBWorks, Inc.

Scott Barry

SBBWorks, Inc.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

04-15-2009 01:42 PM

Your SAS log output should have sufficient diagnostic information to tell you what observation was the problem -- that information typically helps me identify my programming oversight, working back from the DATA step that generated the error showing my data. No, you should not need to sort your file by YEAR, presuming that there is only one YEAR value -- suggest doing a PROC FREQ just prior to the failing DATA step to be sure. Good desk-checking and self-initiated diagnostics nearly always save the day and improve efficiency.

Scott Barry

SBBWorks, Inc.

Scott Barry

SBBWorks, Inc.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

04-15-2009 02:02 PM

Susan,

In addition to Scott's comments, I'll add a couple.

You can do a PROC CONTENTS on YEAR_1 to see how SAS thinks it is sorted.

If you broke the dataset up into pieces using PROC SQL, it could have re-arranged the data without your knowledge as part of its internal optimization. The only way to guarantee that SQL maintains an order is to use the ORDER BY clause in the SELECT statement. (This "feature" is why you can't implement the LAG functionality within SQL.)

Doc Muhlbaier

Duke

In addition to Scott's comments, I'll add a couple.

You can do a PROC CONTENTS on YEAR_1 to see how SAS thinks it is sorted.

If you broke the dataset up into pieces using PROC SQL, it could have re-arranged the data without your knowledge as part of its internal optimization. The only way to guarantee that SQL maintains an order is to use the ORDER BY clause in the SELECT statement. (This "feature" is why you can't implement the LAG functionality within SQL.)

Doc Muhlbaier

Duke

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

04-16-2009 04:32 AM

OK, my mistake.

If you break the dataset by YEAR, the IDENTITY variable will maintain the ordering sequence (ordered), IF and ONLY IF the dataset is splitted sequentially, say processed through datastep (which processes each OBS sequentially). As Doc@Duke, said, you cannot assume the ordering sequence with PROC SQL, unless you explicitly say so. That is because SQL uses an optimizer that will choose the best strategy/technique to perform the task, and this may not involve a sequential processing of the dataset.

I agree with Scott, before making any further assumption about your task and the way you are doing it, you should share with us a bit of your code.

Greetings from Portugal.

Daniel Santos at www.cgd.pt

If you break the dataset by YEAR, the IDENTITY variable will maintain the ordering sequence (ordered), IF and ONLY IF the dataset is splitted sequentially, say processed through datastep (which processes each OBS sequentially). As Doc@Duke, said, you cannot assume the ordering sequence with PROC SQL, unless you explicitly say so. That is because SQL uses an optimizer that will choose the best strategy/technique to perform the task, and this may not involve a sequential processing of the dataset.

I agree with Scott, before making any further assumption about your task and the way you are doing it, you should share with us a bit of your code.

Greetings from Portugal.

Daniel Santos at www.cgd.pt

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

04-16-2009 05:24 AM

try

DATA Counting1;

SET YEAR_1(sortedby=Identity);

BY Identity

/* Counting */

RUN;

http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000131184.htm

Why do you want to make 25 datasets, if you just want to count the number of individuals in each year ? You can do that with just your initial dataset.

DATA Counting1;

SET YEAR_1(sortedby=Identity);

BY Identity

/* Counting */

RUN;

http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000131184.htm

Why do you want to make 25 datasets, if you just want to count the number of individuals in each year ? You can do that with just your initial dataset.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to GertNissen

04-17-2009 10:49 AM

Spot on Geniz.

Every SAS data set has details held in a descriptor portion. This is a little header with details about the dataset. One of these details is the variable it is sorted by. When you run a proc sort SAS updates the sortedby value to reflect the variables the data has been sorted by.

When you run your SQL to break the data up the resulting data sets do not have the sortedby value set in the descriptor.

If you know for sure that the data is sorted in a particular way then you can set the sortedby value yourself as Geniz has pointed out, to save you running another proc sort.

If, when SAS comes to process the dataset, it finds the data is not actually sorted the way you say it is, it will throw an error.

Every SAS data set has details held in a descriptor portion. This is a little header with details about the dataset. One of these details is the variable it is sorted by. When you run a proc sort SAS updates the sortedby value to reflect the variables the data has been sorted by.

When you run your SQL to break the data up the resulting data sets do not have the sortedby value set in the descriptor.

If you know for sure that the data is sorted in a particular way then you can set the sortedby value yourself as Geniz has pointed out, to save you running another proc sort.

If, when SAS comes to process the dataset, it finds the data is not actually sorted the way you say it is, it will throw an error.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to deleted_user

04-20-2009 10:58 AM

Susan,

If you just want to count the number of unique ID per YEAR why not just use something like:

PROC SQL;

SELECT DISTINCT year

, COUNT(DISTINCT id) AS n_identifier

FROM alldata

GROUP BY year

;

QUIT;

If you just want to count the number of unique ID per YEAR why not just use something like:

PROC SQL;

SELECT DISTINCT year

, COUNT(DISTINCT id) AS n_identifier

FROM alldata

GROUP BY year

;

QUIT;