turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Top and bottom 2 by city using a data step

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 12:14 PM - last edited on 04-06-2016 12:49 PM by Reeza

This post is part of the post "Need Urgent Help again!! Appreciate it."

**data** **ex**;

input City $ Region $ GDP;

cards;

City1 E 11.1

City2 E 12.5

City3 E 5.5

City4 A 8.5

City5 A 9.5

City6 A 10.5

;

**run**;

If want to get two largest gdp by city and two smallest gdp by city in one data set, how can I do it? Please advise me again. Thanks. I am a beginner to SAS and these problems takes me for 2 to 3 days to work it, but still I don't get it. I appreciate this forum and everyone's help.

Accepted Solutions

Solution

04-12-2016
10:26 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 12:58 PM

Hi Cathy,

Welcome to programming in SAS. Here's a couple of hints for getting the appropriate response.

Try and include a descriptive subject line for your problem. Saying you need immediate help isn't helpful for describing your problem. Everyone here has something that's an issue and that's why they're posting.

It is good to link to your previous question if you've already asked it somewhere and the solution doesn't work for you.

I've modified your title here.

Your question provides sample data, but it doesn't show what you want the output to look like so we are guessing a bit. It helps if you provide sample output and input

You want the top 2 and bottom 2 from your dataset.

The process is then:

A very simple way, and easily understood way is to:

Sort ascending, and take the first two as smallest.

Sort descending, and take the first two as largest

Combine datasets.

You can use the automatic variable _n_ as a pseudo row counter, and output explicitly tells SAS to keep those records.

```
data ex;
input City $ Region $ GDP;
cards;
City1 E 11.1
City2 E 12.5
City3 E 5.5
City4 A 8.5
City5 A 9.5
City6 A 10.5
;
run;
proc sort data=ex;
by gdp;
run;
data smallest;
set ex;
if _n_ <= 2 then output;
run;
proc sort data=ex;
by descending gdp;
run;
data largest;
set ex;
if _n_ <=2 then output;
run;
data want;
set smallest largest;
run;
```

A slightly more complex method is to use the NOBS variable after you sort it. This is an automatic variable that allows you to know the number of observations in a dataset. Then in one pass you can take the top 2 (if _N_ <=2) and bottom two, (if _n_ >= NOBS-1)

```
data ex;
input City $ Region $ GDP;
cards;
City1 E 11.1
City2 E 12.5
City3 E 5.5
City4 A 8.5
City5 A 9.5
City6 A 10.5
;
run;
proc sort data=ex;
by gdp;
run;
data want;
set ex nobs=count; *store the nobs into a variable called count;
if _n_ <= 2 then output;
if _n_ >= count-1 then output;
run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 12:28 PM

Many ways to do this. This is quick and dirty (two passes of the data). Secret is in the OUTOBS option:

```
data ex;
input City $ Region $ GDP;
cards;
City1 E 11.1
City2 E 12.5
City3 E 5.5
City4 A 8.5
City5 A 9.5
City6 A 10.5
;
run;
proc sql outobs=2;
create table largest2 as
select city, gdp
from ex
order by gdp desc;
create table smallest2 as
select city, gdp
from ex
order by gdp;
quit;
data all;
set largest2 smallest2;
run;
proc print data=largest2;
run;
proc print data=smallest2;
run;
proc print data=all;
run;
```

Hope this helps,

- Jan.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 12:32 PM - edited 04-06-2016 12:33 PM

anyothe ways not using proc sql? if so, please instruct me. Thanks a lot for your help, too.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 12:48 PM

Shure why not.

```
proc sort data=ex;
by gdp;
run;
data want;
if 0 then set ex nobs=n;
do i=1, 2, n-1, n;
set ex point=i;
output;
end;
stop;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 12:58 PM

This is good one, but why you need to have n-1 and I don't understand the codes. Thanks. Would you mind to explain me?

```
if 0 then set ex nobs=n;
do i=1, 2, n-1, n;
set ex point=i;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 01:18 PM

I don't mind to explain. Consider n is the number of rows in the dataset sorted by ascending gdp. So rows 1 and 2 are the cities with the lowest gdp. Rows n-1 and n are the cities with the highest gdp.

The point= options allows access by rownumber. The statement with nobs= stores the number of rows in variable n but never actually gets executed due to the "if 0". This approach is very efficient. If you don't count the sort.

Hope this explains my code.

Regards,

- Jan.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 01:27 PM

Thank you again. So if I have large data set, let's say I want 10 lowest and 10 largest, so I need to write from rows 1 to 10?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 01:29 PM

Cathy wrote:

You have a variety of solutions provided, in your response we can't see which one you're referring to.

In mine, you can change the 2 to 10 and _N_-1 to _n_ - 9 and you'll be fine.

The Proc Summary solution also scales well, change 2 to 10.

Not all of the others do

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 02:18 PM

@Cathy: Also, please note that some of the suggested solutions (including mine) select the two smallest and largest GDPs **in each region**, as this was specified in your earlier post, whereas others select the **overall** extreme values from the input dataset (maybe correctly, because you didn't write "in each region," but "by city" in the initial post of the current thread).

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 03:16 PM

@FreelanceReinhard I want 2 smallest and largest only by city for this thread. yes, I needed to do by region for previous one. Thanks for clarification.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 03:21 PM

@Cathy: Thanks for the clarification. In this case, you can omit the NWAY option and CLASS statement in the PROC SUMMARY approach:

```
proc summary data=ex;
var GDP;
output out=want(drop=_:) idgrp(max(GDP) out[2] (GDP city)=highGDP city_highGDP)
idgrp(min(GDP) out[2] (GDP city)=lowGDP city_lowGDP);
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 03:13 PM

Thank you.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 12:52 PM

PROC SUMMARY/PROC MEANS (as suggested by Loko and KurtBremser in the other thread) allows for the "n largest/smallest" values as well:

```
proc summary data=ex nway;
class region;
var GDP;
output out=want(drop=_:) idgrp(max(GDP) out[2] (GDP city)=highGDP city_highGDP)
idgrp(min(GDP) out[2] (GDP city)=lowGDP city_lowGDP);
run;
```

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 01:28 PM

Thank you very much for you input too. I appreciate it. I like your codes as well.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

04-06-2016 12:56 PM

I would opt for @FreelanceReinhard's solution, but FWIW, here is a data step solution:

```
data ex;
input City $ Region $ GDP;
cards;
City1 E 11.1
City2 E 12.5
City3 E 5.5
City4 A 8.5
City5 A 9.5
City6 A 10.5
;
run;
proc sort data=ex;
by region gdp;
run;
data want;
do _s=1 by 1 until (last.region);
set ex;
by region gdp;
if _s=1 then
smallest_city=city;
if _s=2 then
second_smallest_city=city;
end;
do _l=1 by 1 until (last.region);
set ex;
by region gdp;
if _l=_s then
largest_city=city;
if _l=_s-1 then
second_largest_city=city;
end;
drop _: city gdp;
run;
```